RL Adventure
TO THE RAINBOW
성태경 양홍선 이의령 김예찬
DQN, Double DQN & Dueling DQN
PER and NoisyNet
Distributional RL
RAINBOW
성태경 양홍선 이의령 김예찬
RL Adventure
DQN, Double DQN & Dueling DQN
성태경
OUTLINE
2
DQN → Double DQN → Dueling DQN → PER → NoisyNet → C51 → Rainbow + implementation
OUTLINE
3
DQN → Double DQN → Dueling DQN → PER → NoisyNet → C51 → Rainbow + implementation
RL APPLICATIONS
[Atari]
[Robotics] [Autonomous driving]
[Mario] [Pommerman] [Go]
4
HIGH-LEVEL PROCESS
5
[Decision]
[Pixel information]
reward
HIGH-LEVEL PROCESS
6
[Decision]
[Pixel information]
[ ……… ]
[Input values]
PREPROCESS
[Neural networks]
MLPs, CNNs, RNNs, …
TRAINING
DQN, Double DQN, DDQN, …
[Objective function]
reward
DEEP Q-NETWORK (DQN)
7
DQN
NEURAL NETWORKS IN ONE SLIDE
8
Weight computation
Backpropagation
Non-linear function
DQN
NEURAL NETWORKS IN ONE SLIDE
9
Convolutional neural network
Max-pooling
Softmax
Weight computation
Backpropagation
Non-linear function
DQN
Q-LEARNING
‣ Goal: decide which action is best to take in the current state
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
C. J. C. H. Watkins, P. Dayan. Q-learning. 1992.
Q_new(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α (r_t + γ max_a Q(s_{t+1}, a))
[Value iteration update]
Q^π(s, a) = 𝔼[ Σ_{t=0}^{∞} γ^t R(x_t, a_t) ],  γ ∈ (0, 1)
[Expected rewards]
10
current reward value | reward value of the next state
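In code, the update above is a single line on a Q-table. A minimal sketch; the table size, α and γ below are illustrative assumptions, not values from the slides:

```python
# Tabular Q-learning update: Q_new(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
import numpy as np

n_states, n_actions = 16, 4          # illustrative sizes
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    target = r + gamma * Q[s_next].max()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
```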
DQN
MOTIVATION
11
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
Q(s, a) → Q(s, a; θ)
L_i(θ_i) = 𝔼_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))² ]
Target value
TD error
Predicted value
Approximate the Q-function with a neural network
DQN
PROBLEM
12
‣ Unstable updates
‣ High correlations between consecutive input samples
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
https://curt-park.github.io/2018-05-17/dqn/
‣ Non-stationary targets (the same network parameters produce both target and prediction)
[Objective function]
L_i(θ_i) = 𝔼_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))² ]
DQN
SOLUTION
13
‣ Experience replay
Matiisen, Tambet Demystifying Deep Reinforcement Learning. Computational Neuroscience LAB. 2015.
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature, 2015.
Episode → {s_1, a_1, r_1, s_2, …, a_{T−1}, r_{T−1}, s_T}
Experiences (s_t, a_t, r_t, s_{t+1}) are stored in a [Buffer] and sampled for training
DQN
SOLUTION
14
‣ Experience replay
Matiisen, Tambet Demystifying Deep Reinforcement Learning. Computational Neuroscience LAB. 2015.
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature, 2015.
Episode → {s_1, a_1, r_1, s_2, …, a_{T−1}, r_{T−1}, s_T}
Experiences (s_t, a_t, r_t, s_{t+1}) are stored in a [Buffer] and sampled for training
‣ Fixed Q-targets
L_i(θ_i) = 𝔼_{s,a,r,s′}[ (r + γ max_{a′} Q̂(s′, a′; θ⁻_i) − Q(s, a; θ_i))² ]   (with a separate target network θ⁻)
L_i(θ_i) = 𝔼_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))² ]   (original: same parameters for target and prediction)
[Objective function]
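Putting both fixes together, a rough PyTorch sketch of the objective with a replay buffer and a frozen target network θ⁻; names and hyperparameters are illustrative, not the talk's exact notebook code:

```python
import random
from collections import deque
import numpy as np
import torch
import torch.nn.functional as F

replay_buffer = deque(maxlen=100_000)   # stores (s, a, r, s_next, done) tuples
gamma = 0.99

def compute_td_loss(online_net, target_net, batch_size):
    batch = random.sample(replay_buffer, batch_size)       # uniform sampling from the buffer
    s, a, r, s_next, done = zip(*batch)
    s = torch.as_tensor(np.array(s), dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    s_next = torch.as_tensor(np.array(s_next), dtype=torch.float32)
    done = torch.as_tensor(done, dtype=torch.float32)

    q_value = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; θ)
    with torch.no_grad():                                          # fixed target uses θ⁻
        next_q = target_net(s_next).max(1)[0]
    expected = r + gamma * next_q * (1 - done)
    return F.mse_loss(q_value, expected)

# Periodically sync: target_net.load_state_dict(online_net.state_dict())
```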
DQN
DQN - PREPROCESSING
15
210 × 160 (raw Atari frame)
https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
[ ……… ]
DQN
DQN - PREPROCESSING
16
https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
Reduce the input size by downscaling the frame and converting it to grayscale
84 × 84 (downscaled grayscale frame)
[ ……… ]
DQN
DQN - PREPROCESSING
17
https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
Parameter: 4 (frames per state)
Frames: x1, x2, x3, x4, x5, x6, x7, …
s1 = (x1, x2, x3, x4)
s2 = (x2, x3, x4, x5)
[Input]: (84 × 84 × 4)
Frame skipping
DQN
DQN - PREPROCESSING
18
https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
DeepMind took the component-wise maximum over two consecutive frames (Atari setting)
[ ……… ]
Frame skipping
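A rough sketch of this preprocessing pipeline, assuming OpenCV for the grayscale conversion and resizing; details of DeepMind's exact pipeline may differ:

```python
from collections import deque
import numpy as np
import cv2

def preprocess(frame, prev_frame):
    frame = np.maximum(frame, prev_frame)            # component-wise max over two consecutive frames
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # 210 x 160 x 3 -> 210 x 160
    small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return small.astype(np.uint8)                    # 84 x 84

frames = deque(maxlen=4)                             # s_t = last 4 processed frames

def get_state():
    return np.stack(frames, axis=0)                  # (4, 84, 84) network input
```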
DQN
IMPLEMENTATION
19
https://github.com/higgsfield/RL-Adventure/blob/master/1.dqn.ipynb
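The linked notebook builds a convolutional Q-network; a sketch in the spirit of the Nature DQN architecture with ε-greedy action selection (layer sizes follow Mnih et al., 2015; the `act` helper and its arguments are illustrative):

```python
import random
import torch
import torch.nn as nn

class CnnDQN(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )
        self.num_actions = num_actions

    def forward(self, x):                         # x: (B, 4, 84, 84), scaled to [0, 1]
        x = self.features(x)
        return self.fc(x.view(x.size(0), -1))     # Q(s, ·)

    def act(self, state, epsilon):
        if random.random() < epsilon:              # epsilon-greedy exploration
            return random.randrange(self.num_actions)
        with torch.no_grad():                      # state: (4, 84, 84) tensor
            q = self.forward(state.unsqueeze(0))
        return int(q.argmax(1).item())
```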
DOUBLE DQN
28
DOUBLE Q-LEARNING
MOTIVATION
‣ Problem with DQN:
van Hasselt H., Guez A. and Silver D. Deep Reinforcement Learning with Double Q-learning. AAAI, 2016
van Hasselt H. Double Q-learning. NIPS, 2010
29
Q(s, a) = r(s, a) + γ max_a Q(s′, a)
Q-target | immediate reward | maximum Q-value of the next state
Overestimating the action values. What if the environment is noisy?
DOUBLE Q-LEARNING
MOTIVATION
‣ Problem with DQN:
van Hasselt H., Guez A. and Silver D. Deep Reinforcement Learning with Double Q-learning. AAAI, 2016
van Hasselt H. Double Q-learning. NIPS, 2010
30
Q(s, a) = r(s, a) + γ max_a Q(s′, a)
Q-target | immediate reward | maximum Q-value of the next state
Overestimating the action values.
‣ Solution:
Q(s, a) = r(s, a) + γ Q(s′, argmax_a Q(s′, a))
The DQN (online) network chooses the action for the next state
What if the environment is noisy?
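A sketch of this target in PyTorch: the online network picks the action for the next state and, as commonly implemented, a separate target network evaluates it (function and argument names are assumptions):

```python
import torch

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        next_action = online_net(next_state).argmax(1, keepdim=True)        # argmax_a Q(s', a; θ)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)   # Q(s', a*; θ⁻)
    return reward + gamma * next_q * (1 - done)
```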
DOUBLE Q-LEARNING
IMPLEMENTATION
https://github.com/higgsfield/RL-Adventure/blob/master/2.double%20dqn.ipynb
31
DUELING DQN
34
DUELING DQN
MOTIVATION
35
Q(s, a) = V(s) + A(s, a)
[Q-value decomposition] State value V(s) + Advantage value A(s, a)
‣ Adds relative (advantage) information on top of the value of the current state
‣ The value difference (advantage) → faster learning
Only the value of the single chosen action is reflected; the other actions stay as they are
The advantage shows how much better an action is compared with the single chosen one (a relative measure)
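A sketch of a dueling head; the mean-subtracted combination Q = V + A − mean_a(A) is the identifiable form used in the paper, and the feature and hidden sizes here are illustrative:

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                                   nn.Linear(128, 1))                  # V(s)
        self.advantage = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                                       nn.Linear(128, num_actions))    # A(s, a)

    def forward(self, features):
        v = self.value(features)                     # (B, 1)
        a = self.advantage(features)                 # (B, num_actions)
        return v + a - a.mean(dim=1, keepdim=True)   # Q(s, a)
```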
DUELING DQN
IMPLEMENTATION
https://github.com/higgsfield/RL-Adventure/blob/master/3.dueling%20dqn.ipynb
36
EXPERIMENTS
EXPERIMENTAL COMPARISON
[DQN] [Double DQN] [Dueling DQN]
39
EXTRA
DQN LINEAGE
40
Niels Justesen, Philip Bontrager, Julian Togelius, Sebastian Risi. Deep Learning for Video Game Playing. 2017.
Thank you
41
RL Adventure
PER and NoisyNet
양홍선
1
PER
Prioritized Experience Replay
2
Replay Memory
3
s_t, a_t
r_t, s_{t+1}
Replay Buffer
frequency
Replay Memory
4
Replay Buffer
More task-relevant
frequency
s_t, a_t
r_t, s_{t+1}
Replay Memory
5
Replay Buffer
More task-relevant
frequency
s_t, a_t
r_t, s_{t+1}
6
Which experiences to store
Which experiences to replay
Design of Replay Memory
7
Which experiences to store
Which experiences to replay
Design of Replay Memory
A Motivating Example
8
Two actions: ‘right’ and ‘wrong’
The environment requires an exponential number of random steps until the first non-zero reward
The most relevant transitions are hidden in a mass of highly redundant failure cases
9
How?
10
Prioritizing with TD-Error
A transition’s TD error δ: how ‘surprising’ or unexpected the transition is
11
A transition with a low TD error on its first visit may not be replayed for a long time
Prioritizing by TD error is sensitive to noise spikes
Greedy prioritization focuses on a small subset of the experience
Weakness
12
Stochastic Sampling!
Stochastic Prioritization
Proportional prioritization
• p_i = |δ_i| + ε
• P(i) = p_i^α / Σ_k p_k^α
• p_i > 0: the priority of transition i
• α: determines how much prioritization is used
• Sum-tree (a sampling sketch follows below)
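A minimal sketch of proportional sampling without a sum-tree, matching the "without sum-tree" variant mentioned later in the slides; the `priorities` list and hyperparameter values are illustrative:

```python
import numpy as np

def sample_indices(priorities, batch_size, alpha=0.6):
    p = np.asarray(priorities) ** alpha
    probs = p / p.sum()                              # P(i) = p_i^alpha / sum_k p_k^alpha
    idx = np.random.choice(len(priorities), batch_size, p=probs)
    return idx, probs
```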
13
Stochastic Prioritization
Rank-based prioritization
• p_i = 1 / rank(i)
• rank(i) is the rank of transition i when the replay memory is sorted according to δ_i
• More robust
• Binary heap
14
Annealing the Bias
• Importance-sampling (IS) weights (sketched below)
• w_i = ( (1/N) · (1/P(i)) )^β
• Normalize: 1 / max_i w_i
• Δ ← Δ + w_i · δ_i · ∇_θ Q(S_{i−1}, A_{i−1})
15
17
Proportional prioritization (without sum-tree)
18
19
P(i) = p_i^α / Σ_k p_k^α
20
21
Replayed at least once (new transitions get the maximum priority)
22
23
P(i) = p_i^α / Σ_k p_k^α
24
IS weights
25
26
Update priorities with the TD error
NoisyNet
Noisy Networks for Exploration
27
28
Exploration
Exploitation
29
High exploration
Optimal
High exploitation
Exploration
Exploitation
30
Exploration
Efficient
31
Exploration methods
ε-greedy: act randomly with probability ε
Entropy regularization: a penalty added to the loss that keeps the policy from collapsing onto one action
− Σ_a π(s, a) log π(s, a)
32
ε-greedy, Entropy regularization
33
ε-greedy, Entropy regularization
Random perturbations
34
ε-greedy, Entropy regularization
Random perturbations
Hard to produce the large-scale behavioural patterns needed for efficient exploration
35
NoisyNet!!
36
NoisyNet: learned perturbations of the network weights are used to drive exploration
37
θ ≔ μ + Σ ⊙ ε
38
θ ≔ μ + Σ ⊙ ε
Learnable parameters
Noise variables
39
θ ≔ μ + Σ ⊙ ε
Learnable parameters
Noise variables
ζ ≔ (μ, Σ)
40
y = wx + b
y ≔ (μ_w + σ_w ⊙ ε_w) x + μ_b + σ_b ⊙ ε_b
• p inputs and q outputs
• Independent Gaussian noise
• An independent Gaussian noise entry per weight
• pq + q noise variables
• Factorised Gaussian noise
• An independent noise variable per input and per output (sketched below)
• p + q noise variables
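A sketch of factorised Gaussian noise generation using the f(x) = sgn(x)·√|x| transform from the paper; the function name is illustrative:

```python
import torch

def factorised_noise(p, q):
    f = lambda x: x.sign() * x.abs().sqrt()
    eps_in, eps_out = f(torch.randn(p)), f(torch.randn(q))
    eps_w = torch.outer(eps_out, eps_in)   # (q, p) per-weight noise built from only p + q samples
    eps_b = eps_out                        # (q,) bias noise
    return eps_w, eps_b
```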
41
NoisyNet
42
L̄(ζ) = 𝔼[ 𝔼_{(x,a,r,y)∼D}[ (r + γ max_{b∈A} Q(y, b, ε′; ζ⁻) − Q(x, a, ε; ζ))² ] ]
L(θ) = 𝔼_{(x,a,r,y)∼D}[ (r + γ max_{b∈A} Q(y, b; θ⁻) − Q(x, a; θ))² ]
Loss
43
Loss
L̄(ζ) = 𝔼[ 𝔼_{(x,a,r,y)∼D}[ (r + γ max_{b∈A} Q(y, b, ε′; ζ⁻) − Q(x, a, ε; ζ))² ] ]
L(θ) = 𝔼_{(x,a,r,y)∼D}[ (r + γ max_{b∈A} Q(y, b; θ⁻) − Q(x, a; θ))² ]
Initialisation of NoisyNet
• An unfactorised NoisyNet
• μ_{i,j} ~ U[−√(3/p), +√(3/p)]
• p: the number of inputs
• σ_{i,j} = 0.017
• Factorised NoisyNet
• μ_{i,j} ~ U[−1/√p, +1/√p]
• σ_{i,j} = σ₀/√p
• σ₀ = 0.5
44
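Putting the noise and the initialisation together, a sketch of a factorised NoisyLinear layer under the assumptions above; this is not the talk's exact notebook code:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        p, q = in_features, out_features
        bound = 1 / math.sqrt(p)                                     # mu ~ U[-1/sqrt(p), +1/sqrt(p)]
        self.mu_w = nn.Parameter(torch.empty(q, p).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(torch.full((q, p), sigma0 / math.sqrt(p)))
        self.mu_b = nn.Parameter(torch.empty(q).uniform_(-bound, bound))
        self.sigma_b = nn.Parameter(torch.full((q,), sigma0 / math.sqrt(p)))
        self.in_features, self.out_features = p, q

    def forward(self, x):
        f = lambda e: e.sign() * e.abs().sqrt()                      # factorised noise transform
        eps_in = f(torch.randn(self.in_features, device=x.device))
        eps_out = f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)   # w = mu + sigma ⊙ eps
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)
```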
45
The learning curves of the average noise parameter Σ̄
46
The learning curves of the average noise parameter Σ̄
47
Factorised NoisyNet
48
49
Learnable parameters
50
Factorised NoisyNet
μ_{i,j} ~ U[−1/√p, +1/√p]
51
Factorised
52
53
θ ≔ μ + Σ ⊙ ε
54
Code: https://github.com/higgsfield/RL-Adventure
PER: https://arxiv.org/abs/1511.05952
NoisyNet: https://arxiv.org/abs/1706.10295
Thank you
Q&A
55
1
RL Adventure
Distributional RL
이의령
C51
Distributional RL
Contents
1. Motivation
2. Distributional RL (C51)
3. C51 results
4. Code walkthrough
3
1. Motivation
4
5
Motivation
+ $ 200
- $ 1,800
E[R(x)] = (35/36) × 200 − (1/36) × 1,800 ≈ 144
6
Motivation
+ $ 200
- $ 1,800
R_{t+1} + γR_{t+2} + ⋯ + γ^{T−t−1} R_T
Sum of (discounted) rewards
7
Expected RL
+ $ 200
- $ 1,800
Bellman equation
v(x) = 𝔼[ R_{t+1} + γR_{t+2} + ⋯ | S_t = x ]
     = 𝔼[ R_{t+1} + γ v(X′) | S_t = x ]
     = 𝔼[ R(x) ] + γ 𝔼[ v(X′) ]
Viewing the reward from a random-variable perspective…
§ The value function returns the expectation of the discounted future rewards.
§ An expectation is a scalar (o), not a distribution (x).
§ Future reward values are complex and multimodal.
§ The expectation fails to capture the intrinsic characteristics of the individual rewards.
8
Expected RL
E[R(x)] = (35/36) × 200 − (1/36) × 1,800 ≈ 144
Viewing the reward from a random-variable perspective…
9
Expected RL
A way to address these limitations of expected RL
-> A Distributional Perspective on RL (C51)
Model the return as a distribution
so that its randomness and the information it carries are preserved as much as possible
V^π = 𝔼[ Z^π(x) ] = 𝔼[ R(x) ] + γ 𝔼[ Z^π(X′) ]
Z^π(x) = R(x) + γ Z^π(X′)
2. Distributional RL
13
§ Expected RL → Distributional RL
§ Build a value distribution over the return.
§ C51 = categorical / discrete distribution
§ The distribution is built from 51 bins (atoms).
14
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
https://arxiv.org/abs/1707.06887
§ Distributional Bellman Equation
§ Cf) Bellman Equation
§ Z(s, a) denotes a distribution; it is used to construct the value distribution
15
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
Q(x, a) = R(x, a) + γ Q^π(x′, a′)
Q(s, a) = 𝔼[ Z(s, a) ] = Σ_i p_i z_i
16
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
17
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
18
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
19
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
20
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
C51 = DQN + Projection Distribution
(building the distribution)
Distributional DQN
1. Build a value distribution over the return (51 bins).
2. Measure the distance between the value distributions produced at each step.
→ The paper defines this theoretically with the Wasserstein distance,
but in the experiments it is computed with the KL divergence.
3. Compute the loss between the distributions with cross entropy (a projection sketch follows below).
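A sketch of the categorical projection step that this loss is computed against; shapes, names, and hyperparameters are assumptions, and edge cases (e.g. atoms landing exactly on a bin) are ignored for brevity:

```python
import torch

def projection_distribution(next_dist, rewards, dones, num_atoms=51,
                            v_min=-10.0, v_max=10.0, gamma=0.99):
    # next_dist: (B, num_atoms) probabilities of Z(s', a*) for the greedy next action
    batch_size = next_dist.size(0)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = torch.linspace(v_min, v_max, num_atoms)

    # Distributional Bellman operator applied to every atom, clipped to [V_min, V_max]
    Tz = rewards.unsqueeze(1) + gamma * (1 - dones).unsqueeze(1) * support.unsqueeze(0)
    Tz = Tz.clamp(v_min, v_max)
    b = (Tz - v_min) / delta_z                      # fractional bin index of each projected atom
    l, u = b.floor().long(), b.ceil().long()

    # Split each atom's probability mass between its two neighbouring bins
    proj = torch.zeros(batch_size, num_atoms)
    offset = (torch.arange(batch_size) * num_atoms).unsqueeze(1)
    proj.view(-1).index_add_(0, (l + offset).view(-1), (next_dist * (u.float() - b)).view(-1))
    proj.view(-1).index_add_(0, (u + offset).view(-1), (next_dist * (b - l.float())).view(-1))
    return proj   # target distribution for the cross-entropy loss
```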
21
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
22
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
23
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
Sample a batch of transitions from the replay buffer
24
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
Projection Distribution
(building the distribution)
25
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
Distributional Bellman operator
V_max = 10
V_min = −10
26
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
27
Distributional RL
A Distributional Perspective on Reinforcement Learning (C51)
Compute the loss with the KL divergence (cross entropy)
28
Performance
A Distributional Perspective on Reinforcement Learning (C51)
Relative performance comparison
4. Code walkthrough
29
Thank you.
» urleee@naver.com
30
RL Adventure
RAINBOW
김예찬
1
INDEX
1. Environment
2. Before RAINBOW
DDQN(Double Deep Q-Learning)
Dueling DQN
Multi-Step TD(Temporal Difference)
PER(Prioritized Experience Replay)
Noisy Network
Categorical DQN(C51)
3. RAINBOW
4. RAINBOW - Code
2
OPENAI GYM
https://gym.openai.com
https://github.com/openai/gym
1. EXPERIMENT ENVIRONMENT
3
2. BEFORE RAINBOW : DOUBLE DQN
4
https://arxiv.org/abs/1509.06461
2. BEFORE RAINBOW : DUELING DQN
https://arxiv.org/abs/1511.06581
5
2. BEFORE RAINBOW : DUELING DQN
6
https://arxiv.org/abs/1511.06581
2. BEFORE RAINBOW : MULTI-STEP LEARNING
7
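Multi-step learning replaces the one-step TD target with an n-step return; a minimal sketch (the bootstrap term is noted in the comment, and the names are illustrative):

```python
def n_step_return(rewards, gamma=0.99):
    # rewards: the next n rewards r_t, ..., r_{t+n-1}
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r           # sum_{k=0}^{n-1} gamma^k * r_{t+k}
    return g
# Full target: g + gamma**n * max_a Q(s_{t+n}, a), if the episode has not ended
```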
2. BEFORE RAINBOW : PER
https://arxiv.org/abs/1511.05952
8
2. BEFORE RAINBOW : NOISY NETWORK
https://arxiv.org/abs/1706.10295
9
2. BEFORE RAINBOW : NOISY NETWORK
https://arxiv.org/abs/1706.10295
10
2. BEFORE RAINBOW : CATEGORICAL DQN(C51)
https://arxiv.org/pdf/1707.06887.pdf
11
2. BEFORE RAINBOW : CATEGORICAL DQN(C51)
https://arxiv.org/pdf/1707.06887.pdf
12
RAINBOW
3. RAINBOW
13
3. RAINBOW
RAINBOW
DDQN(Double Deep Q-Learning)
+
Dueling DQN
+
Multi-Step TD(Temporal Difference)
+
PER(Prioritized Experience Replay)
+
Noisy Network
+
Categorical DQN(C51)
14
3. RAINBOW
15
3. RAINBOW
HYPERPARAMETERS
16
3. RAINBOW
17
3. RAINBOW
18
PONG
4. RAINBOW - CODE
19
NOISYLINEAR
4. RAINBOW - CODE
20
DUELING + NOISY + C51
4. RAINBOW - CODE
21
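A sketch of how the dueling, noisy, and categorical (C51) pieces can be combined in one head; `NoisyLinear` refers to the layer sketched in the NoisyNet section, and the sizes are illustrative rather than the talk's exact notebook code:

```python
import torch.nn as nn
import torch.nn.functional as F

class RainbowHead(nn.Module):
    def __init__(self, feature_dim, num_actions, num_atoms=51):
        super().__init__()
        self.num_actions, self.num_atoms = num_actions, num_atoms
        # Value and advantage streams built from noisy layers (NoisyLinear assumed defined earlier)
        self.value = nn.Sequential(NoisyLinear(feature_dim, 512), nn.ReLU(),
                                   NoisyLinear(512, num_atoms))
        self.advantage = nn.Sequential(NoisyLinear(feature_dim, 512), nn.ReLU(),
                                       NoisyLinear(512, num_actions * num_atoms))

    def forward(self, features):
        v = self.value(features).view(-1, 1, self.num_atoms)
        a = self.advantage(features).view(-1, self.num_actions, self.num_atoms)
        logits = v + a - a.mean(dim=1, keepdim=True)   # dueling combination per atom
        dist = F.softmax(logits, dim=2)                # p(s, a) over the num_atoms support points
        return dist                                    # Q(s, a) = (dist * support).sum(2)
```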
PROJECTION STEP
4. RAINBOW - CODE
22
CROSS-ENTROPY LOSS
4. RAINBOW - CODE
23
TEST
4. RAINBOW - CODE
24
Thank you
RAINBOW
김예찬
25
