지금까지 제가 구현한논문은
• Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
• Deep Visual Analogy-Making Computer Vision
24.
지금까지 제가 구현한논문은
• Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
• Deep Visual Analogy-Making
• End-To-End Memory Networks
• Neural Variational Inference for Text Processing
• Character-Aware Neural Language Models
• Teaching Machines to Read and Comprehend
Computer Vision
Natural Language Processing
25.
지금까지 제가 구현한논문은
• Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
• Deep Visual Analogy-Making
• End-To-End Memory Networks
• Neural Variational Inference for Text Processing
• Character-Aware Neural Language Models
• Teaching Machines to Read and Comprehend
• Playing Atari with Deep Reinforcement Learning
• Human-level control through deep reinforcement learning
• Deep Reinforcement Learning with Double Q-learning
• Dueling Network Architectures for Deep Reinforcement Learning
• Language Understanding for Text-based Games Using Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning
Computer Vision
Natural Language Processing
Reinforcement Learning
26.
지금까지 제가 구현한논문은
• Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
• Deep Visual Analogy-Making
• End-To-End Memory Networks
• Neural Variational Inference for Text Processing
• Character-Aware Neural Language Models
• Teaching Machines to Read and Comprehend
• Playing Atari with Deep Reinforcement Learning
• Human-level control through deep reinforcement learning
• Deep Reinforcement Learning with Double Q-learning
• Dueling Network Architectures for Deep Reinforcement Learning
• Language Understanding for Text-based Games Using Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning
• Neural Turing Machines
Computer Vision
Algorithm Learning
Natural Language Processing
Reinforcement Learning
27.
재밌었던 논문은
• UnsupervisedRepresentation Learning with Deep Convolutional Generative Adversarial Networks
• Deep Visual Analogy-Making
• End-To-End Memory Networks # 간단하면서도 쉽고 결과도 재밌음
• Neural Variational Inference for Text Processing
• Character-Aware Neural Language Models
• Teaching Machines to Read and Comprehend
• Playing Atari with Deep Reinforcement Learning # RL 입문으로 괜찮
• Human-level control through deep reinforcement learning
• Deep Reinforcement Learning with Double Q-learning
• Dueling Network Architectures for Deep Reinforcement Learning
• Language Understanding for Text-based Games Using Deep Reinforcement Learning
• Asynchronous Methods for Deep Reinforcement Learning # 구현이 challenging함
• Neural Turing Machines # 모델이 흥미로움
# 모델이 흥미로움
Recurrent attention model
withexternal memory
필요한 사전 지식
Recurrent Neural Network
Soft and hard attention in Neural Network
Internal and external memory access in Neural Network
37.
Input: Context 문장들과질문
𝑥: = Mary journeyed to the den.
𝑥? = Mary went back to the kitchen.
𝑥A = John journeyed to the bedroom.
𝑥C = Mary discarded the milk.
Q: Where was the milk before the den?
𝑥: 𝑥? 𝑥A 𝑥C
문장 하나: 단어의조합
𝑥H = 𝑥H:, 𝑥H?, …, 𝑥HK
𝑥H = Mary journeyed to the den.
𝑥H: = mary
𝑥H? = journeyed
𝑥HA = to
𝑥HC = the
𝑥HL = den
40.
단어 하나: Bag-of-Words(BoW)로 표현
1
0
0
0
.
.
.
0
mary
𝑥H = Mary journeyed to the den.
Bag-of-Words (BoW)
𝑥H:
=
𝑥H: = mary
𝑥H = 𝑥H:, 𝑥H?, …, 𝑥HK
41.
문장 하나: BoW의set
1
0
0
0
.
.
.
0
mary
0
1
0
0
.
.
.
0
journeyed
0
0
1
0
.
.
.
0
to
0
0
0
1
.
.
.
0
the
𝑥H = Mary journeyed to the den.
𝑥H = 𝑥H:, 𝑥H?, …, 𝑥HK
𝑥H: = mary
𝑥H? = journeyed
𝑥HA = to
𝑥HC = the
𝑥HL = den
42.
Input: BoW로 표현된Context 문장들과 질문
𝑥: = Mary journeyed to the den.
𝑥? = Mary went back to the kitchen.
𝑥A = John journeyed to the bedroom.
𝑥C = Mary discarded the milk.
𝑥: 𝑥A 𝑥T 𝑥? 𝑥C 𝑥U 𝑥V
Q: Where was the milk before the den?
1
0
0
0
.
.
.
0
0
1
0
0
.
.
.
0
0
0
1
0
.
.
.
0
0
0
0
1
.
.
.
0
43.
Memory: 문장 하나로부터만들어진다
𝑚H = X 𝑨𝑥HZ
Z
1
0
0
0
.
.
.
0
0
1
0
0
.
.
.
0
0
0
1
0
.
.
.
0
0
0
0
1
.
.
.
0
𝑚: = 𝑨 +𝑨 +𝑨 +𝑨
𝑥:: 𝑥:? 𝑥:A 𝑥:C
mary journeyed to the
𝑨: embedding matrix
# optimizer를 정의한다
opt= tf.GradientDescentOptimizer(learning_rate=0.1)
# 주어진 variable의 loss에 대한 gradients를 계산한다.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars은 tuple(gradient, variable)의 리스트다.
# 이제 gradient를 마음대로 바꾸고 optimizer를 적용(apply_gradient)하면 된다.
capped_grads_and_vars =
[(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
opt.apply_gradients(capped_grads_and_vars)
Tip: 에서
custom gradient를 정의하는 방법?
69.
# optimizer를 정의한다
opt= tf.GradientDescentOptimizer(learning_rate=0.1)
# 주어진 variable의 loss에 대한 gradients를 계산한다.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars은 tuple(gradient, variable)의 리스트다.
# 이제 gradient를 마음대로 바꾸고 optimizer를 적용(apply_gradient)하면 된다.
capped_grads_and_vars =
[(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
opt.apply_gradients(capped_grads_and_vars)
Tip: 에서
custom gradient를 정의하는 방법?
70.
# optimizer를 정의한다
opt= tf.GradientDescentOptimizer(learning_rate=0.1)
# 주어진 variable의 loss에 대한 gradients를 계산한다.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars은 tuple(gradient, variable)의 리스트다.
# 이제 gradient를 마음대로 바꾸고 optimizer를 적용(apply_gradient)하면 된다.
capped_grads_and_vars =
[(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
optim = opt.apply_gradients(capped_grads_and_vars)
Tip: 에서
custom gradient를 정의하는 방법?
71.
params = [A,B, C, W]
grads_and_vars = opt.compute_gradients(loss, params)
clipped_grads_and_vars = [(tf.clip_by_norm(gv[0], 50), gv[1])
for gv in grads_and_vars]
optim = opt.apply_gradients(clipped_grads_and_vars)
Optimization:
𝑔𝑟𝑎𝑖𝑑𝑒𝑛𝑡 𝑐𝑙𝑖𝑝𝑝𝑖𝑛𝑔을 한다
72.
Train: 학습 시키면끝
for idx in xrange(N):
sess.run(optim, feed_dict={input: x,
target: target,
context: context})
73.
Resources
• Paper “End-To-EndMemory Networks”
• 저자의 코드 : https://github.com/facebook/MemNN
• Question Answering에 더 관심이 있다면
• Paper “Ask Me Anything: Dynamic Memory Networks for Natural Language Processing”
• Paper “Teaching machines to read and comprehend”
Dynamic Memory Networks Attentive Impatient Reader
74.
QA dataset: ThebAbI tasks
> AI가 문장들에서 정보를 추론할 수 있을까?
https://research.facebook.com/research/babi
75.
QA dataset: TheChildren’s Book Test
> AI가 “이상한 나라의 앨리스”를 이해할 수 있을까?
https://research.facebook.com/research/babi/#cbt
Asynchronous Methods for
DeepReinforcement Learning (A3C)
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Kavukcuoglu, K. (ICML 2016)
https://github.com/carpedm20/deep-rl-tensorflow
정의: State, Value
DemystifyingDeep Reinforcement Learning
State: game의 상태 화면, 공의 위치, 상대방의 위치 …
Value: state s|에서 행동 a|의 가치 보상 (return)
a|를 하고 난 후의 점수 r|}:뿐만 아니라
s|, 𝑠g}:, 𝑠g}?, … 에서 받게되는 미래의 모든 점수, 즉 r|}: + r|}? + r|}A + ⋯의 기대값
90.
정의: State, Value
DemystifyingDeep Reinforcement Learning
State: game의 상태 화면, 공의 위치, 상대방의 위치 …
Value: state s|에서 행동 a|의 가치 보상 (return)
Value가 중요한 이유는 Value를 통해 어떠한 행동을 할지 결정할 수 있기 때문
epsilon greedy policy
91.
정의: State, Value
DemystifyingDeep Reinforcement Learning
State: game의 상태
가장 쉬운 방법은 표로 기록하고 값을 업데이트 하는것!
화면, 공의 위치, 상대방의 위치 …
State 𝑸(𝒔 𝒕,위) 𝑸(𝒔 𝒕,아래)
𝑠: 1.2 -1
𝑠? -1 0
𝑠A 2.3 1.2
𝑠C 0 0
. . .
. . .
. . .
𝑠‚ 0.2 -0.1
Value: state s|에서 행동 a|의 가치
Dynamic programming, Value iteration …
보상 (return)
92.
정의: State, Value
DemystifyingDeep Reinforcement Learning
State: game의 상태
Value: state s|에서 행동 a|의 가치
하지만 state가 너무 많을 경우 각각을 하나하나 계산하는게 불가능
화면, 공의 위치, 상대방의 위치 …
보상 (return)
State 𝑸(𝒔 𝒕,위) 𝑸(𝒔 𝒕,아래)
𝑠: 1.2 -1
𝑠? -1 0
𝑠A 2.3 1.2
𝑠C 0 0
. . .
. . .
. . .
𝑠‚ 0.2 -0.1
93.
Model: Deep Q(quality) -Network
Demystifying Deep Reinforcement Learning
딥러닝 모델로 한번 배워보자!
Q(s|)를 함수로 approximate 하는것 :
Value function approximation
𝑸(𝒔 𝒕, 위) 𝑸(𝒔 𝒕, 아래) 𝑸(𝒔 𝒕,…)
94.
Input: 4개의 연속된프레임
Human-level control through deep reinforcement learning
공이 위로 움직이는지 아래로 움직이는지
context 정보를 주기 위함
𝑠g
𝑸(𝒔 𝒕, 𝒂 𝟏 ) 𝑸(𝒔 𝒕, 𝒂 𝟐 ) 𝑸(𝒔 𝒕, 𝒂 𝟐 )
95.
Output: 행동의 가치(q-value)
Human-levelcontrol through deep reinforcement learning
𝑸 𝒔𝒕 : action의 갯수를 크기로 하는 vector
𝑸(𝒔 𝒕, 𝒂 𝟏 ) 𝑸(𝒔 𝒕, 𝒂 𝟐 ) 𝑸(𝒔 𝒕, 𝒂 𝟐 )
즉, loss만큼 𝑄𝑠g 를 업데이트 한다
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:) 𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
𝑠g
𝑠g}:
𝑎 𝑟g}:
예상하는 가치 : 𝑟g}: + 𝛾𝑟g}? + 𝛾?
𝑟g}A + ⋯ 𝑟g}? + 𝛾𝑟g}A + 𝛾?
𝑟g}C + ⋯
𝑙𝑜𝑠𝑠 = {𝑟g}: + 𝛾𝑄 𝑠g}: } − 𝑄 𝑠g
107.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
𝑟g}:
1-step
𝑠g 𝑠g}:
업데이트
108.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
𝑟g}:
1-step
𝑟g}:
2-step
𝑟g}?
𝑠g 𝑠g}:
𝑠g}?
업데이트
109.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
𝑟g}:
1-step
𝑟g}:
2-step
𝑟g}:
3-step
𝑟g}?
𝑟g}? 𝑟g}A
𝑠g 𝑠g}:
𝑠g}?
𝑠g}A
업데이트
110.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
1-step
𝑟g}:
2-step
𝑟g}:
3-step
𝑟g}?
𝑟g}? 𝑟g}A
𝑠g}?
𝑠g}A
{𝑟g}: + 𝛾𝑄 𝑠g}: } − 𝑄 𝑠g
업데이트
111.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
1-step
2-step
𝑟g}:
3-step
𝑟g}? 𝑟g}A
𝑠g}A
{𝑟g}: + 𝛾𝑄 𝑠g}: } − 𝑄 𝑠g
{𝑟g}: + 𝛾𝑟g}? + 𝛾?
𝑄 𝑠g}? } − 𝑄 𝑠g
업데이트
112.
n-step Q-learning
𝑄(𝑠g)
𝑄(𝑠g, 𝑎:)𝑄(𝑠g, 𝑎?) 𝑄(𝑠g, 𝑎A)
See David Silver’s course & Sutton’s book for more information
1-step
2-step
3-step
{𝑟g}: + 𝛾𝑄 𝑠g}: } − 𝑄 𝑠g
{𝑟g}: + 𝛾𝑟g}? + 𝛾?
𝑄 𝑠g}? } − 𝑄 𝑠g
{𝑟g}: + 𝛾𝑟g}? + 𝛾?
𝑟g}A + 𝛾A
𝑄 𝑠g}A } − 𝑄 𝑠g
Higher variance, lower bias
업데이트
예제: 3-step actor-critic
𝑥gU 𝑥g ?
Context
CNN
𝑥g L 𝑥g :
Context
CNN
𝑥g C 𝑥g
Context
CNN
𝑠g ?
𝑥g C 𝑥g}:
Context
CNN
share parameters
130.
𝑠g ?를 보고𝜋를 계산해 𝑎를 구함
𝑥g U 𝑥g ?
Context
CNN
𝜋 𝑉
𝑥g L 𝑥g :
Context
CNN
𝑥g C 𝑥g
Context
CNN
𝑠g ?
𝑥g C 𝑥g}:
Context
CNN
𝑎
𝜋 : actor V : critic
행동의 확률 state의 가치
131.
agent가 𝑎를 수행
𝑥gU 𝑥g ?
Context
CNN
𝜋 𝑉
𝑥g L 𝑥g :
Context
CNN
𝑥g C 𝑥g
Context
CNN
𝑠g :𝑠g ?
𝑥g C 𝑥g}:
Context
CNN
𝑎
𝑟g :
132.
𝑠g :를 보고𝜋를 계산해 𝑎를 구함
𝑥g U 𝑥g ?
Context
CNN
𝜋 𝑉
𝑥g L 𝑥g :
Context
CNN
𝜋 𝑉
𝑥g C 𝑥g
Context
CNN
𝑠g𝑠g :𝑠g ?
𝑥g C 𝑥g}:
Context
CNN
𝑎𝑎
𝑟g𝑟g :
133.
𝑠g를 보고 𝜋를계산해 𝑎를 구함
𝑥g U 𝑥g ?
Context
CNN
𝜋 𝑉
𝑥g L 𝑥g :
Context
CNN
𝜋 𝑉
𝑥g C 𝑥g
Context
CNN
𝜋 𝑉
𝑠g}:𝑠g𝑠g :𝑠g ?
𝑥g C 𝑥g}:
Context
CNN
𝜋 𝑉
𝑎𝑎𝑎
𝑟g}:𝑟g𝑟g :
from threading importThread
global_network = make_network(name='A3C_global’)
Global network를 만들고
170.
for worker_id inrange(n_worker):
with tf.variable_scope('thread_%d' % worker_id) as scope:
for step in range(n_step):
network = Network(global_network, name='A3C_%d' % (worker_id))
networks.append(network)
tf.get_variable_scope().reuse_variables()
A3C_FFs[worker_id] = A3C_FF(worker_id, networks, global_network)
Local network을 만들고
…
171.
for worker_id inrange(n_worker):
A3C_FFs[worker_id].networks[0].copy_from_global()
# weight sharing하기 때문에 0번째 step에 해당하는 network만 하면 됨
Global network의 𝜃‘, 𝜃™를
Local network로 복사한다
share parameters
copy
…
172.
class Network(object):
def __init__(self,sess):
with tf.variable_scope('copy_from_target'):
copy_ops = []
for name in self.w.keys(): # w: network의 variable를 가진 dict
copy_op = self.w[name].assign(global_network.w[name])
copy_ops.append(copy_op)
self.global_copy_op = tf.group(*copy_ops, name='global_copy_op')
𝜃™를 𝜃`™로 assign() 하는 op을 만들고
copy
…
for grad, varin tuple(grads_and_vars):
self.accum_grads[name] = tf.Variable(tf.zeros(shape), trainable=False)
new_grads_and_vars.append((tf.clip_by_norm(
self.accum_grads[name].ref(), self.max_grad_norm), global_v))
𝜕l𝑜𝑠𝑠/𝜕𝜃`™를 누적해서 저장할
𝑉𝑎𝑟𝑖𝑎𝑏𝑙𝑒을 만들고, assign_add()를 추가
176.
targets = [network.sampled_actionfor network in self.networks]
targets.extend([network.log_policy_of_sampled_action for network in self.networks])
targets.extend([network.value for network in self.networks])
targets.extend(self.add_accum_grads.values())
targets.append(self.fake_apply_gradient)
inputs = [network.s_t for network in self.networks]
inputs.extend([network.R for network in self.networks])
inputs.extend([network.true_log_policy for network in self.networks])
inputs.append(self.learning_rate_op)
self.partial_graph = self.sess.partial_run_setup(targets, inputs)
sess.partial_run_setup로
partial_graph를 정의하고
def worker_func(worker_id):
model =A3C_FFs[worker_id]
if worker_id == 0:
model.train_with_log(global_t,
saver, writer, checkpoint_dir, assign_global_t_op)
else:
model.train(global_t)
쓰레드로 실행된 함수를 정의하고
180.
workers = []
foridx in range(config.n_worker):
workers.append(Thread(target=worker_func, args=(idx,)))
# Execute and wait for the end of the training
for worker in workers:
worker.start()
for worker in workers:
worker.join()
쓰레드 start() join() 하면 끝.
181.
Resources
• Paper “AsynchronousMethods for Deep Reinforcement Learning”
• Chainer 구현: https://github.com/muupan/async-rl
• 읽기 쉬운 Deep Q-network 소개글
• http://sanghyukchun.github.io/90/
• Demystifying Deep Reinforcement Learning
• Reinforcement Learning를 진지하게 공부하고 싶다면
• David Silver’s course
• John Schulman’s CS294 course
• Sutton’s “Reinforcement Learning: An Introduction”
182.
Resources
• 가 제안한문제들 : Requests for Research
• Improved Q-learning with continuous actions
• Multitask RL with continuous actions.
• Parallel TRPO
• 주목할만한 keywords
• Continuous actions: [1], [2]
• Multiagents: [3]
• Hierarchical reinforcement learning: [4]
• Better exploration: [5]
183.
아쉽지만..
OpenAI’s blog post.Generative Models
Deep Convolutional Generative Adversarial Networks
https://github.com/carpedm20/DCGAN-tensorflow