1. DIFFERENTIABLE NEURAL COMPUTER
Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external
memory." Nature 538.7626 (2016): 471-476.
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
문지형
(jhmoon@dm.snu.ac.kr)
LAB SEMINAR
SNU DataMining Center
2. PROBLEMS OF RNNs
➤ Exploding & vanishing gradients
For the linear recurrence $h_t = W^\top h_{t-1}$, unrolling gives $h_t = (W^t)^\top h_0$; with the eigendecomposition $W = Q \Lambda Q^\top$ this becomes $h_t = Q^\top \Lambda^t Q\, h_0$, so the signal is scaled by powers of the eigenvalues:
if the largest eigenvalue is > 1, the gradient will explode
if the largest eigenvalue is < 1, the gradient will vanish
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016. p.404-405
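A minimal numpy sketch of this effect (the diagonal 2×2 matrices, step count, and initial state are illustrative assumptions, not from the slides): repeated multiplication by a matrix whose largest eigenvalue is above or below 1 makes the propagated signal explode or vanish.

```python
import numpy as np

# Illustrative 2x2 recurrent weights: eigenvalues 1.1 vs. 0.9.
W_explode = np.array([[1.1, 0.0], [0.0, 1.1]])
W_vanish  = np.array([[0.9, 0.0], [0.0, 0.9]])

h0 = np.array([1.0, 1.0])
for name, W in [("explode", W_explode), ("vanish", W_vanish)]:
    h = h0
    for _ in range(50):        # unroll 50 steps of h_t = W^T h_{t-1}
        h = W.T @ h
    print(name, np.linalg.norm(h))
# explode -> ~1.1**50 (huge), vanish -> ~0.9**50 (tiny) times |h0|
```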
3. LSTM & GRU & ATTENTION
➤ Gates manage the flow of information effectively
➤ Attention pulls in information from whichever time steps are needed
Hochreiter, Sepp, and Jürgen Schmidhuber.
"Long short-term memory." Neural
computation 9.8 (1997): 1735-1780.
Chung, Junyoung, et al. "Gated feedback
recurrent neural networks." International
Conference on Machine Learning. 2015.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural
machine translation by jointly learning to align and translate." arXiv
preprint arXiv:1409.0473 (2014).
4. STILL…
➤ LSTM cannot reliably solve even simple algorithmic tasks such as copying a sequence
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
5. WHY THIS HAPPENS
➤ Artificial neural networks are remarkably adept at sensory processing, sequence
learning and reinforcement learning
➤ But are limited in their ability to represent variables and data structures and to store
data over long timescales
➤ LSTM entangles the two: the same hidden state must both store information and carry out computation
6. BASIC IDEA
➤ Modern computers separate computation and memory
➤ Can we model this separation with neural networks?
http://people.idsia.ch/~rupesh/rnnsymposium2016/slides/graves.pdf
7. TURING MACHINE
➤ An abstract computing machine devised by Turing in 1936. A Turing machine carries out computation and logical operations step by step; it showed that any computation is possible given suitable storage and an algorithm, and thus provided the prototype of the modern computer. ‒ Turing machine (Dictionary of Experimental Psychology Terms, 2008, Sigma Press)
8. VON NEUMANN ARCHITECTURE
➤ The Turing Machine is a theoretical concept: using a machine to solve mathematical problems the way a person would
➤ The Von Neumann Architecture is a structure for implementing actual computers based on the Turing Machine concept
https://en.wikipedia.org/wiki/Von_Neumann_architecture
10. NEURAL TURING MACHINE
➤ Tries to mimic the Von Neumann Architecture with neural networks
➤ Because it is built entirely from neural networks, it can be trained with gradient descent
➤ Controller (LSTM or Feed-forward Network)
A. interacts with the external world via input and output vectors
B. also interacts with a memory matrix using selective read and write operations (heads), as sketched after this slide
➤ Memory (Matrix)
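A structural sketch of one controller-memory step, with toy sizes and a single read head; the interface split and the purely additive write are simplifications of the papers' erase-then-add scheme, shown only to make the data flow concrete.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ntm_step(x_t, r_prev, M, W_ctrl):
    """One simplified controller-memory step (a structural sketch,
    not the papers' full addressing math)."""
    # Controller sees the input concatenated with last step's read vector.
    ctrl_in = np.concatenate([x_t, r_prev])
    h = np.tanh(W_ctrl @ ctrl_in)              # controller activations

    # Toy interface: split activations into a read key and a write vector.
    key, write_vec = h[:M.shape[1]], h[M.shape[1]:2 * M.shape[1]]
    w = softmax(M @ key)                       # soft weighting over N rows

    r_t = M.T @ w                              # read: weighted sum of rows
    M = M + np.outer(w, write_vec)             # simplified additive write
    return r_t, M

# Toy sizes: 4-dim input, 8x4 memory, controller output 8 (key + write vec).
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))
W_ctrl = rng.normal(size=(8, 8))               # input 4 + read 4 -> 8
r, M = ntm_step(rng.normal(size=4), np.zeros(4), M, W_ctrl)
```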
11. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ How to read and write to memory
19. DIFFERENTIABLE NEURAL COMPUTER (DNC)
➤ DNC extends the NTM by addressing the following limitations:
1. Ensuring that blocks of allocated memory do not overlap and interfere
2. Freeing memory that has already been written to
3. Handling non-contiguous memory through temporal links
20. DIFFERENTIABLE NEURAL COMPUTER (DNC)
➤ More like a computer
➤ When a computer evaluates a+b, the + runs on the CPU while a and b are managed in memory
➤ a and b are memory addresses, so whatever values they hold, the machine can produce an answer as long as it can perform +
➤ https://youtu.be/B9U8sI7TcMY
<Example input: triples (1 9 2), (3 4 7), (9 9 8), each read as (source, edge, destination)>
21. CONTROLLER NETWORK
➤ LSTM or Feed-forward Network (same as in the NTM); the controller weights are the learnable parameters
➤ LSTM: recurrent, so its hidden state carries information across time steps
➤ Feed-forward: a pure function of the current input X_t, with no internal state (see the sketch below)
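A minimal sketch of the distinction, with made-up weights and shapes (and a plain tanh recurrence standing in for the LSTM): the feed-forward controller's output depends only on the current X_t, while the recurrent controller also threads a state through time.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)) * 0.1   # made-up controller weights
                                     # (these are the learnable parameters)

def feedforward_controller(x_t):
    # Output depends on the current input X_t only: no state is carried.
    return np.tanh(W @ x_t)

def recurrent_controller(x_t, h_prev):
    # LSTM-style (here a plain RNN for brevity): state h threads through
    # time, so earlier inputs can influence the current output.
    return np.tanh(W @ x_t + h_prev)

x = rng.normal(size=8)
print(feedforward_controller(x)[:3])
print(recurrent_controller(x, np.zeros(16))[:3])
```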
23. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ Better than (= more complicated than) the NTM
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
➤ How to read and write to memory
➤ Same as the NTM
28. INTERFACE PARAMETERS
➤ Contains the information that parameterizes the memory interactions:
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
29. MEMORY ADDRESSING
➤ 3 forms of attention
1. content-based addressing (sketched below)
2. dynamic memory allocation
3. temporal memory linkage
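A minimal numpy sketch of form 1, content-based addressing as described in the NTM/DNC papers: a softmax over the cosine similarity between a key vector and each memory row, sharpened by a strength β (the memory size and β value below are illustrative).

```python
import numpy as np

def content_addressing(M, key, beta):
    """Softmax over cosine similarity between a key and each memory row,
    sharpened by the key strength beta."""
    M_norm = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-8)
    k_norm = key / (np.linalg.norm(key) + 1e-8)
    sim = M_norm @ k_norm                      # cosine similarity per row
    e = np.exp(beta * sim - np.max(beta * sim))
    return e / e.sum()                         # weighting over N locations

M = np.random.default_rng(0).normal(size=(8, 4))   # toy 8x4 memory
w = content_addressing(M, M[3] + 0.01, beta=10.0)
print(np.argmax(w))                            # -> 3: most similar row wins
```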
31. DYNAMIC MEMORY ALLOCATION
➤ Goal: write new information into unused parts of memory
➤ What counts as unused?
➤ memory usage vector
➤ update the usage vector from the previous usage vector (which does not yet reflect the locations written at step t−1), the locations that have just been written to, and the memories retained by the free gates (recently read information = recently consumed information)
$\psi_t = \prod_{i=1}^{R}(1 - f_t^i\, w_{t-1}^{r,i})$ is called the memory retention vector: it represents how much each location will not be freed by the free gates, so the usage updates as $u_t = (u_{t-1} + w_{t-1}^w - u_{t-1} \circ w_{t-1}^w) \circ \psi_t$
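A direct numpy transcription of the two formulas above, with toy shapes (R read heads over N locations) and made-up example values:

```python
import numpy as np

def update_usage(u_prev, w_write_prev, read_w_prev, free_gates):
    """u_t = (u_{t-1} + w^w - u_{t-1}*w^w) * psi_t, with the retention
    vector psi_t = prod_i (1 - f^i * w^{r,i}) over the R read heads."""
    psi = np.prod(1 - free_gates[:, None] * read_w_prev, axis=0)
    return (u_prev + w_write_prev - u_prev * w_write_prev) * psi

u_prev = np.array([0.2, 0.8, 0.5, 0.0])
w_w    = np.array([0.0, 0.0, 0.0, 1.0])    # just wrote to location 3
w_r    = np.array([[0.0, 1.0, 0.0, 0.0]])  # one head just read location 1
free   = np.array([1.0])                   # its free gate is fully open
print(update_usage(u_prev, w_w, w_r, free))  # -> [0.2, 0.0, 0.5, 1.0]
```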
32. DYNAMIC MEMORY ALLOCATION
➤ Goal: write new information into unused parts of memory
➤ How is the new information written?
➤ Sort the locations of the updated usage vector in ascending order of usage to form a new vector (the free list)
➤ The first entry of the free list holds the least-used location
➤ Build the allocation vector so that low-usage locations receive a high probability of being written (see the sketch below)
<Figure: example memory locations reordered into the free list by ascending usage>
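A numpy sketch of the allocation weighting from the DNC paper: a location's allocation weight is its own free capacity times the product of the usages of all less-used locations (the example usage values are made up).

```python
import numpy as np

def allocation_weighting(usage):
    """a[phi[j]] = (1 - u[phi[j]]) * prod_{i<j} u[phi[i]], where phi
    sorts locations by ascending usage (the 'free list')."""
    phi = np.argsort(usage)                  # free list: least-used first
    a = np.zeros_like(usage)
    running = 1.0
    for j in phi:
        a[j] = (1.0 - usage[j]) * running
        running *= usage[j]
    return a

u = np.array([0.9, 0.1, 0.5, 0.99])
print(allocation_weighting(u))   # location 1 (lowest usage) gets most weight
```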
33. TEMPORAL MEMORY LINKAGE
➤ Goal: keep track of consecutively modified memory locations
➤ Linkage matrix
➤ L[i, j]: indicates whether location i was written right after location j
➤ Needs the location written at the current step (t) and the location written immediately before (t−1)
➤ The precedence weighting vector records where writing happened at step t
➤ The linkage matrix is updated from this information
old links gradually decay; an entry approaches 1 when location i is written now and location j was written immediately before
<Example linkage matrix for write order 2 → 1 → 4 → 3>
[0 1 0 0 0]
[0 0 0 0 0]
[0 0 0 1 0]
[1 0 0 0 0]
[0 0 0 0 0]
(row i has a 1 in column j when location i was written immediately after location j)
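A numpy sketch of the linkage and precedence updates from the DNC paper; the 3-location demo at the end is a made-up illustration:

```python
import numpy as np

def update_linkage(L_prev, p_prev, w_write):
    """L_t[i,j] = (1 - w^w[i] - w^w[j]) L_{t-1}[i,j] + w^w[i] p_{t-1}[j]
       p_t      = (1 - sum(w^w)) p_{t-1} + w^w"""
    decay = 1 - w_write[:, None] - w_write[None, :]   # old links fade away
    L = decay * L_prev + np.outer(w_write, p_prev)    # new link: i after j
    np.fill_diagonal(L, 0.0)                          # no self-links
    p = (1 - w_write.sum()) * p_prev + w_write
    return L, p

# Write to location 0, then to location 1: the link 1 <- 0 appears.
L, p = np.zeros((3, 3)), np.zeros(3)
L, p = update_linkage(L, p, np.array([1.0, 0.0, 0.0]))
L, p = update_linkage(L, p, np.array([0.0, 1.0, 0.0]))
print(L[1, 0])   # -> 1.0: location 1 was written right after location 0
```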
34. READ AND WRITE WEIGHTING (ADDRESSING)
➤ Write head addressing
➤ content-based addressing + dynamic memory allocation
➤ write where memory content matches the key, or write to unused locations
➤ Read head addressing
➤ content-based addressing + temporal memory linkage
➤ read from locations whose content matches the key, or follow consecutively written locations in order (see the sketch below)
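A numpy sketch of how the DNC blends these modes, following the paper's formulas: the write weighting interpolates between content lookup and allocation via two gates, and each read weighting mixes backward-link, content, and forward-link attention via a 3-way read mode π:

```python
import numpy as np

def write_weighting(c_w, a, g_write, g_alloc):
    """w^w = g_write * (g_alloc * a + (1 - g_alloc) * c_w):
    blend content-based lookup c_w with the allocation weighting a."""
    return g_write * (g_alloc * a + (1 - g_alloc) * c_w)

def read_weighting(c_r, w_r_prev, L, pi):
    """w^r = pi[0]*b + pi[1]*c_r + pi[2]*f, using the linkage matrix L."""
    b = L.T @ w_r_prev    # backward: what was written just before
    f = L @ w_r_prev      # forward: what was written just after
    return pi[0] * b + pi[1] * c_r + pi[2] * f
```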
36. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ Better than (= more complicated than) the NTM
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
➤ How to read and write to memory
➤ Same as the NTM
37. READING AND WRITING TO MEMORY
➤ Reading: $r_t = M_t^\top w_t^r$ (each read vector is a weighted sum of memory rows)
➤ Writing: $M_t = M_{t-1} \circ (E - w_t^w e_t^\top) + w_t^w v_t^\top$ (erase with vector $e_t$, then add vector $v_t$)
Since a weighting may sum to less than 1, the missing weight is implicitly assigned to a null operation that does not access any of the locations.
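A numpy sketch of both operations with a toy 8×4 memory and a hard one-hot weighting (real weightings are soft); the erase/add vectors here are made up:

```python
import numpy as np

def read_memory(M, w_read):
    """Read: r = M^T w -- a weighted sum of the memory rows."""
    return M.T @ w_read

def write_memory(M, w_write, erase, add):
    """Write: erase then add, M = M * (1 - w e^T) + w v^T."""
    return M * (1 - np.outer(w_write, erase)) + np.outer(w_write, add)

rng = np.random.default_rng(0)
M = np.zeros((8, 4))                 # toy 8x4 memory
w = np.eye(8)[2]                     # hard one-hot weighting on row 2
v = rng.normal(size=4)
M = write_memory(M, w, erase=np.ones(4), add=v)
print(np.allclose(read_memory(M, w), v))   # True: reads back the write
```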
44. SUMMARY
➤ Gates and attention in RNNs arose to keep information from being lost and to use only the information that is needed
➤ content memory is fragile
➤ can't increase the amount of memory easily (computation also increases)
➤ NTM and DNC, like a computer, separate memory from computation
➤ Being end-to-end models, they can be trained with gradient descent
➤ Memory access is modeled with attention so the network can focus on the parts it needs
➤ Read / Erase / Write (Add)
➤ Memory addressing
DNC is also another form of RNN!
https://github.com/deepmind/dnc
45. REFERENCES
1. Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external
memory." Nature 538.7626 (2016): 471-476.
2. Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
3. https://www.slideshare.net/carpedm20/differentiable-neural-computer
4. https://norman3.github.io/papers/docs/neural_turing_machine.html
5. https://www.youtube.com/watch?v=r5XKzjTFCZQ
6. https://deepmind.com/blog/differentiable-neural-computers/