1. DIFFERENTIABLE NEURAL COMPUTER
Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external
memory." Nature 538.7626 (2016): 471-476.
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
문지형
(jhmoon@dm.snu.ac.kr)
LAB SEMINAR
SNU DataMining Center
2. PROBLEMS OF RNNs
➤ Exploding & vanishing gradients
For the linear recurrence $h_t = W^\top h_{t-1}$, unrolling gives $h_t = (W^t)^\top h_0$; with the eigendecomposition $W = Q \Lambda Q^\top$ this becomes $h_t = Q^\top \Lambda^t Q\, h_0$, so the signal is scaled by powers of the eigenvalues:
if the largest eigenvalue is > 1, the gradient will explode
if the largest eigenvalue is < 1, the gradient will vanish
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016. p.404-405
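A minimal numpy sketch of this effect (the diagonal 2×2 matrices, step count, and initial state are illustrative assumptions, not from the slides): repeated multiplication by a matrix whose largest eigenvalue is above or below 1 makes the propagated signal explode or vanish.

```python
import numpy as np

# Illustrative 2x2 recurrent weights: eigenvalues 1.1 vs. 0.9.
W_explode = np.array([[1.1, 0.0], [0.0, 1.1]])
W_vanish  = np.array([[0.9, 0.0], [0.0, 0.9]])

h0 = np.array([1.0, 1.0])
for name, W in [("explode", W_explode), ("vanish", W_vanish)]:
    h = h0
    for _ in range(50):        # unroll 50 steps of h_t = W^T h_{t-1}
        h = W.T @ h
    print(name, np.linalg.norm(h))
# explode -> ~1.1**50 (huge), vanish -> ~0.9**50 (tiny) times |h0|
```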
3. LSTM & GRU & ATTENTION
➤ Gates manage the flow of information effectively
➤ Attention pulls in information from whichever time steps are needed
Hochreiter, Sepp, and Jürgen Schmidhuber.
"Long short-term memory." Neural
computation 9.8 (1997): 1735-1780.
Chung, Junyoung, et al. "Gated feedback
recurrent neural networks." International
Conference on Machine Learning. 2015.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural
machine translation by jointly learning to align and translate." arXiv
preprint arXiv:1409.0473 (2014).
4. STILL…
➤ LSTM cannot reliably solve even simple algorithmic tasks such as copying a sequence
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
5. WHY THIS HAPPENS
➤ Artificial neural networks are remarkably adept at sensory processing, sequence
learning and reinforcement learning
➤ But are limited in their ability to represent variables and data structures and to store
data over long timescales
➤ LSTM entangles the two: the same hidden state must both store information and carry out computation
6. BASIC IDEA
➤ Modern computers separate computation and memory
➤ Can we model this separation with neural networks?
http://people.idsia.ch/~rupesh/rnnsymposium2016/slides/graves.pdf
7. TURING MACHINE
➤ An abstract computing machine devised by Turing in 1936. A Turing machine carries out computation and logical operations step by step; it showed that any computation is possible given suitable storage and an algorithm, and thus provided the prototype of the modern computer. ‒ Turing machine (Dictionary of Experimental Psychology Terms, 2008, Sigma Press)
8. VON NEUMANN ARCHITECTURE
➤ The Turing Machine is a theoretical concept: using a machine to solve mathematical problems the way a person would
➤ The Von Neumann Architecture is a structure for implementing actual computers based on the Turing Machine concept
https://en.wikipedia.org/wiki/Von_Neumann_architecture
10. NEURAL TURING MACHINE
➤ Tries to mimic the Von Neumann Architecture with neural networks
➤ Because it is built entirely from neural networks, it can be trained with gradient descent
➤ Controller (LSTM or Feed-forward Network)
A. interacts with the external world via input and output vectors
B. also interacts with a memory matrix using selective read and write operations (heads), as sketched after this slide
➤ Memory (Matrix)
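A structural sketch of one controller-memory step, with toy sizes and a single read head; the interface split and the purely additive write are simplifications of the papers' erase-then-add scheme, shown only to make the data flow concrete.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ntm_step(x_t, r_prev, M, W_ctrl):
    """One simplified controller-memory step (a structural sketch,
    not the papers' full addressing math)."""
    # Controller sees the input concatenated with last step's read vector.
    ctrl_in = np.concatenate([x_t, r_prev])
    h = np.tanh(W_ctrl @ ctrl_in)              # controller activations

    # Toy interface: split activations into a read key and a write vector.
    key, write_vec = h[:M.shape[1]], h[M.shape[1]:2 * M.shape[1]]
    w = softmax(M @ key)                       # soft weighting over N rows

    r_t = M.T @ w                              # read: weighted sum of rows
    M = M + np.outer(w, write_vec)             # simplified additive write
    return r_t, M

# Toy sizes: 4-dim input, 8x4 memory, controller output 8 (key + write vec).
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))
W_ctrl = rng.normal(size=(8, 8))               # input 4 + read 4 -> 8
r, M = ntm_step(rng.normal(size=4), np.zeros(4), M, W_ctrl)
```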
11. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ How to read and write to memory
19. DIFFERENTIABLE NEURAL COMPUTER (DNC)
➤ DNC extends the NTM by addressing the following limitations:
1. Ensuring that blocks of allocated memory do not overlap and interfere
2. Freeing memory that has already been written to
3. Handling non-contiguous memory through temporal links
20. DIFFERENTIABLE NEURAL COMPUTER (DNC)
➤ More like a computer
➤ When a computer evaluates a+b, the + runs on the CPU while a and b are managed in memory
➤ a and b are memory addresses, so whatever values they hold, the machine can produce an answer as long as it can perform +
➤ https://youtu.be/B9U8sI7TcMY
<Example input: triples (1 9 2), (3 4 7), (9 9 8), each read as (source, edge, destination)>
21. CONTROLLER NETWORK
➤ LSTM or Feed-forward Network (same as in the NTM); the controller weights are the learnable parameters
➤ LSTM: recurrent, so its hidden state carries information across time steps
➤ Feed-forward: a pure function of the current input X_t, with no internal state (see the sketch below)
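A minimal sketch of the distinction, with made-up weights and shapes (and a plain tanh recurrence standing in for the LSTM): the feed-forward controller's output depends only on the current X_t, while the recurrent controller also threads a state through time.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)) * 0.1   # made-up controller weights
                                     # (these are the learnable parameters)

def feedforward_controller(x_t):
    # Output depends on the current input X_t only: no state is carried.
    return np.tanh(W @ x_t)

def recurrent_controller(x_t, h_prev):
    # LSTM-style (here a plain RNN for brevity): state h threads through
    # time, so earlier inputs can influence the current output.
    return np.tanh(W @ x_t + h_prev)

x = rng.normal(size=8)
print(feedforward_controller(x)[:3])
print(recurrent_controller(x, np.zeros(16))[:3])
```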
23. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ Better than (= more complicated than) the NTM
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
➤ How to read and write to memory
➤ Same as the NTM
28. INTERFACE PARAMETERS
➤ Contains the information that parameterizes the memory interactions:
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
29. MEMORY ADDRESSING
➤ 3 forms of attention
1. content-based addressing (sketched below)
2. dynamic memory allocation
3. temporal memory linkage
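A minimal numpy sketch of form 1, content-based addressing as described in the NTM/DNC papers: a softmax over the cosine similarity between a key vector and each memory row, sharpened by a strength β (the memory size and β value below are illustrative).

```python
import numpy as np

def content_addressing(M, key, beta):
    """Softmax over cosine similarity between a key and each memory row,
    sharpened by the key strength beta."""
    M_norm = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-8)
    k_norm = key / (np.linalg.norm(key) + 1e-8)
    sim = M_norm @ k_norm                      # cosine similarity per row
    e = np.exp(beta * sim - np.max(beta * sim))
    return e / e.sum()                         # weighting over N locations

M = np.random.default_rng(0).normal(size=(8, 4))   # toy 8x4 memory
w = content_addressing(M, M[3] + 0.01, beta=10.0)
print(np.argmax(w))                            # -> 3: most similar row wins
```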
31. DYNAMIC MEMORY ALLOCATION
➤ Goal: write new information into unused parts of memory
➤ What counts as unused?
➤ memory usage vector
➤ update the usage vector from the previous usage vector (which does not yet reflect the locations written at step t−1), the locations that have just been written to, and the memories retained by the free gates (recently read information = recently consumed information)
$\psi_t = \prod_{i=1}^{R}(1 - f_t^i\, w_{t-1}^{r,i})$ is called the memory retention vector: it represents how much each location will not be freed by the free gates, so the usage updates as $u_t = (u_{t-1} + w_{t-1}^w - u_{t-1} \circ w_{t-1}^w) \circ \psi_t$
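A direct numpy transcription of the two formulas above, with toy shapes (R read heads over N locations) and made-up example values:

```python
import numpy as np

def update_usage(u_prev, w_write_prev, read_w_prev, free_gates):
    """u_t = (u_{t-1} + w^w - u_{t-1}*w^w) * psi_t, with the retention
    vector psi_t = prod_i (1 - f^i * w^{r,i}) over the R read heads."""
    psi = np.prod(1 - free_gates[:, None] * read_w_prev, axis=0)
    return (u_prev + w_write_prev - u_prev * w_write_prev) * psi

u_prev = np.array([0.2, 0.8, 0.5, 0.0])
w_w    = np.array([0.0, 0.0, 0.0, 1.0])    # just wrote to location 3
w_r    = np.array([[0.0, 1.0, 0.0, 0.0]])  # one head just read location 1
free   = np.array([1.0])                   # its free gate is fully open
print(update_usage(u_prev, w_w, w_r, free))  # -> [0.2, 0.0, 0.5, 1.0]
```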
32. DYNAMIC MEMORY ALLOCATION
➤ Goal: write new information into unused parts of memory
➤ How is the new information written?
➤ Sort the locations of the updated usage vector in ascending order of usage to form a new vector (the free list)
➤ The first entry of the free list holds the least-used location
➤ Build the allocation vector so that low-usage locations receive a high probability of being written (see the sketch below)
<Figure: example memory locations reordered into the free list by ascending usage>
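A numpy sketch of the allocation weighting from the DNC paper: a location's allocation weight is its own free capacity times the product of the usages of all less-used locations (the example usage values are made up).

```python
import numpy as np

def allocation_weighting(usage):
    """a[phi[j]] = (1 - u[phi[j]]) * prod_{i<j} u[phi[i]], where phi
    sorts locations by ascending usage (the 'free list')."""
    phi = np.argsort(usage)                  # free list: least-used first
    a = np.zeros_like(usage)
    running = 1.0
    for j in phi:
        a[j] = (1.0 - usage[j]) * running
        running *= usage[j]
    return a

u = np.array([0.9, 0.1, 0.5, 0.99])
print(allocation_weighting(u))   # location 1 (lowest usage) gets most weight
```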
33. TEMPORAL MEMORY LINKAGE
➤ Goal: keep track of consecutively modified memory locations
➤ Linkage matrix
➤ L[i, j]: indicates whether location i was written right after location j
➤ Needs the location written at the current step (t) and the location written immediately before (t−1)
➤ The precedence weighting vector records where writing happened at step t
➤ The linkage matrix is updated from this information
old links gradually decay; an entry approaches 1 when location i is written now and location j was written immediately before
<Example linkage matrix for write order 2 → 1 → 4 → 3>
[0 1 0 0 0]
[0 0 0 0 0]
[0 0 0 1 0]
[1 0 0 0 0]
[0 0 0 0 0]
(row i has a 1 in column j when location i was written immediately after location j)
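A numpy sketch of the linkage and precedence updates from the DNC paper; the 3-location demo at the end is a made-up illustration:

```python
import numpy as np

def update_linkage(L_prev, p_prev, w_write):
    """L_t[i,j] = (1 - w^w[i] - w^w[j]) L_{t-1}[i,j] + w^w[i] p_{t-1}[j]
       p_t      = (1 - sum(w^w)) p_{t-1} + w^w"""
    decay = 1 - w_write[:, None] - w_write[None, :]   # old links fade away
    L = decay * L_prev + np.outer(w_write, p_prev)    # new link: i after j
    np.fill_diagonal(L, 0.0)                          # no self-links
    p = (1 - w_write.sum()) * p_prev + w_write
    return L, p

# Write to location 0, then to location 1: the link 1 <- 0 appears.
L, p = np.zeros((3, 3)), np.zeros(3)
L, p = update_linkage(L, p, np.array([1.0, 0.0, 0.0]))
L, p = update_linkage(L, p, np.array([0.0, 1.0, 0.0]))
print(L[1, 0])   # -> 1.0: location 1 was written right after location 0
```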
34. READ AND WRITE WEIGHTING (ADDRESSING)
➤ Write head addressing
➤ content-based addressing + dynamic memory allocation
➤ write where memory content matches the key, or write to unused locations
➤ Read head addressing
➤ content-based addressing + temporal memory linkage
➤ read from locations whose content matches the key, or follow consecutively written locations in order (see the sketch below)
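A numpy sketch of how the DNC blends these modes, following the paper's formulas: the write weighting interpolates between content lookup and allocation via two gates, and each read weighting mixes backward-link, content, and forward-link attention via a 3-way read mode π:

```python
import numpy as np

def write_weighting(c_w, a, g_write, g_alloc):
    """w^w = g_write * (g_alloc * a + (1 - g_alloc) * c_w):
    blend content-based lookup c_w with the allocation weighting a."""
    return g_write * (g_alloc * a + (1 - g_alloc) * c_w)

def read_weighting(c_r, w_r_prev, L, pi):
    """w^r = pi[0]*b + pi[1]*c_r + pi[2]*f, using the linkage matrix L."""
    b = L.T @ w_r_prev    # backward: what was written just before
    f = L @ w_r_prev      # forward: what was written just after
    return pi[0] * b + pi[1] * c_r + pi[2] * f
```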
36. DEALING WITH MEMORY
➤ How to access specific locations of memory
➤ Better than (= more complicated than) the NTM
1. Keep allocated memory regions from overlapping
2. Free memory that is no longer needed
3. Remember the order in which memories were written
➤ How to read and write to memory
➤ Same as the NTM
37. READING AND WRITING TO MEMORY
➤ Reading: $r_t = M_t^\top w_t^r$ (each read vector is a weighted sum of memory rows)
➤ Writing: $M_t = M_{t-1} \circ (E - w_t^w e_t^\top) + w_t^w v_t^\top$ (erase with vector $e_t$, then add vector $v_t$)
Since a weighting may sum to less than 1, the missing weight is implicitly assigned to a null operation that does not access any of the locations.
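A numpy sketch of both operations with a toy 8×4 memory and a hard one-hot weighting (real weightings are soft); the erase/add vectors here are made up:

```python
import numpy as np

def read_memory(M, w_read):
    """Read: r = M^T w -- a weighted sum of the memory rows."""
    return M.T @ w_read

def write_memory(M, w_write, erase, add):
    """Write: erase then add, M = M * (1 - w e^T) + w v^T."""
    return M * (1 - np.outer(w_write, erase)) + np.outer(w_write, add)

rng = np.random.default_rng(0)
M = np.zeros((8, 4))                 # toy 8x4 memory
w = np.eye(8)[2]                     # hard one-hot weighting on row 2
v = rng.normal(size=4)
M = write_memory(M, w, erase=np.ones(4), add=v)
print(np.allclose(read_memory(M, w), v))   # True: reads back the write
```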
44. SUMMARY
➤ Gates and attention in RNNs arose to keep information from being lost and to use only the information that is needed
➤ content memory is fragile
➤ can't increase the amount of memory easily (computation also increases)
➤ NTM and DNC, like a computer, separate memory from computation
➤ Being end-to-end models, they can be trained with gradient descent
➤ Memory access is modeled with attention so the network can focus on the parts it needs
➤ Read / Erase / Write (Add)
➤ Memory addressing
DNC is also another form of RNN!
https://github.com/deepmind/dnc
45. REFERENCES
1. Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external
memory." Nature 538.7626 (2016): 471-476.
2. Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
3. https://www.slideshare.net/carpedm20/differentiable-neural-computer
4. https://norman3.github.io/papers/docs/neural_turing_machine.html
5. https://www.youtube.com/watch?v=r5XKzjTFCZQ
6. https://deepmind.com/blog/differentiable-neural-computers/