Presentation on “Neural Turing Machine”
Kiho Suh
PR12, May 7th 2017
About Paper
• Published in October 2014 (v1)
• Updated in December 2014 (v2)
• https://arxiv.org/abs/1410.5401
What is a Turing Machine?
• A model of a computer
• A memory tape with read and write heads
• A controller (program) attends to a specific element
• Discrete, so not possible to train with backpropagation
What is a Neural Turing Machine (NTM)?
• Inspired by the Turing Machine, to perform tasks that computers perform very well but machine learning does not.
• Narrows the gap between neural networks and algorithms.
• A ‘differentiable’ Turing Machine.
• ‘Sharp’ functions are made smooth, so the model can be trained with backpropagation.
Modified from Daniel Shank
What is a Neural Turing Machine (NTM)?
Neural Network (CPU) + External Memory (RAM)
A neural net that separates computation from memory: a computer that learns programs or algorithms from input and output examples (copy, sort, …)
[7,9,3,2] -> [2,3,7,9]
[4,3,0,5] -> [0,3,4,5]
[6,9,1,2] -> [1,2,6,9]
[7,2,8,3] -> [2,3,7,8]
…
Modified from Alex Graves’ slides
NTM Architecture
[Diagram: a Controller (feedforward or recurrent) with read and write heads that select portions of the Memory, a real-valued matrix.]
Why not an RNN or LSTM?
• Memory is tied up in the activations of the latent state of the network.
• High computation cost: more memory means a bigger network.
• Content fragility: memory is constantly overwritten with new info.
Why NTM?
Rather than artificially increasing the size of the hidden state in an RNN or LSTM, we would like to arbitrarily increase the amount of knowledge we add to the model while making minimal changes to the model itself.
Innovations
1. Memory-augmented networks
2. Attention mechanism: a novel idea in 2014 (see Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau, Cho, Bengio 2014)
3. A writing mechanism, unlike other memory-augmented networks such as Memory Networks (Weston et al. 2014) and End-to-End Memory Networks (Sukhbaatar et al. 2015).
NTM Architecture in more detail
[Diagram: the Controller receives the input and the previous read vector r_{t-1}, and emits the output plus head parameters. Addressing uses k_t, β_t, g_t, s_t, γ_t to turn w_{t-1} into w_t; Reading returns r_t from M_t; Writing applies the erase and add vectors e_t, a_t to turn M_{t-1} into M_t.]
Attention
• We want to focus on the specific parts of the memory that the network wants to read from and write to.
• When this paper came out in 2014, attention was a novel idea. Now, it is standard.
• The controller’s outputs parameterize a distribution (weighting) over the rows (locations) of the memory matrix.
• This weighting is the secret sauce.
Data Structure and Accessors
• Content only: memory is accessed like an associative map. Inspired by the brain.
• Content and location: a key finds an array, and a shift indexes into it.
• Location only: a shift iterates from the last focus. Inspired by the computer.
One-shot Learning with Memory-Augmented Neural Networks (Santoro et al. 2016)
3 * 5 = 15 (O), 3 * 5 = 14.95 (X)
Addressing
[Diagram: the addressing pipeline maps w_{t-1} to w_t in four stages: Content Addressing (k_t, β_t) -> w_t^c, Interpolation (g_t) -> w_t^g, Convolutional Shift (s_t) -> w̃_t, Sharpening (γ_t) -> w_t. The pipeline sits inside the Addressing box of the full architecture diagram.]
Selective Memory
• Key design question: how the network interacts with memory.
• Make sure it is not interacting with the whole memory at once.
• We do not want to lose the nice property of independence between memory and computation.
Right image from Tristan Deleu’s slide
Content Addressing
w_t^c <- softmax( β_t · K[k_t, M_t(i)] )
Completes a pattern, or finds the specific version of an approximate guess.
Content Addressing
w_t^c <- softmax( β_t · K[k_t, M_t(i)] ), where K is cosine similarity and β_t is the key strength.
[Worked example from Mark Chang’s slide: the key vector k_t = [3 2 1] is compared against every block of the memory M_t. With β_t = 50 the weighting is nearly one-hot: [0 0 0 1 0 0]. With β_t = 5 it is soft: [.15 .10 .47 .08 .13 .17]. With β_t = 0 it is uniform: [.16 .16 .16 .16 .16 .16].]
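To make the formula concrete, here is a minimal numpy sketch of content addressing (illustrative code, not the authors’ implementation; the function name and the toy memory sizes are my own):

```python
import numpy as np

def content_addressing(M, k, beta, eps=1e-8):
    """w_c = softmax(beta * K[k, M(i)]) with K = cosine similarity.
    M: (N, W) memory matrix, k: (W,) key vector, beta: key strength >= 0."""
    sim = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + eps)
    logits = beta * sim
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

M = np.random.rand(6, 3)                   # 6 locations, block length 3
k = np.array([3.0, 2.0, 1.0])              # the key vector from the slide
for beta in (0.0, 5.0, 50.0):
    # beta = 0 gives a uniform weighting; a large beta is nearly one-hot.
    print(beta, np.round(content_addressing(M, k, beta), 2))
```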
Interpolation (Location Addressing)
w_t^g <- g_t · w_t^c + (1 - g_t) · w_{t-1}, with 0 ≤ g_t ≤ 1
[Pipeline diagram as above; this stage applies the interpolation gate g_t.]
Interpolation (Location Addressing)
[Worked example from Mark Chang’s slide: content weight w_t^c = [0 0 1 0 0 0], previous final weight w_{t-1} = [.9 .1 0 0 0 0]. When g_t = 1, the w_{t-1} term becomes 0 and the result is content only: [0 0 1 0 0 0]. When g_t = 0.5: [.45 .05 .50 0 0 0]. When g_t = 0, the w_t^c term becomes 0 and the result is location only: [.9 .1 0 0 0 0].]
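A one-line sketch of the gate, using the slide’s numbers (illustrative, not the paper’s code):

```python
import numpy as np

def interpolate(w_c, w_prev, g):
    """Blend the content weighting with the previous final weighting; g in [0, 1]."""
    return g * w_c + (1.0 - g) * w_prev

w_c    = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # content weight
w_prev = np.array([0.9, 0.1, 0.0, 0.0, 0.0, 0.0])   # previous final weight
print(interpolate(w_c, w_prev, 0.5))                # -> [0.45 0.05 0.5 0. 0. 0.]
```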
Convolutional Shift (Location Addressing)
[Pipeline diagram as above; this stage applies the shift weighting s_t.]
The controller says how wide an area should be affected by the action of the head.
Convolutional Shift (Location Addressing)
w̃_t(i) <- w_t^g(i-1)·s_t(1) + w_t^g(i)·s_t(0) + w_t^g(i+1)·s_t(-1)   (a circular convolution)
[Worked example from Mark Chang’s slide, with interpolated weight w_t^g = [.45 .05 .50 0 0 0] and shift weighting s_t over offsets (-1, 0, 1):
• s = [1 0 0]: all the numbers shift one place to the left -> [.05 .50 0 0 0 .45]
• s = [0 0 1]: all the numbers shift one place to the right -> [0 .45 .05 .50 0 0]
• s = [.5 0 .5]: every number gives half of itself to its left and right neighbors -> [.25 .475 .025 .25 0 .225]]
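A minimal sketch of the circular shift (illustrative; the helper name and the offset convention follow the slide’s example, not any official code):

```python
import numpy as np

def conv_shift(w_g, s):
    """Circularly shift the weighting w_g by the shift distribution s,
    where s holds the weights for offsets (-1, 0, +1)."""
    w_shifted = np.zeros_like(w_g)
    for offset, weight in zip((-1, 0, 1), s):
        # np.roll(w, -1) moves every entry one place to the left (circularly),
        # so offset -1 shifts the weighting one position to the left.
        w_shifted += weight * np.roll(w_g, offset)
    return w_shifted

w_g = np.array([0.45, 0.05, 0.50, 0.0, 0.0, 0.0])
print(conv_shift(w_g, [1, 0, 0]))      # shift left  -> [0.05 0.5 0. 0. 0. 0.45]
print(conv_shift(w_g, [0, 0, 1]))      # shift right -> [0. 0.45 0.05 0.5 0. 0.]
print(conv_shift(w_g, [0.5, 0, 0.5]))  # split between both neighbors
```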
Sharpening (Location Addressing)
w_t(i) <- w̃_t(i)^γt / Σ_j w̃_t(j)^γt, with γ_t ≥ 1
The convolution in the previous step can blur the weighting, so we sharpen it. We finally obtain the address (a weight value for each memory location) of the memory that the Controller thinks we need.
Sharpening (Location Addressing)
[Worked example from Mark Chang’s slide, with shifted weight w̃_t = [0 .45 .05 .50 0 0]: with γ_t = 50, γ is much bigger than 5 or 0, so the array is sharpened almost to one-hot: [0 0 0 1 0 0]. With γ_t = 5: [0 .37 0 .62 0 0]. With γ_t = 0, γ is smaller than 1 (violating γ_t ≥ 1), so w_t is blurred all the way to uniform: [.16 .16 .16 .16 .16 .16].]
From Tristan Deleu’s blog: addressing is “soft” and distributed across the entire memory. However, quantitatively it is focused on very few cells.
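A minimal sketch of sharpening, which reproduces the slide’s γ = 5 numbers (illustrative only):

```python
import numpy as np

def sharpen(w_shifted, gamma):
    """Raise each weight to gamma (>= 1) and renormalize, undoing the blur."""
    p = w_shifted ** gamma
    return p / p.sum()

w_shifted = np.array([0.0, 0.45, 0.05, 0.50, 0.0, 0.0])
print(np.round(sharpen(w_shifted, 5), 2))  # -> approximately [0. 0.37 0. 0.62 0. 0.]
```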
Writing
[Diagram: writing happens in two steps. Erase uses the write weighting w_t and erase vector e_t to turn M_{t-1} into an intermediate M̃_t; Add then applies the add vector a_t to produce M_t.]
Memory
[Diagram from Mark Chang’s slide: the memory is an N × M matrix; N memory addresses (0, 1, …, i, …, n), each holding a memory block of length M (0, …, j, …, m).]
Erase Operation
M̃_t(i) <- M_{t-1}(i) ⊙ (1 - w_t(i) · e_t)
[Worked example from Mark Chang’s slide: with erase vector e_t = [0 1 1] and a head location (write weighting) that puts weight 0.9 and 0.1 on two adjacent addresses, the block components selected by e_t are scaled down at the addressed locations, while the rest of the N × M memory is untouched.]
Add Operation
M_t(i) <- M̃_t(i) + w_t(i) · a_t
[Worked example from Mark Chang’s slide: with add vector a_t = [1 1 0] and the same head location, w_t(i) · a_t is added to each addressed block, giving the new memory M_t.]
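The two steps compose into one write. A minimal numpy sketch using the slide’s erase and add vectors (illustrative; the exact head location is my assumption about how the garbled example reads):

```python
import numpy as np

def write(M, w, e, a):
    """Erase then add: M(i) <- M(i) * (1 - w(i) * e) + w(i) * a.
    M: (N, W) memory; w: (N,) write weighting; e, a: (W,) erase/add vectors."""
    M_erased = M * (1.0 - np.outer(w, e))  # erase: scale selected components down
    return M_erased + np.outer(w, a)       # add: blend in the add vector

M = np.random.rand(6, 3)
w = np.array([0.0, 0.0, 0.9, 0.1, 0.0, 0.0])  # head location (assumed layout)
e = np.array([0.0, 1.0, 1.0])                 # erase vector from the slide
a = np.array([1.0, 1.0, 0.0])                 # add vector from the slide
print(np.round(write(M, w, e, a), 2))
```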
Reading
[Diagram: the read weighting w_t is applied to the memory M_t to produce the read vector r_t, which feeds back into the Controller at the next step.]
Read Operation
r_t <- Σ_i w_t(i) · M_t(i)
[Worked example from Mark Chang’s slide: with a head location (read weighting) putting weight 0.9 and 0.1 on two adjacent addresses holding the blocks [1 1 2] and [2 1 4], the read vector is the weighted sum 0.9·[1 1 2] + 0.1·[2 1 4] = [1.1 1.0 2.2].]
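Reading is just the weighting applied to the memory rows, i.e. w @ M in numpy. A minimal sketch reproducing the slide’s read vector (the unaddressed rows are filler values I made up):

```python
import numpy as np

M = np.array([[3.0, 2.0, 1.0],   # filler row
              [1.0, 1.0, 2.0],   # addressed with weight 0.9
              [2.0, 1.0, 4.0],   # addressed with weight 0.1
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
w = np.array([0.0, 0.9, 0.1, 0.0, 0.0, 0.0])  # read weighting (head location)
print(w @ M)                                  # -> [1.1 1.  2.2]
```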
NTM Architecture in more detail
[A sequence of animation frames of the same architecture diagram, walking through one training example. Input: the unsorted sequence [7,9,3,2] paired with its sorted target [2,3,7,9]. At each step the Controller consumes the input and the previous read vector r_{t-1}, emits the addressing parameters, writes to and reads from Memory, and produces output. At the end, the loss is computed between the model’s output [2,9,3,7] and the target [2,3,7,9].]
NTM Architecture in more detail
[The same walkthrough repeated for a second example: input [4,3,0,5] with sorted target [0,3,4,5]; this time the loss is computed between the output [0,3,4,5] and the target [0,3,4,5].]
Experiments
Three models are compared:
• NTM with a feedforward controller
• NTM with an LSTM controller
• a plain LSTM network
Copy
[Figure. From top to bottom: external inputs/outputs; add/read vectors to memory; write/read weightings.]
Copy
NTM copy-task generalization (trained on lengths ≤ 20, tested at length 120): the NTM not only copies, but also generalizes! So the NTM learns a program. LSTM copy-task generalization shows a shift error.
Copy
[Figure only.]
Repeat Copy
[Figure. From top to bottom: external inputs/outputs; add/read vectors to memory; write/read weightings.]
NTM learns its first for-loop, using content to jump, iteration to step, and a variable to count to N.
Repeat Copy
[Figures only.]
Associative Recall
[Figure. From top to bottom: external inputs/outputs; add/read vectors to memory; write/read weightings. Annotations: matching item, query item, next-to-matching item.]
The NTM correctly produces the red-box item after it sees the green-box item, similar to a dictionary lookup.
Associative Recall
[Figure only.]
Associative Recall (Generalization)
[Figure: number of incorrect bits.]
Dynamic N-Gram
The goal of the dynamic n-gram task was to test whether the NTM could rapidly adapt to new predictive distributions.
[Figure annotation: mismatching.]
Dynamic N-Gram
[Figure only.]
Priority Sort
The write head writes to locations according to a linear function of the priority, and the read head reads from locations in increasing order.
Priority Sort
[Figure only.]
Innovations (recap)
1. Memory-augmented networks
2. Attention mechanism: a novel idea in 2014 (see Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau, Cho, Bengio 2014)
3. A writing mechanism, unlike other memory-augmented networks such as Memory Networks (Weston et al. 2014) and End-to-End Memory Networks (Sukhbaatar et al. 2015).
NTM Architecture
[The full architecture diagram again: Controller, Addressing (k_t, β_t, g_t, s_t, γ_t), Reading (r_t), Writing (e_t, a_t), Memory (M_t).]
What to improve?
• Memory management problem -> dynamic allocation
• Retrieving memories in temporal order -> temporal link matrix
• Graph algorithms for a wider range of tasks
• Reinforcement learning
Differentiable Neural Computer! Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
Discussion
• Why are some results better with NTM+feedforward (e.g. associative recall) while others are better with NTM+LSTM (e.g. copy)?
• The paper applies “content addressing” and then “interpolation.” Wouldn’t it make more sense to do “interpolation” first and then “content addressing”?
• Differentiability might not be the best way to learn programs, because it is inherently fragile. Programs are discrete in nature; every bit really counts, so gradient descent might not be desirable. Is NTM’s approach still the right way to go?
• Any questions?
Reference
• https://www.slideshare.net/ckmarkohchang/neural-turing-machine-tutorial-51270912
• https://www.slideshare.net/SessionsEvents/daniel-shank-data-scientist-talla-at-mlconf-sf-2016?qid=5b26ca7a-6a33-43d0-a8d8-92adce30306e&v=&b=&from_search=6
• https://www.youtube.com/watch?v=_H0i0IhEO2g&t=532s
• https://norman3.github.io/papers/docs/neural_turing_machine
• https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315
• https://www.slideshare.net/yuzurukato/neural-turing-machines-43179669
• https://arxiv.org/pdf/1409.0473.pdf
• https://arxiv.org/abs/1605.06065