15. In-place Operation
• The input is usually overwritten by the output as the algorithm executes.
• An in-place operation updates the input sequence only by replacing or swapping its elements (see the sketch below).
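As a minimal illustration of this definition (a sketch added here, not taken from the slides), reversing a Python list by swapping its elements is an in-place operation: the result overwrites the input and no second sequence is allocated.

def reverse_in_place(seq):
    i, j = 0, len(seq) - 1
    while i < j:
        seq[i], seq[j] = seq[j], seq[i]   # overwrite the input by swapping
        i += 1
        j -= 1

data = [3, 1, 2]
reverse_in_place(data)
print(data)   # [2, 1, 3] -- the same list object, modified in place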
19. In-place in PyTorch
import torch

a = torch.zeros(1, 3)
print(a)
print(hex(id(a)))

0 0 0
[torch.FloatTensor of size 1x3]
0x7f4b08813188

[Diagram: the name a refers to address 0x7f4b08813188, which holds FloatTensor([0, 0, 0])]
21. In-place in PyTorch
(each case starts from the zeros tensor created on slide 19)

case 1) out-of-place
a = a + 1
print(a.numpy())
print(hex(id(a)))

[[1. 1. 1.]]
0x7f4b088135c8

case 2) in-place
for i in range(3):
    a[:, i] += 1
print(a.numpy())
print(hex(id(a)))

[[1. 1. 1.]]
0x7f4b08813188
23. In-place in PyTorch
[Diagram: in case 1 (out of place), a is rebound to new storage 0x7f4b088135c8 holding FloatTensor([1, 1, 1]) while 0x7f4b08813188 still holds the old FloatTensor([0, 0, 0]); in case 2 (in-place), a keeps pointing at 0x7f4b08813188, whose contents change from FloatTensor([0, 0, 0]) to FloatTensor([1, 1, 1])]
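Another way to see the difference (a sketch added here, not from the slides) is to compare the address of the underlying storage with Tensor.data_ptr(), which is independent of the Python object identity that id() reports.

a = torch.zeros(1, 3)
p = a.data_ptr()              # address of the underlying storage

b = a + 1                     # out-of-place: the result lives in newly allocated storage
print(b.data_ptr() == p)      # False

a[:, 0] += 1                  # in-place: a's existing storage is modified
print(a.data_ptr() == p)      # True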
25. In-place in PyTorch
case 3) out-of-place
a = a.add(1)
print(a.numpy())
print(hex(id(a)))

[[1. 1. 1.]]
0x7f4b088135c8

case 4) in-place
a.add_(1)
print(a.numpy())
print(hex(id(a)))

[[1. 1. 1.]]
0x7f4b08813188
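The trailing underscore in add_ is PyTorch's general naming convention for in-place variants of tensor methods; a short sketch (added here, not from the slides):

b = a.mul(2)    # out-of-place: returns a new tensor, like case 3
a.mul_(2)       # in-place: modifies a's storage and returns a, like case 4
a.zero_()       # in-place: fills a with zeros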
27. In-place in PyTorch
case 5) in-place
a += 1
print(a.numpy())
print(hex(id(a)))

[[1. 1. 1.]]
0x7f4b08813188

torch/autograd/variable.py::Variable( )
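The pointer to torch/autograd/variable.py::Variable( ) presumably refers to how autograd has to track such in-place updates. As a hedged illustration (added here, using current PyTorch, where Variable has been merged into Tensor): modifying a tensor in place after it has been saved for the backward pass makes autograd refuse to compute the gradient.

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)     # sigmoid saves its output for the backward pass
y.add_(1)                # in-place update of a saved tensor
y.sum().backward()       # RuntimeError: a variable needed for gradient computation
                         # has been modified by an in-place operation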
42. Memory sharing
[Diagram: computation graph A = INPUT, B = Sigmoid(A), C = Pool(B), E = Sigmoid(B), D = Pool(C), F = D + E; B is released after C and E are allocated, and its memory is re-used for D]
Memory sharing: memory used by intermediate results that are no longer needed can be recycled and used in another node (see the code sketch after slide 43).
43. Memory sharing
[Diagram: the same graph repeated, contrasting the two ways to recycle B's memory: performing the operation in-place, or re-using the released buffer for another node]
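In PyTorch code, one way this recycling shows up (a minimal sketch added here, assuming same-shaped activations; it is not the talk's implementation) is to route the outputs of successive nodes into one scratch buffer with the out= argument once the earlier result is no longer needed.

import torch

x = torch.randn(64, 32)
buf = torch.empty_like(x)          # one scratch buffer shared by several nodes

b = torch.sigmoid(x, out=buf)      # B = Sigmoid(A), written into the shared buffer
c = b.relu()                       # C = ReLU(B); after this, B is no longer needed
d = torch.sigmoid(c, out=buf)      # D re-uses the memory that held B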
44. Trade Computation for Memory
• Normalization and non-linearities are applied before/after the conv operation.
• Convolution is most efficient when its input lies in a contiguous block of memory.
• To make a contiguous input, each layer must copy all previous features (concatenation → mem-copy).
• The above operations are computationally extremely cheap.
• Copying into pre-allocated memory is significantly faster than allocating new memory (see the sketch below).
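In current PyTorch, a common way to make this trade is torch.utils.checkpoint, which drops intermediate activations in the forward pass and recomputes them during backward. A minimal sketch (added here, not necessarily the mechanism the talk used):

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.BatchNorm2d(16), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))
x = torch.randn(8, 16, 32, 32, requires_grad=True)

# Activations inside `block` are not kept; they are recomputed during backward,
# trading a little extra computation for a lower peak memory footprint.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()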
45. Shared storage for concatenation
• Rather than allocating memory for each concatenation operation, assign the outputs to a memory allocation shared across all layers.
• Because the shared memory storage is used by all network layers, its data is not permanent.
• It therefore needs to be recomputed during back-propagation (see the sketch below).
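A minimal sketch of the idea (added here; buffer sizes are hypothetical and this is not the talk's exact implementation): allocate one flat buffer up front and let every concatenation write into a view of it.

import torch

shared = torch.empty(8 * 64 * 32 * 32)        # one storage shared by all layers

def concat_into_shared(features):
    n, h, w = features[0].size(0), features[0].size(2), features[0].size(3)
    c = sum(f.size(1) for f in features)
    out = shared[: n * c * h * w].view(n, c, h, w)   # contiguous view of the shared buffer
    torch.cat(features, dim=1, out=out)              # mem-copy into pre-allocated memory
    return out

f1 = torch.randn(8, 16, 32, 32)
f2 = torch.randn(8, 24, 32, 32)
cat = concat_into_shared([f1, f2])    # valid only until the next layer overwrites `shared`,
                                      # so it must be recomputed during back-propagation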
46. Shared storage for batch normalization & non-linearity activation
• Assign the outputs of batch normalization / activation to a shared memory allocation.
• The data in the shared memory storage is not permanent and will be overwritten by the next layer.
• The batch normalization / activation outputs should therefore be recomputed during back-propagation (see the sketch below).
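Putting slides 45 and 46 together, a hedged sketch of the pattern (hypothetical layer sizes; the actual memory-efficient implementation differs in detail): wrap the concatenation + batch norm + activation in a checkpointed function so its output does not have to be stored for backward.

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False)

    def bn_function(self, *features):
        # concatenation + batch norm + activation: treated as cheap temporaries
        return self.relu(self.norm(torch.cat(features, dim=1)))

    def forward(self, *features):
        # The BN/ReLU output is not stored for backward; it is recomputed instead.
        # (Caveat: recomputation updates BatchNorm running statistics a second time.)
        bottleneck = checkpoint(self.bn_function, *features, use_reentrant=False)
        return self.conv(bottleneck)

layer = DenseLayer(40, 12)
features = [torch.randn(4, 16, 32, 32, requires_grad=True),
            torch.randn(4, 24, 32, 32)]
out = layer(*features)
out.sum().backward()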