5. Max pooling loses the spatial information
- We don’t use the relationships between objects: a jumbled set of facial parts can still be detected as a face. Is this a face?
6. Equivariance and invariance
- CNNs without max pooling are equivariant with respect to translation: shift the input, and the feature maps shift by the same amount.
- That’s something we want! But max pooling breaks it (see the sketch below).
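A minimal 1-D numpy sketch of this point (the input, filter, and shift amount are made up for illustration): convolution is translation-equivariant, while max pooling over the feature map is translation-invariant and discards the position.

import numpy as np

x = np.zeros(10)
x[3] = 1.0                    # a 1-D "image" with a spike at position 3
k = np.array([1.0, -1.0])     # a tiny edge-detecting filter
y1 = np.convolve(x, k, mode='valid')

x_shifted = np.roll(x, 2)     # translate the input by 2 positions
y2 = np.convolve(x_shifted, k, mode='valid')

# Equivariance: the feature map shifts by the same 2 positions ...
assert np.allclose(np.roll(y1, 2), y2)

# ... but max pooling over the map is invariant: the response is
# identical, and *where* the edge was is lost.
assert y1.max() == y2.max()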
11. Can’t we go the other way around and achieve viewpoint invariance? Computer Vision?
12. CAPSULES ENCODE AN ENTITY
- A capsule votes to say if a certain entity is in the image.
13. [Figure: Layer L capsules (window, nose, leaf, eye) connected to Layer L+1 capsules (face, tea cup, building).]
- Correspondence between network and graph structure.
14. [Figure: Layer L capsules (nose, eye) connected to the Layer L+1 capsule (face).]
- Correspondence between network and graph structure.
- This graph has been carved out from the full graph.
15. CAPSULES OUTPUT A VECTOR
- A capsule encodes an entity (and its properties) via its output vector.
16. Fully connected net
[Figure: node i in Layer L sends the scalar 0.456 to every node in Layer L+1.]
- The output of a node (neuron) is a scalar value.
17. Capsules net
[Figure: capsule i in Layer L sends a vector to Layer L+1.]
- The output of a node (capsule) is a vector.
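A rough illustration of the two output types (the sizes here, 10 nodes and 16-D capsules, are arbitrary):

import numpy as np

fc_out = np.random.rand(10)         # fully connected layer: 10 scalar outputs
caps_out = np.random.rand(10, 16)   # capsule layer: 10 vector outputs, 16-D each

# The length of a capsule's vector is read as the probability that its
# entity is present; squashing (slide 30) maps lengths into [0, 1).
norms = np.linalg.norm(caps_out, axis=-1)
presence = norms ** 2 / (1.0 + norms ** 2)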
18. Capsules net: an example
[Figure: the “digit 6” capsule in Layer L+1.]
- The first dimension of the output vector encodes the scale and thickness of the digit.
19. Capsules net: an example
[Figure: the “digit 6” capsule in Layer L+1.]
- The second dimension of the output vector encodes the roundness of the top part of the digit.
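A hedged sketch of the perturbation experiment behind these two slides (in the CapsNet paper, a perturbed digit-capsule vector is fed to a separately trained reconstruction decoder; the decoder and the 16-D size are assumptions here):

import numpy as np

v = np.random.randn(16)           # stand-in for the "digit 6" capsule output
v *= 0.9 / np.linalg.norm(v)      # keep its length below 1, as after squashing

# Perturb one dimension at a time to see which property it controls,
# e.g. dim 0 for scale/thickness, dim 1 for roundness of the top part.
for delta in np.linspace(-0.25, 0.25, 11):
    v_tweaked = v.copy()
    v_tweaked[0] += delta
    # decoder(v_tweaked) would render the perturbed digit; the decoder
    # is a separate reconstruction network, not shown here.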
21. Fully connected net
[Figure: node i in Layer L connected to nodes j-1, j, j+1 in Layer L+1 via weights W_i,j-1, W_i,j, W_i,j+1.]
- The information is distributed uniformly to every other node in the next layer.
22. Capsules net
[Figure: capsule i in Layer L connected to capsules j-1, j, j+1 in Layer L+1 via coupling coefficients c_i,j-1, c_i,j, c_i,j+1 and weights W_i,j-1, W_i,j, W_i,j+1.]
- The information is distributed to a specific node in the next layer.
23. Routing mechanism (bonus slide)
- In a CNN, this routing mechanism is ‘inverted’: with max pooling, a node in layer L+1 selects which node in layer L it listens to (the most active one).
- In a CapsNet, the routing is learned: each capsule in layer L decides where to send its output.
[Figure: example coupling coefficients 0.2, 0.1, 0.6.]
24. Capsules net: an example
[Figure: Layer L capsules (window, nose, leaf) routed to Layer L+1 capsules (building, tea cup, face) via coupling coefficients c_i,j-1, c_i,j, c_i,j+1 and weights W_i,j-1, W_i,j, W_i,j+1.]
29. Computing the output vector
[Figure: capsules i-1, i, i+1 in Layer L feeding capsules j-1, j, j+1 in Layer L+1.]
- Weighted sum of the inputs (before the activation function).
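In the notation of the CapsNet paper (which these slides appear to follow), capsule i’s output u_i is first turned into a prediction vector û_j|i = W_ij u_i, and capsule j’s total input is the weighted sum s_j = Σ_i c_ij û_j|i. A numpy sketch with made-up sizes:

import numpy as np

# Made-up sizes: 6 input capsules of dim 8, 3 output capsules of dim 16.
num_in, dim_in, num_out, dim_out = 6, 8, 3, 16
u = np.random.randn(num_in, dim_in)                    # outputs of layer L
W = np.random.randn(num_in, num_out, dim_out, dim_in)  # learned transformation matrices
c = np.full((num_in, num_out), 1.0 / num_out)          # coupling coefficients (uniform here)

# Prediction vectors: u_hat[i, j] = W[i, j] @ u[i]
u_hat = np.einsum('ijab,ib->ija', W, u)

# Weighted sum over the input capsules, before the activation function:
s = np.einsum('ij,ija->ja', c, u_hat)                  # shape (num_out, dim_out)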
30. Computing the output vector
[Figure: capsules i-1, i, i+1 in Layer L feeding capsules j-1, j, j+1 in Layer L+1.]
- Squashing the output vector so that its length falls in [0, 1) and can be read as a probability (non-linear activation function).
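The squashing non-linearity from the CapsNet paper, v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖): the orientation is preserved, while the length is mapped into [0, 1). A numpy sketch:

import numpy as np

def squash(s, eps=1e-8):
    # v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# A long vector keeps its orientation but its length stays below 1:
v = squash(np.array([3.0, 4.0]))   # ||s|| = 5 -> ||v|| = 25/26 ≈ 0.96
print(np.linalg.norm(v))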
32. How routing is achieved
- How do we obtain the c_i,j?
1 Start with the log priors: b_i,j = 0.
2 Initialise with c_i,j = softmax_j(b_i,j).
3 Make a forward pass to obtain the v_j.
4 Update the b_i,j: b_i,j ← b_i,j + û_j|i · v_j.
Steps 2-4 are repeated for a fixed number of routing iterations.
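Putting the four steps together, a numpy sketch of routing-by-agreement (following Sabour et al., 2017; three iterations is the value used in the paper):

import numpy as np

def squash(s, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, num_iters=3):
    # u_hat: prediction vectors, shape (num_in, num_out, dim_out)
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))          # 1. log priors b_ij = 0
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # 2. c_ij = softmax_j(b_ij)
        s = np.einsum('ij,ija->ja', c, u_hat)                 #    weighted sum (slide 29)
        v = squash(s)                                         # 3. forward pass -> v_j (slide 30)
        b = b + np.einsum('ija,ja->ij', u_hat, v)             # 4. b_ij += û_j|i · v_j
    return v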