2. Remarks
• Neurons should be clustered according to the specific pattern they respond to
• Capsules – the groups of neurons
• Whereas many methods focus on “filter grouping”, CapsNet focuses on the feature
maps.
• This can still be thought of as “filter grouping”, because the feature maps are
generated by the filters.
• The activation function should control the information flow
• Dynamic routing – information should flow to the function that best preserves
the original information.
• The activation happens before the information flows into the neurons
• Max pooling is not specific, so an “attention-like” function would be helpful
• https://math.stackexchange.com/questions/689022/how-does-the-dot-product-determine-similarity
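The link above is about the dot product as a similarity measure, which is what the "agreement" in dynamic routing relies on. A minimal NumPy sketch, assuming small made-up 2-D vectors (the names `agreement`, `aligned`, `orthogonal` and `opposed` are illustrative only):

```python
import numpy as np

def agreement(u_hat, v):
    """Raw dot product: large and positive when u_hat and v point the same way."""
    return float(np.dot(u_hat, v))

def cosine_similarity(a, b):
    """Dot product after scaling both vectors to unit length; lies in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0])
aligned = np.array([2.0, 4.0])      # same direction as v
orthogonal = np.array([2.0, -1.0])  # at right angles to v
opposed = np.array([-1.0, -2.0])    # opposite direction to v

scores = [agreement(x, v) for x in (aligned, orthogonal, opposed)]
cosines = [cosine_similarity(x, v) for x in (aligned, orthogonal, opposed)]
```

The raw dot product grows with vector length as well as with alignment, while the cosine version depends on direction only.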
5. The activation function – dynamic routing
[Figure: the input feeds capsules 1 to n; each capsule passes through its own kernel group (1 to n), and the results are combined through the coupling coefficients C into output capsules 1 to m.]
Dynamic routing combines 2 concepts
1. Agreement
   1. Can be regarded as pooling
2. Activation
   1. An attention-like mechanism
6. How to activate the capsules
Step 1. Capsule (u) → W (filters) → Capsule (û). The shape of û is the same as v.
Step 2. Capsules (û1), (û2), (û3) → weighted sum c1·û1 + c2·û2 + c3·û3 → Capsule (S).
Step 3. Squashing (unit scale): Capsule (S) → squashing → Capsule (v).
Overall: the capsules (u) are transformed by W and then combined using C.
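The three steps above can be sketched in NumPy. This is an illustration under assumptions, not the slides' own code: the sizes `n_in`, `d_in` and `d_out` are made up, the coefficients c are simply set to equal values, and the squashing formula is the one from the original CapsNet paper.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Scale s to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

rng = np.random.default_rng(0)
n_in, d_in, d_out = 3, 8, 16             # hypothetical sizes
u = rng.normal(size=(n_in, d_in))        # input capsules u_1..u_3
W = rng.normal(size=(n_in, d_out, d_in)) # one transformation matrix per input capsule

# Step 1: predictions u_hat_i = W_i u_i, each with the same shape as v
u_hat = np.einsum('ioj,ij->io', W, u)    # shape (n_in, d_out)

# Step 2: weighted sum s = c_1 u_hat_1 + c_2 u_hat_2 + c_3 u_hat_3
c = np.full(n_in, 1.0 / n_in)            # assumption: equal coefficients
s = np.sum(c[:, None] * u_hat, axis=0)   # shape (d_out,)

# Step 3: squashing turns s into the output capsule v
v = squash(s)
```

After squashing, v points in the same direction as s but its length is always below 1.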
7. The coupling coefficient - C
[Figure: Capsule (u) → W → Capsules (û1), (û2), (û3), … → weighted sum with c1, c2, c3 → Capsule (S) → squashing → Capsule (v). The similarity (agreement) between each û and v is fed back into C.]
• Once C is updated, v changes
• v can then be recalculated, which gives a new C
• In the original paper, this step is run 3 times
1. In the first step, v is merged from all capsules. If all the capsules give
a result in which the output is similar to a certain capsule, we can simply
activate that capsule.
2. This controls the information flow.
• If the information is handled by filters that are not correlated,
this generates noise.
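The update loop described above (compute v, measure agreement, update C, repeat 3 times) can be sketched as follows. This is a minimal NumPy sketch following the routing procedure of the original CapsNet paper; the function names, the logits `b`, and the tensor sizes in the demo are assumptions made for illustration.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Scale each vector to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat has shape (n_in, n_out, d_out). Returns (v, c) from the last iteration."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits, start neutral
    for _ in range(n_iters):
        c = softmax(b, axis=1)                     # coupling coefficients C
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum per output capsule
        v = squash(s)                              # output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # agreement: dot product of u_hat and v
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 2, 5))  # hypothetical: 4 input capsules, 2 output capsules
v, c = dynamic_routing(u_hat, n_iters=3)
```

With a single iteration every input capsule spreads its output evenly; later iterations shift C towards the output capsules whose v agrees with the prediction û.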
9. Comparison with a normal neural network
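• Normal neural network: scalar values; weighted sum followed by an activation; accepts all the information from the previous layer
• Capsule network: vector values; weighted sum with C followed by squashing as the activation; chooses one piece of information from the previous layer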
11. Giving meaning to the DigitCaps
• On their own, the 10 vectors in
DigitCaps do not mean anything
• The study uses the DigitCaps to
reconstruct the MNIST picture.
• For example, if the original
picture is a 5, the other capsules
are masked and only the capsule
for that digit is kept.
• The kept capsule is used to
reconstruct the picture. The
reconstruction result is also
used for regularization.
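The masking step can be sketched in NumPy. This is an illustration under assumptions: the capsule dimension of 16 follows the original paper, the values in `digit_caps` are placeholders, and `mask_digit_caps` is a hypothetical helper name.

```python
import numpy as np

def mask_digit_caps(digit_caps, label):
    """Zero every capsule except the one at index `label`.

    digit_caps has shape (n_classes, d), one vector per digit class.
    """
    mask = np.zeros((digit_caps.shape[0], 1))
    mask[label] = 1.0
    return digit_caps * mask

# Placeholder DigitCaps output: 10 capsules, 16 dimensions each (assumed size)
digit_caps = np.arange(10 * 16, dtype=float).reshape(10, 16)

masked = mask_digit_caps(digit_caps, label=5)  # the picture shows a 5
decoder_input = masked.reshape(-1)             # flattened vector passed to the decoder
```

Only the capsule for the true digit reaches the decoder, so the reconstruction error pushes that capsule to encode how the digit is drawn.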