Capsule Networks

calculation | consulting
capsule networks
(TM)
c|c
(TM)
charles@calculationconsulting.com

Capsule networks by Hinton

Where ConvNets come from: LeNet 5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,
Gradient-based learning applied to document recognition,
Proc. IEEE 86(11): 2278–2324, 1998.

Convolutions usually w/ max pooling
we get gross spatial invariance by ignoring
exactly where a feature occurs
“A vision system needs to use the same
knowledge at all locations in the image” Hinton
ConvNet: share weights + max pooling
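A minimal NumPy sketch of the "ignoring exactly where a feature occurs" point, assuming a toy 4x4 feature map and non-overlapping 2x2 pooling (both are illustrative choices, not taken from the slides). A feature that moves inside one pooling window leaves the pooled output unchanged, so its exact position is lost; a move into the next window does change the output, which is why the invariance is only crude.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling over a 2D array with even side lengths."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def one_hot_map(row, col, size=4):
    """Toy feature map (hypothetical helper): one active feature at (row, col)."""
    m = np.zeros((size, size))
    m[row, col] = 1.0
    return m

pooled_a = max_pool_2x2(one_hot_map(0, 0))
pooled_b = max_pool_2x2(one_hot_map(0, 1))  # shifted, but still in the same 2x2 window
pooled_c = max_pool_2x2(one_hot_map(0, 2))  # shifted into the neighbouring window

same_within_window = bool(np.array_equal(pooled_a, pooled_b))
same_across_windows = bool(np.array_equal(pooled_a, pooled_c))
```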

Hierarchical model of the visual system
HMax model, Riesenhuber and Poggio (1999)
dotted line selects max pooled features from lower layer

Pooling proposed by Hubel and Wiesel in 1962
A. Receptive field (RF) of simple cell
(green) formed by pooling over
(center-surround) cells (yellow) in
the same orientation row
B. RF of complex cell (green) formed by
pooling over simple cells.
here: (crude) translation invariance

ConvNets resemble hierarchical models (but notice the hyper-column)
HMax model, Riesenhuber and Poggio (1999)

Hinton: why is max pooling bad?
(If) the brain embeds things in rectangular space, then
Translation is easy; Rotation is hard
Experiment: time for the mind to process a rotation ~ amount of rotation
Conv Nets:
Crude translation invariance
No explicit pose (orientation) information
Cannot distinguish left from right
(actually some people have stopped using pooling)
A vision system needs to use the same knowledge at all locations in the image

2 streams hypothesis: what and where
Ventral: what objects are
Dorsal: where objects are in space
How do we know? Neurological disorders
Simultanagnosia: can only see one object at a time
idea dates back to 1968
lots of other evidence as well
https://www.youtube.com/watch?v=mCoYOFzSS9A

Cortical Microcolumns
Capsules may encode: orientation, scale, velocity, color, …
Column through cortical layers of the brain
80-120 neurons (2X long in V1)
share the same receptive field
part of Hubel and Wiesel, Nobel Prize 1981
also see recent review: https://www.sciencedirect.com/science/article/pii/S0166223615001484

Canonical object based frames of reference:
Hinton 1981
Hinton has been thinking about this a long time
A kind of inverse computer graphics

Capsule networks: inverse computer graphics
computer graphics: rendering engine
capsule network: inverse graphics
matrix of pose information
Hinton proposes that our brain does a kind of inverse computer graphics transformation.

Invariance vs Equivariance
Max pooling provides spatial Invariance, but Hinton argues we need spatial Equivariance.
so use vectors and affine transformations
Invariance: similar results if
image is shifted or rotated
Equivariance: invariance
under symmetry transformations (S, A, …)
Group homomorphism: f(g*x) = g*f(x) = f(x)*g^-1
Geometric: e.g. triangle
centers invariant under Similarity (S)
centroid invariant under Affine (A)
Statistics:
mean: invariant under change of units
median: more generally invariant; a better statistic
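The statistics example can be checked numerically. The sketch below assumes a small sample with one outlier, a Fahrenheit-style change of units, and the logarithm as an example of a monotone non-linear map (all illustrative choices). It tests whether a statistic f commutes with a transformation g, i.e. f(g*x) = g*f(x) in the notation above: the mean commutes with a change of units only, while the median also commutes with the monotone map.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # odd length, one outlier

# Change of units (affine map g(x) = a*x + b): both statistics commute with it.
a, b = 1.8, 32.0
mean_affine_ok = bool(np.isclose(np.mean(a * x + b), a * np.mean(x) + b))
median_affine_ok = bool(np.isclose(np.median(a * x + b), a * np.median(x) + b))

# Monotone but non-linear map g(x) = log(x): only the median still commutes.
mean_log_ok = bool(np.isclose(np.mean(np.log(x)), np.log(np.mean(x))))
median_log_ok = bool(np.isclose(np.median(np.log(x)), np.log(np.median(x))))
```

For even-length samples NumPy's median averages the two middle values, so the monotone-map property holds exactly only for odd lengths.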

Segmenting highly overlapping objects
Explaining away: Even if two hidden causes are independent, they can become
dependent when we observe an effect that they can both influence. (Hinton)

Capsule networks: architecture
+ unsupervised | reconstruction loss
supervised | max norm loss
Sabour, Frosst, and Hinton, Dynamic Routing Between Capsules (2017)
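A sketch of the supervised term, which the Dynamic Routing paper calls a margin loss on the capsule lengths (the slide's "max norm loss"). The constants m+ = 0.9, m- = 0.1, lambda = 0.5 and the 0.0005 reconstruction weight are the values reported in that paper; the function names and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss on capsule lengths.

    lengths: [batch, classes] norms of the output capsule vectors, in [0, 1)
    targets: [batch, classes] one-hot labels
    """
    lengths = np.asarray(lengths, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Penalise a present class whose capsule is shorter than m_plus.
    present = targets * np.maximum(0.0, m_plus - lengths) ** 2
    # Penalise an absent class whose capsule is longer than m_minus.
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return float(np.mean(np.sum(present + absent, axis=1)))

def total_loss(lengths, targets, image, reconstruction, recon_weight=0.0005):
    """Margin loss plus a down-weighted sum-of-squares reconstruction loss."""
    recon = float(np.sum((np.asarray(image) - np.asarray(reconstruction)) ** 2))
    return margin_loss(lengths, targets) + recon_weight * recon

confident_correct = margin_loss([[0.95, 0.05]], [[1, 0]])
confident_wrong = margin_loss([[0.10, 0.90]], [[1, 0]])
```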

conv2D
Keep first convolutional layer, but replace max pooling with …

conv2D
Reshape conv2d into primary capsule vectors (red), and
replace max pooling with routing-by-agreement algo

“Active capsules at one level (red) make predictions, via transformation matrices,
for the instantiation parameters of higher-level capsules (blue).
When multiple predictions agree, a higher level capsule (blue) becomes active”
conv2D

Primary layer: Conv2D reshaped
keras implementation: https://github.com/XifengGuo/CapsNet-Keras
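The reshape can be sketched in plain NumPy. The shapes below assume the MNIST setup of the Dynamic Routing paper (a 6x6 convolutional output with 256 channels, read as 32 capsule types of 8 dimensions each); the random input and variable names are placeholders, and this is not a copy of the linked Keras code.

```python
import numpy as np

# Assumed shapes: 6x6 spatial grid, 256 channels = 32 capsule types x 8 dims.
batch, height, width, channels = 2, 6, 6, 256
capsule_dim = 8

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(batch, height, width, channels))  # stand-in for Conv2D output

# Group the channels into 8-D vectors: one capsule per (location, capsule type).
primary_caps = conv_out.reshape(batch, -1, capsule_dim)
num_capsules = primary_caps.shape[1]  # 6 * 6 * 32
```

Each of the resulting vectors would then be passed through the squashing nonlinearity before routing.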

Capsule networks: encode poses
Capsules can represent objects w/ different poses (3D orientations)
Latest results (matrix capsules, below) reduce the number of test errors on smallNORB by 45% relative to the previous state of the art

Capsules capture visual features
“A capsule is a group of neurons whose outputs represent different properties of the same entity.”
Capsules encode SIFT-like features
Perturbing an image causes specific capsules to activate

Place-coding vs Rate-coding
Place-coding:
convNet w/out pooling
low-level features for small receptive fields
when a part moves, it may get a new capsule
position maps to active capsules (u) in primary layer
Rate-coding:
traditional neurological way of coding (1926)
stimulus info encoded in rate of firing
(as opposed to magnitude, population, timing, …)
when a part rotates or moves, the capsule values change
maps to real values of capsule output vectors (v)
rates encoded in vector values
aside: are ReLUs a kind of rate coding?

Hierarchy of parts: coupled layers
A higher level entity is present if the lower / primary layer capsules
agree on their predictions for its pose.

Routing algo: some pose prose
An effective way to implement the “explaining away”
that is needed for segmenting highly overlapping objects.
Like an Attention mechanism: The competition … is between the higher-level
capsules that a lower-level capsule might send its vote to.
stuff Hinton says…
A capsule is activated only if the transformed poses coming from the layer
below match each other. This is a more effective way to capture covariance
and leads to models with many fewer parameters that generalize better.
…a powerful segmentation principle that allows knowledge of familiar shapes to
drive segmentation, rather than just using low-level cues such as proximity or
agreement in color or velocity.

Data-specific dynamic routes
“c_ij are determined by an iterative dynamic routing process”
Equation labels: squash; softmax; weighted sum; weighted mean prediction

Capsule vs traditional neuron
https://github.com/naturomics/CapsNet-Tensorflow

Capsule: affine transformation
Primary rectangle and triangle capsules (prediction vectors) routed to
boat and house capsules (parent layer), and then routes pruned
“CapsNet is moderately robust to small affine transformations of the training data”

Capsule: squashing function
https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
length of the capsule vector ~ probability that the entity represented by the capsule is present
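A minimal sketch of the squashing function from the Dynamic Routing paper, v = (|s|^2 / (1 + |s|^2)) * s / |s|. The small `eps` term is an assumption added here for numerical stability near zero and is not part of the published formula. Short vectors shrink toward length 0 and long vectors approach length 1, so the length can be read as a probability while the direction is kept.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to a length in [0, 1) without changing direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

short_v = squash(np.array([0.1, 0.0]))   # weak evidence for the entity
long_v = squash(np.array([10.0, 0.0]))   # strong evidence for the entity

short_len = float(np.linalg.norm(short_v))
long_len = float(np.linalg.norm(long_v))
```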

Routing by agreement
Algo selects data-specific routes b_ij by matching
primary outputs and squashed (secondary) outputs
first paper uses vector overlap / cosine distance to find cluster centers: ok, but cannot tell great from good
second paper (matrix capsules) uses a Free Energy cost function
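A NumPy sketch of routing-by-agreement as described in the first paper: routing logits b_ij start at zero, a softmax over the parent capsules gives coupling coefficients c_ij, the weighted sum is squashed, and the dot product between each prediction and the squashed output is added back to b_ij. The toy prediction vectors, the `eps` term, and the function names are assumptions for illustration; this is not the code of either linked repository.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps) * s

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """u_hat: [num_in, num_out, dim] prediction vectors from lower capsules."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits b_ij
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each lower capsule splits its vote
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum per parent capsule
        v = squash(s)                              # parent outputs, [num_out, dim]
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # reward predictions that agree with v
    return v, c

# Toy case: three lower capsules agree on parent 0 and disagree on parent 1.
u_hat = np.zeros((3, 2, 2))
u_hat[:, 0, :] = [1.0, 0.0]
u_hat[0, 1, :] = [1.0, 0.0]
u_hat[1, 1, :] = [-1.0, 0.0]
u_hat[2, 1, :] = [0.0, 1.0]

v, c = route(u_hat)
```

In the toy case the coupling coefficients drift toward parent 0, whose predictions agree, and its output vector ends up longer than that of parent 1.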

Routing algorithm
How can we implement this in backprop?
fixed point equation

Routing algo: EM fixed point equation
in forward pass of Backprop
(like an EM step)
must terminate to take dW
dot product ~ log likelihood (Energy*)
*Similar to fixed point equation for TAP Free Energy in the EMF RBM
**and in the later matrix capsule paper, a Free Energy is used explicitly

Routing algo: fixed point unwound (3 steps)
Similar to a 3 layer FCN w/shared weights W

Routing algorithm: keras Layers
https://keras.io/layers/writing-your-own-keras-layers/

Routing algo: keras

Routing algo: matrix capsules
cluster score = [ log p(x_i | mixture) - log p(x_i | uniform) ]
cosine distance -> Free Energy cost:
EM to find mean, variance, and mixing proportion of Gaussians
“data-points that form a tight cluster from the perspective of one capsule
may be widely scattered from the perspective of another capsule”
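To make the EM idea concrete, here is a deliberately simplified sketch: ordinary EM for a mixture of diagonal Gaussians over a handful of 2-D points standing in for votes, followed by a per-point score of the form log p(x | mixture) - log p(x | uniform). It is not the full routing procedure of the matrix capsule paper, which also involves capsule activations and learned cost parameters. The toy data, the initial soft assignments, the `eps` floor on the variance, and the choice of a uniform density over the data's bounding box are all assumptions.

```python
import numpy as np

def em_step(x, resp, eps=1e-6):
    """One EM iteration for a diagonal-Gaussian mixture.

    x: [n, d] points; resp: [n, k] soft assignments whose rows sum to 1.
    """
    # M-step: mixing proportions, means and variances from the soft assignments.
    weight = resp.sum(axis=0)                                  # [k]
    mix = weight / x.shape[0]
    mean = (resp.T @ x) / weight[:, None]                      # [k, d]
    diff2 = (x[:, None, :] - mean[None]) ** 2                  # [n, k, d]
    var = np.einsum('nk,nkd->kd', resp, diff2) / weight[:, None] + eps
    # E-step: re-assign each point according to the updated Gaussians.
    log_p = -0.5 * (np.log(2 * np.pi * var)[None] + diff2 / var[None]).sum(axis=2)
    log_joint = np.log(mix)[None] + log_p                      # [n, k]
    log_mixture = np.logaddexp.reduce(log_joint, axis=1)       # log p(x_i | mixture)
    new_resp = np.exp(log_joint - log_mixture[:, None])
    return mean, var, mix, new_resp, log_mixture

# Two tight toy clusters, with a slightly biased initial assignment.
x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [4.9, 5.0]])
resp = np.array([[0.6, 0.4]] * 4 + [[0.4, 0.6]] * 4)

for _ in range(3):  # three EM iterations
    mean, var, mix, resp, log_mixture = em_step(x, resp)

# Assumed baseline: uniform density over the bounding box of the data.
log_uniform = -np.sum(np.log(x.max(axis=0) - x.min(axis=0)))
cluster_score = log_mixture - log_uniform
```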

Matrix capsules: after 3 EM iterations
recent results from matrix capsule paper (more later)

From max pool to max |vector|
mask selects (squashed) max vector (by length)
- does not throw away position information
- inputs vector into Fully Connected Net
- reconstructs the image from the vector
- similar to a variational auto-encoder
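The masking step can be sketched as follows, assuming 10 class capsules of 16 dimensions (the MNIST configuration in the Dynamic Routing paper) and a hypothetical helper name. All capsules except the longest are zeroed and the result is flattened as input to the fully connected decoder. In the paper the mask uses the true label during training; the longest-vector rule shown here corresponds to test time.

```python
import numpy as np

def mask_longest(v):
    """v: [num_classes, dim]. Zero every capsule except the longest, then flatten."""
    lengths = np.linalg.norm(v, axis=-1)
    winner = int(np.argmax(lengths))
    mask = np.zeros_like(lengths)
    mask[winner] = 1.0
    return (v * mask[:, None]).reshape(-1), winner

# Toy capsule outputs: class 3 has the longest vector.
v = np.full((10, 16), 0.01)
v[3] = 0.2

decoder_input, winner = mask_longest(v)
```

Because the surviving vector keeps its position in the flattened input, the decoder knows which class to draw as well as how to draw it.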

From max pool to max |vector|

Reconstruction error: a regularizer

Reconstruction: overlapping images
individual (8, 6) reconstructed after removing a specific capsule
and does not reconstruct absent (0, 1)
trained on overlapping MNIST images like (8,1) (6,7)
does have trouble with close images (like humans)
https://www.youtube.com/watch?v=gq-7HgzfDBM&t=62s

Matrix capsules : Nov 2017
capsule vectors —> matrices
cosine distance —> Free Energy cost function (Gaussian mixtures)
+ convolutions between layers + lots more details … for another video
