Dynamic Routing Between Capsules
by S. Sabour, N. Frosst and G. Hinton (NIPS 2017)
presented by Karel Ha
27th March 2018
Pattern Recognition and Computer Vision Reading Group
Outline
Motivation
Capsule
Routing by an Agreement
Capsule Network
Experiments
Conclusion
Motivation
The pooling operation used in convolutional
neural networks is a big mistake and the
fact that it works so well is a disaster.
G. Hinton
[...] it makes much more sense to represent
a pose as a small matrix that converts
a vector of positional coordinates relative to
the viewer into positional coordinates
relative to the shape itself.
G. Hinton
http://condo.ca/wp-content/uploads/2017/03/
Vector-director-Institute-artificial-intelligence-Toronto-MaRS-Discovery-District-H
ca_.jpg
Part-Whole Geometric Relationships
“What is wrong with convolutional neural nets?”
To a CNN (with MaxPool)...
...both pictures are similar, since they both contain similar elements.
...the mere presence of the right parts can be a strong indicator of a face in the image.
...orientational and relative spatial relationships are not very important.
https://medium.com/ai-theory-practice-business/
understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
Part-Whole Geometric Relationships
Scene Graphs from Computer Graphics
...take into account the relative positions of objects.
The internal representation in computer memory:
a) arrays of geometrical objects
b) matrices representing their relative positions and orientations
http://math.hws.edu/graphicsbook/c2/scene-graph.png
Part-Whole Geometric Relationships
Inverse (Computer) Graphics
Inverse graphics:
from the visual information received by the eyes,
deconstruct a hierarchical representation of the world around us
and try to match it with already learned patterns and relationships stored in the brain
relationships between 3D objects are represented using a “pose” (= translation plus rotation)
https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/
CNN-Capsule-Networks-Edureka-442x300.png
Pose Equivariance and the Viewing Angle
We have probably never seen these exact pictures, but we can still
immediately recognize the object in them...
internal representation in the brain: independent of the viewing angle
quite hard for a CNN: no built-in understanding of 3D space
much easier for a CapsNet: these relationships are explicitly modeled
https://medium.com/ai-theory-practice-business/
understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
Routing by an Agreement: High-Dimensional Coincidence
https://www.oreilly.com/ideas/introducing-capsule-networks
Routing by an Agreement: Illustrative Overview
https://www.oreilly.com/ideas/introducing-capsule-networks
Routing by an Agreement: Recognizing Ambiguity in Images
https://www.oreilly.com/ideas/introducing-capsule-networks
Routing: Lower Levels Voting for Higher-Level Feature
(Sabour, Frosst and Hinton [2017])
How to do it?
(mathematically)
Capsule
What Is a Capsule?
a group of neurons that:
perform some complicated internal computations on their inputs
encapsulate their results into a small vector of highly informative outputs
recognize an implicitly defined visual entity (over a limited domain of viewing conditions and deformations)
encode the probability of the entity being present
encode the instantiation parameters:
pose, lighting, and deformation relative to the entity’s (implicitly defined) canonical version
https://medium.com/ai-theory-practice-business/
understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
Output As A Vector
probability of presence: locally invariant
E.g. if (0, 3, 2, 0, 0) leads to (0, 1, 0, 0), then (0, 0, 3, 2, 0) should also lead to (0, 1, 0, 0).
instantiation parameters: equivariant
E.g. if (0, 3, 2, 0, 0) leads to (0, 1, 0, 0), then (0, 0, 3, 2, 0) might lead to (0, 0, 1, 0).
https://www.oreilly.com/ideas/introducing-capsule-networks
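To make the distinction concrete, here is a minimal numpy sketch (my own toy illustration, not from the paper) contrasting an invariant readout with an equivariant one on the activations above:

```python
import numpy as np

a = np.array([0, 3, 2, 0, 0])
b = np.array([0, 0, 3, 2, 0])      # same pattern, shifted right by one

# Invariant readout: the maximum value ignores where the pattern sits.
print(np.max(a), np.max(b))        # 3 3  -> identical output

# Equivariant readout: the argmax moves together with the pattern.
print(np.argmax(a), np.argmax(b))  # 1 2  -> output shifts with the input
```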
Previous Version of Capsules
for illustration, taken from “Transforming Auto-Encoders” (Hinton, Krizhevsky and Wang [2011])
three capsules of a transforming auto-encoder (that models translation)
(Hinton, Krizhevsky and Wang [2011])
Capsule’s Vector Flow
Note: there are no bias terms (they are included in the affine transformation matrices W_{ij})
https://cdn-images-1.medium.com/max/1250/1*GbmQ2X9NQoGuJ1M-EOD67g.png
https://github.com/naturomics/CapsNet-Tensorflow
Routing by an Agreement
Capsule Schema with Routing
(Sabour, Frosst and Hinton [2017])
Routing Softmax
c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}    (1)
(Sabour, Frosst and Hinton [2017])
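A minimal numpy sketch of Eq. 1 (variable names are mine; b holds the routing logits b_{ij}, rows indexing input capsules i and columns output capsules j):

```python
import numpy as np

def routing_softmax(b):
    """Eq. 1: softmax of the logits b[i, j] over the output capsules j,
    so every input capsule i distributes a total coupling of 1."""
    e = np.exp(b - b.max(axis=1, keepdims=True))  # shifted for numerical stability
    return e / e.sum(axis=1, keepdims=True)

b = np.zeros((3, 4))          # 3 input capsules, 4 output capsules, logits at 0
print(routing_softmax(b)[0])  # [0.25 0.25 0.25 0.25] -> uniform coupling at the start
```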
Prediction Vectors
\hat{u}_{j|i} = W_{ij} u_i    (2)
(Sabour, Frosst and Hinton [2017])
Total Input
s_j = \sum_i c_{ij} \hat{u}_{j|i}    (3)
(Sabour, Frosst and Hinton [2017])
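A minimal numpy sketch of Eqs. 2 and 3 together (the shapes and names here are illustrative, not from the paper):

```python
import numpy as np

n_in, n_out, d_in, d_out = 6, 10, 8, 16        # illustrative capsule counts/sizes
u = np.random.randn(n_in, d_in)                # u_i: outputs of the lower capsules
W = np.random.randn(n_in, n_out, d_out, d_in)  # W_ij: learned transformation matrices

# Eq. 2: prediction vectors u_hat[i, j] = W[i, j] @ u[i]
u_hat = np.einsum('ijab,ib->ija', W, u)        # shape (n_in, n_out, d_out)

c = np.full((n_in, n_out), 1.0 / n_out)        # coupling coefficients from Eq. 1

# Eq. 3: total input s_j = sum_i c[i, j] * u_hat[i, j]
s = np.einsum('ij,ija->ja', c, u_hat)          # shape (n_out, d_out)
print(s.shape)                                 # (10, 16)
```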
Squashing: (vector) non-linearity
v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|}    (4)
(Sabour, Frosst and Hinton [2017])
Squashing: Plot for 1-D input
https://medium.com/ai-theory-practice-business/
understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
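A minimal numpy sketch of Eq. 4 (the eps term is my own addition, purely to avoid division by zero at s = 0):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Eq. 4: shrink short vectors toward 0 and long vectors toward unit length,
    preserving direction, so the length can act as a probability."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(sq_norm + eps)
    return (sq_norm / (1.0 + sq_norm)) * (s / norm)

# 1-D behaviour matching the plot: lengths 0.5 and 5.0 map to 0.2 and ~0.96.
for x in (0.5, 5.0):
    print(np.linalg.norm(squash(np.array([x]))))
```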
Routing Algorithm
Algorithm Dynamic Routing between Capsules
1: procedure Routing(û_{j|i}, r, l)
2:   for all capsule i in layer l and capsule j in layer (l + 1): b_{ij} ← 0.
3:   for r iterations do
4:     for all capsule i in layer l: c_i ← softmax(b_i)    (softmax from Eq. 1)
5:     for all capsule j in layer (l + 1): s_j ← Σ_i c_{ij} û_{j|i}    (total input from Eq. 3)
6:     for all capsule j in layer (l + 1): v_j ← squash(s_j)    (squash from Eq. 4)
7:     for all capsule i in layer l and capsule j in layer (l + 1): b_{ij} ← b_{ij} + û_{j|i} · v_j
8:   return v_j
(Sabour, Frosst and Hinton [2017])
https://youtu.be/rTawFwUvnLE?t=36m39s
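Putting Eqs. 1–4 and the procedure together, a minimal numpy sketch of the routing loop (my own implementation of the pseudocode above, not the authors' code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Eq. 4
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing(u_hat, r=3):
    """Dynamic routing. u_hat[i, j]: prediction of input capsule i
    for output capsule j, shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # line 2: logits start at 0
    for _ in range(r):                             # line 3
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # line 4: Eq. 1
        s = np.einsum('ij,ija->ja', c, u_hat)      # line 5: Eq. 3
        v = squash(s)                              # line 6: Eq. 4
        b = b + np.einsum('ija,ja->ij', u_hat, v)  # line 7: agreement update
    return v                                       # line 8

v = routing(np.random.randn(1152, 10, 16))         # PrimaryCaps -> DigitCaps sizes
print(v.shape)                                     # (10, 16)
```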
Average Change of Each Routing Logit b_{ij}
(by each routing iteration during training)
(Sabour, Frosst and Hinton [2017])
Log Scale of Final Differences
(Sabour, Frosst and Hinton [2017])
Training Loss of CapsNet on CIFAR10
(batch size of 128)
The CapsNet with 3 routing iterations optimizes the loss faster and
converges to a lower loss at the end.
(Sabour, Frosst and Hinton [2017])
Capsule Network
Architecture: Encoder-Decoder
encoder:
decoder:
(Sabour, Frosst and Hinton [2017])
Encoder: CapsNet with 3 Layers
input: 28 by 28 MNIST digit image
output: 16-dimensional vector of instantiation parameters
(Sabour, Frosst and Hinton [2017])
Encoder Layer 1: (Standard) Convolutional Layer
input: 28 × 28 image (one color channel)
output: 20 × 20 × 256
256 kernels with size of 9 × 9 × 1
stride 1
ReLU activation
(Sabour, Frosst and Hinton [2017])
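The 20 × 20 output size follows from standard convolution arithmetic; a quick check (the stride-2 value used below for PrimaryCaps comes from the paper):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution: floor((size + 2*pad - kernel)/stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(28, 9, stride=1))  # 20 -> Conv1 output is 20 x 20 x 256
print(conv_out(20, 9, stride=2))  # 6  -> PrimaryCaps output is 6 x 6 per capsule type
```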
Encoder Layer 2: PrimaryCaps
input: 20 × 20 × 256
basic features detected by the convolutional layer
output: 6 × 6 × 8 × 32
vector (activation) outputs of primary capsules
32 primary capsules
each applies eight 9 × 9 × 256 convolutional kernels
to the 20 × 20 × 256 input to produce a 6 × 6 × 8 output
(Sabour, Frosst and Hinton [2017])
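Equivalently, PrimaryCaps can be viewed as one 9 × 9, stride-2 convolution with 256 = 32 × 8 output channels whose activations are regrouped into 8-D capsule vectors; a minimal numpy sketch of that regrouping (names are mine):

```python
import numpy as np

conv_features = np.random.randn(6, 6, 256)    # stride-2 conv output: 6 x 6 x (32*8)

# Regroup the 256 channels into 32 capsule types with 8-D vector outputs each.
primary_caps = conv_features.reshape(6, 6, 32, 8)

# Flatten the grid: 6*6*32 = 1152 primary capsules, each an 8-D vector u_i.
u = primary_caps.reshape(-1, 8)
print(u.shape)                                # (1152, 8)
```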
Encoder Layer 3: DigitCaps
input: 6 × 6 × 8 × 32
(6 × 6 × 32)-many 8-dimensional vector activations
output: 16 × 10
10 digit capsules
each input vector gets its own 8 × 16 weight matrix W_{ij}
that maps the 8-dimensional input space to the 16-dimensional capsule output space
(Sabour, Frosst and Hinton [2017])
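A minimal numpy shape sketch of the DigitCaps input transformation (variable names are mine; the resulting û_{j|i} is what the routing procedure consumes):

```python
import numpy as np

n_in, n_out = 6 * 6 * 32, 10             # 1152 primary capsules, 10 digit capsules
u = np.random.randn(n_in, 8)             # 8-D primary capsule outputs
W = np.random.randn(n_in, n_out, 16, 8)  # one 8->16 matrix W_ij per (i, j) pair

# Eq. 2: each primary capsule predicts every digit capsule's 16-D output.
u_hat = np.einsum('ijab,ib->ija', W, u)
print(u_hat.shape)                       # (1152, 10, 16) -> fed into routing
```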
Margin Loss
for a Digit Existence
https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce
Margin Loss
to Train the Whole Encoder
In other words, each DigitCap c has loss:
L_c = \begin{cases} \max(0, m^+ - \|v_c\|)^2 & \text{if a digit of class } c \text{ is present,} \\ \lambda \, \max(0, \|v_c\| - m^-)^2 & \text{otherwise.} \end{cases}
m^+ = 0.9:
The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9.
m^- = 0.1:
The loss is 0 iff the mismatching DigitCap predicts an incorrect label with probability ≤ 0.1.
λ = 0.5 down-weights the loss for absent digit classes.
It stops the initial learning from shrinking the lengths of the activity vectors.
Why squares? Presumably because of the L2 norms in the loss function.
The total loss is the sum of the losses of all digit capsules.
(Sabour, Frosst and Hinton [2017])
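A minimal numpy sketch of this loss (helper and argument names are mine; lengths holds the ‖v_c‖ values, target is the true class):

```python
import numpy as np

def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Sum over classes of the per-DigitCap margin loss above."""
    t = np.zeros_like(lengths)
    t[target] = 1.0                      # one-hot presence indicator
    present = t * np.maximum(0.0, m_pos - lengths) ** 2
    absent = lam * (1 - t) * np.maximum(0.0, lengths - m_neg) ** 2
    return np.sum(present + absent)

lengths = np.array([0.05, 0.95, 0.2, 0.1, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0])
print(margin_loss(lengths, target=1))    # small loss: class 1 is long, others short
```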
Margin Loss
Function Value for Positive and for Negative Class
For the correct DigitCap, the loss is 0 iff it predicts the correct label with probability ≥ 0.9.
For the mismatching DigitCap, the loss is 0 iff it predicts an incorrect label with probability ≤ 0.1.
https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce
Decoder: Regularization of CapsNets
The decoder is used for regularization:
it decodes the input from DigitCaps
to recreate an image of a (28 × 28)-pixel digit,
with the loss function being the Euclidean distance between the reconstruction and the input image
it ignores the negative classes
it forces capsules to learn features useful for reconstruction
(Sabour, Frosst and Hinton [2017])
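In the paper, this reconstruction loss (sum of squared differences to the input image) is scaled down by 0.0005 so it does not dominate the margin loss; a minimal sketch of the combined objective (helper names are mine):

```python
import numpy as np

def total_loss(margin, image, reconstruction, recon_weight=0.0005):
    """Margin loss plus down-weighted sum-of-squares reconstruction loss."""
    recon = np.sum((image.ravel() - reconstruction.ravel()) ** 2)
    return margin + recon_weight * recon

img = np.random.rand(28, 28)
print(total_loss(margin=0.03, image=img, reconstruction=img * 0.9))
```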
Decoder: 3 Fully Connected Layers
Layer 4: from the 16 × 10 input to 512 outputs, ReLU activations
Layer 5: from 512 inputs to 1024 outputs, ReLU activations
Layer 6: from 1024 inputs to 784 outputs, sigmoid activations
(after reshaping, it produces a (28 × 28)-pixel decoded image)
(Sabour, Frosst and Hinton [2017])
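A minimal numpy forward pass through these three layers, with all but the target capsule masked out as during training (the weights here are random placeholders, biases omitted):

```python
import numpy as np

def relu(x): return np.maximum(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

digit_caps = np.random.randn(10, 16)       # DigitCaps output
mask = np.zeros((10, 1)); mask[3] = 1.0    # keep only the target capsule (class 3)
x = (digit_caps * mask).ravel()            # 160-D masked input to the decoder

W4 = np.random.randn(160, 512)             # placeholder weights
W5 = np.random.randn(512, 1024)
W6 = np.random.randn(1024, 784)

h = relu(x @ W4)                           # Layer 4
h = relu(h @ W5)                           # Layer 5
image = sigmoid(h @ W6).reshape(28, 28)    # Layer 6, reshaped to 28 x 28
print(image.shape)
```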
Architecture: Summary
https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/
Capsule-Neural-Network-Architecture-Capsule-Networks-Edureka.png
Experiments
MNIST Reconstructions (CapsNet, 3 routing iterations)
Label: 8 5 5 5
Prediction: 8 5 3 3
Reconstruction: 8 5 5 3
Input:
Output:
(Sabour, Frosst and Hinton [2017])
Dimension Perturbations
Perturbing one of the 16 dimensions, by intervals of 0.05 in the range [−0.25, 0.25]:
Interpretation (of the reconstructions after perturbing):
“scale and thickness”
“localized part”
“stroke thickness”
“localized skew”
“width and translation”
“localized part”
(Sabour, Frosst and Hinton [2017])
Dimension Perturbations: Latent Codes of the Digits 0–9
rows: DigitCaps dimensions
columns (from left to right): the dimension offset by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25}
https://github.com/XifengGuo/CapsNet-Keras
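A minimal sketch of how such a perturbation grid can be produced (decode is a placeholder standing in for the trained decoder; everything here is illustrative):

```python
import numpy as np

def decode(v16):
    """Placeholder for the decoder sketched earlier: 16-D capsule vector -> 28 x 28 image."""
    return np.zeros((28, 28))

v = np.random.randn(16)                    # activity vector of one digit capsule
offsets = np.arange(-0.25, 0.2501, 0.05)   # the 11 column offsets above

grid = [[decode(v + off * np.eye(16)[dim])  # perturb exactly one dimension
         for off in offsets]
        for dim in range(16)]              # one row per DigitCaps dimension
print(len(grid), len(grid[0]))             # 16 rows x 11 columns
```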
MultiMNIST Reconstructions (CapsNet, 3 routing iterations)
R:(2, 7) R:(6, 0) R:(6, 8) R:(7, 1)
L:(2, 7) L:(6, 0) L:(6, 8) L:(7, 1)
*R:(5, 7) *R:(2, 3) R:(2, 8) R:P:(2, 7)
L:(5, 0) L:(4, 3) L:(2, 8) L:(2, 8)
(Sabour, Frosst and Hinton [2017])
MultiMNIST Reconstructions (CapsNet, 3 routing iterations)
R:(8, 7) R:(9, 4) R:(9, 5) R:(8, 4)
L:(8, 7) L:(9, 4) L:(9, 5) L:(8, 4)
*R:(0, 8) *R:(1, 6) R:(4, 9) R:P:(4, 0)
L:(1, 8) L:(7, 6) L:(4, 9) L:(4, 9)
(Sabour, Frosst and Hinton [2017])
Results on MNIST and MultiMNIST
CapsNet classification test error:
Method    Routing  Reconstruction  MNIST (%)      MultiMNIST (%)
Baseline  -        -               0.39           8.1
CapsNet   1        no              0.34 ± 0.032   -
CapsNet   1        yes             0.29 ± 0.011   7.5
CapsNet   3        no              0.35 ± 0.036   -
CapsNet   3        yes             0.25 ± 0.005   5.2
(The MNIST average and standard deviation results are reported from 3 trials.)
(Sabour, Frosst and Hinton [2017])
Results on Other Datasets
CIFAR10
10.6% test error
ensemble of 7 models
3 routing iterations
24 × 24 patches of the image
about what standard convolutional nets achieved when they
were first applied to CIFAR10 (Zeiler and Fergus [2013])
(Sabour, Frosst and Hinton [2017])
Conclusion
Benefits:
a new building block usable in deep learning to better model hierarchical relationships
representations similar to scene graphs in computer graphics
until recently, there was no algorithm to implement and train a capsule network;
the dynamic routing algorithm now fills this gap
(one of the reasons: computers were not powerful enough in the pre-GPU era)
capable of learning to achieve state-of-the-art performance
using only a fraction of the data a CNN would need
(to tell digits apart, the human brain needs a couple dozen examples, hundreds at most;
CNNs typically need tens of thousands)
Downsides:
current implementations are much slower than other modern deep learning models
https://medium.com/ai-theory-practice-business/
understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
Thank you!
Questions?
Backup Slides
Results on Other Datasets
smallNORB (LeCun et al. [2004])
2.7% test error
96 × 96 stereo grey-scale images
resized to 48 × 48; random 32 × 32 crops during training; the central 32 × 32 patch during testing
same CapsNet architecture as for MNIST
on par with the state-of-the-art (Ciresan et al. [2011])
(Sabour, Frosst and Hinton [2017])
Results on Other Datasets
SVHN (Netzer et al. [2011])
4.3% test error
only 73,257 images!
the number of channels in the first convolutional layer reduced to 64
the primary capsule layer reduced to 16 6D-capsules
the final capsule layer: 8D-capsules
(Sabour, Frosst and Hinton [2017])
Further Reading
References
S. Sabour, N. Frosst and G. E. Hinton. Dynamic Routing Between Capsules. NIPS 2017.
G. E. Hinton, A. Krizhevsky and S. D. Wang. Transforming Auto-Encoders. ICANN 2011.
Y. LeCun, F. J. Huang and L. Bottou. Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. CVPR 2004.
Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu and A. Y. Ng. Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
M. D. Zeiler and R. Fergus. Stochastic Pooling for Regularization of Deep Convolutional Neural Networks. ICLR 2013.
D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella and J. Schmidhuber. High-Performance Neural Networks for Visual Object Classification. arXiv:1102.0183, 2011.
  • 9. Part-Whole Geometric Relationships “What is wrong with convolutional neural nets?” To a CNN (with MaxPool)... ...both pictures are similar, since they both contain similar elements. ...a mere presence of objects can be a very strong indicator to consider a face in the image. https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 2
  • 10. Part-Whole Geometric Relationships “What is wrong with convolutional neural nets?” To a CNN (with MaxPool)... ...both pictures are similar, since they both contain similar elements. ...a mere presence of objects can be a very strong indicator to consider a face in the image. ...orientational and relative spatial relationships are not very important. https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 2
  • 11. Part-Whole Geometric Relationships Scene Graphs from Computer Graphics http://math.hws.edu/graphicsbook/c2/scene-graph.png 3
  • 12. Part-Whole Geometric Relationships Scene Graphs from Computer Graphics ...takes into account relative positions of objects. http://math.hws.edu/graphicsbook/c2/scene-graph.png 3
  • 13. Part-Whole Geometric Relationships Scene Graphs from Computer Graphics ...takes into account relative positions of objects. The internal representation in computer memory: http://math.hws.edu/graphicsbook/c2/scene-graph.png 3
  • 14. Part-Whole Geometric Relationships Scene Graphs from Computer Graphics ...takes into account relative positions of objects. The internal representation in computer memory: a) arrays of geometrical objects http://math.hws.edu/graphicsbook/c2/scene-graph.png 3
  • 15. Part-Whole Geometric Relationships Scene Graphs from Computer Graphics ...takes into account relative positions of objects. The internal representation in computer memory: a) arrays of geometrical objects b) matrices representing their relative positions and orientations http://math.hws.edu/graphicsbook/c2/scene-graph.png 3
  • 16. Part-Whole Geometric Relationships Inverse (Computer) Graphics https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 17. Part-Whole Geometric Relationships Inverse (Computer) Graphics Inverse graphics: https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 18. Part-Whole Geometric Relationships Inverse (Computer) Graphics Inverse graphics: from visual information received by eyes https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 19. Part-Whole Geometric Relationships Inverse (Computer) Graphics Inverse graphics: from visual information received by eyes deconstruct a hierarchical representation of the world around us https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 20. Part-Whole Geometric Relationships Inverse (Computer) Graphics Inverse graphics: from visual information received by eyes deconstruct a hierarchical representation of the world around us and try to match it with already learned patterns and relationships stored in the brain https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 21. Part-Whole Geometric Relationships Inverse (Computer) Graphics Inverse graphics: from visual information received by eyes deconstruct a hierarchical representation of the world around us and try to match it with already learned patterns and relationships stored in the brain relationships between 3D objects using a “pose” (= translation plus rotation) https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2017/12/ CNN-Capsule-Networks-Edureka-442x300.png 4
  • 22. Pose Equivariance and the Viewing Angle https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 5
  • 23. Pose Equivariance and the Viewing Angle We have probably never seen these exact pictures, but we can still immediately recognize the object in it... https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 5
  • 24. Pose Equivariance and the Viewing Angle We have probably never seen these exact pictures, but we can still immediately recognize the object in it... internal representation in the brain: independent of the viewing angle https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 5
  • 25. Pose Equivariance and the Viewing Angle We have probably never seen these exact pictures, but we can still immediately recognize the object in it... internal representation in the brain: independent of the viewing angle quite hard for a CNN: no built-in understanding of 3D space https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 5
  • 26. Pose Equivariance and the Viewing Angle We have probably never seen these exact pictures, but we can still immediately recognize the object in it... internal representation in the brain: independent of the viewing angle quite hard for a CNN: no built-in understanding of 3D space much easier for a CapsNet: these relationships are explicitly modeled https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 5
  • 27. Routing by an Agreement: High-Dimensional Coincidence https://www.oreilly.com/ideas/introducing-capsule-networks 6
  • 30. Routing by an Agreement: Illustrative Overview https://www.oreilly.com/ideas/introducing-capsule-networks 7
  • 32. Routing by an Agreement: Recognizing Ambiguity in Images https://www.oreilly.com/ideas/introducing-capsule-networks 8
  • 33. Routing: Lower Levels Voting for Higher-Level Feature (Sabour, Frosst and Hinton [2017]) 9
  • 34. How to do it? (mathematically) 9
  • 36. What Is a Capsule? a group of neurons that: perform some complicated internal computations on their inputs https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 10
  • 37. What Is a Capsule? a group of neurons that: perform some complicated internal computations on their inputs encapsulate their results into a small vector of highly informative outputs https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 10
  • 38. What Is a Capsule? a group of neurons that: perform some complicated internal computations on their inputs encapsulate their results into a small vector of highly informative outputs recognize an implicitly defined visual entity (over a limited domain of viewing conditions and deformations) https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 10
  • 39. What Is a Capsule? a group of neurons that: perform some complicated internal computations on their inputs encapsulate their results into a small vector of highly informative outputs recognize an implicitly defined visual entity (over a limited domain of viewing conditions and deformations) encode the probability of the entity being present https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 10
  • 40. What Is a Capsule? a group of neurons that: perform some complicated internal computations on their inputs encapsulate their results into a small vector of highly informative outputs recognize an implicitly defined visual entity (over a limited domain of viewing conditions and deformations) encode the probability of the entity being present encode instantiation parameters pose, lighting, deformation relative to entity’s (implicitly defined) canonical version https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 10
  • 41. Output As A Vector https://www.oreilly.com/ideas/introducing-capsule-networks 11
  • 42. Output As A Vector probability of presence: locally invariant E.g. if (0, 3, 2, 0, 0) leads to (0, 1, 0, 0), then (0, 0, 3, 2, 0) should also lead to (0, 1, 0, 0). https://www.oreilly.com/ideas/introducing-capsule-networks 11
  • 43. Output As A Vector probability of presence: locally invariant E.g. if (0, 3, 2, 0, 0) leads to (0, 1, 0, 0), then (0, 0, 3, 2, 0) should also lead to (0, 1, 0, 0). instantiation parameters: equivariant E.g. if (0, 3, 2, 0, 0) leads to (0, 1, 0, 0), then (0, 0, 3, 2, 0) might lead to (0, 0, 1, 0). https://www.oreilly.com/ideas/introducing-capsule-networks 11
  • 44. Previous Version of Capsules for illustration taken from “Transforming Auto-Encoders” (Hinton, Krizhevsky and Wang [2011]) (Hinton, Krizhevsky and Wang [2011]) 12
  • 45. Previous Version of Capsules for illustration taken from “Transforming Auto-Encoders” (Hinton, Krizhevsky and Wang [2011]) three capsules of a transforming auto-encoder (that models translation) (Hinton, Krizhevsky and Wang [2011]) 12
  • 48. Capsule’s Vector Flow Note: no bias (it is absorbed into the affine transformation matrices $W_{ij}$) https://cdn-images-1.medium.com/max/1250/1*GbmQ2X9NQoGuJ1M-EOD67g.png 13
  • 50. Routing by an Agreement
  • 51. Capsule Schema with Routing (Sabour, Frosst and Hinton [2017]) 14
  • 52. Routing Softmax $c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$ (1) (Sabour, Frosst and Hinton [2017]) 15
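As a reading aid, here is a minimal numpy sketch of Eq. 1, assuming the routing logits sit in an array `b` of shape (num_lower_capsules, num_upper_capsules); the function name is ours, not the paper's:

```python
import numpy as np

def routing_softmax(b):
    """Eq. 1: c_ij = exp(b_ij) / sum_k exp(b_ik).

    b holds the routing logits, shape (num_lower, num_upper); each row of
    the result sums to 1, so every lower-level capsule distributes its
    output across the upper-level capsules."""
    b = b - b.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(b)
    return e / e.sum(axis=-1, keepdims=True)
```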
  • 53. Prediction Vectors $\hat{u}_{j|i} = W_{ij} u_i$ (2) (Sabour, Frosst and Hinton [2017]) 16
  • 54. Total Input $s_j = \sum_i c_{ij} \hat{u}_{j|i}$ (3) (Sabour, Frosst and Hinton [2017]) 17
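Eqs. 2 and 3 can likewise be sketched in numpy; the shapes below (1152 lower-level capsules of 8 dimensions, 10 upper-level capsules of 16 dimensions) anticipate the MNIST CapsNet described later in the deck, and all variable names and the random values are purely illustrative:

```python
import numpy as np

num_lower, num_upper, d_in, d_out = 1152, 10, 8, 16

W = np.random.randn(num_lower, num_upper, d_in, d_out) * 0.01  # matrices W_ij
u = np.random.randn(num_lower, d_in)                   # lower-level outputs u_i
c = np.full((num_lower, num_upper), 1.0 / num_upper)   # coupling coefficients c_ij

# Eq. 2: prediction vectors u_hat_{j|i} = W_ij u_i (W_ij stored as d_in x d_out)
u_hat = np.einsum('ijkl,ik->ijl', W, u)                # shape (1152, 10, 16)

# Eq. 3: total input s_j = sum_i c_ij * u_hat_{j|i}
s = np.einsum('ij,ijl->jl', c, u_hat)                  # shape (10, 16)
```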
  • 55. Squashing: (vector) non-linearity $v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|}$ (4) (Sabour, Frosst and Hinton [2017]) 18
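A minimal numpy sketch of Eq. 4; the `eps` guard is our addition to keep the all-zero vector well defined:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Eq. 4: v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).

    Shrinks short vectors toward length 0 and long ones toward length 1,
    so a capsule's output length can act as a probability of presence;
    eps guards against division by zero for the all-zero vector."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)
```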
  • 57. Squashing: Plot for 1-D input https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66 19
  • 59. Routing Algorithm (Sabour, Frosst and Hinton [2017]) 20
  • 60. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) (Sabour, Frosst and Hinton [2017]) 20
  • 61. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. (Sabour, Frosst and Hinton [2017]) 20
  • 62. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do (Sabour, Frosst and Hinton [2017]) 20
  • 63. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do 4: for all capsule $i$ in layer $l$: $c_i \leftarrow \operatorname{softmax}(b_i)$ (softmax from Eq. 1) (Sabour, Frosst and Hinton [2017]) 20
  • 64. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do 4: for all capsule $i$ in layer $l$: $c_i \leftarrow \operatorname{softmax}(b_i)$ (softmax from Eq. 1) 5: for all capsule $j$ in layer $(l + 1)$: $s_j \leftarrow \sum_i c_{ij} \hat{u}_{j|i}$ (total input from Eq. 3) (Sabour, Frosst and Hinton [2017]) 20
  • 65. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do 4: for all capsule $i$ in layer $l$: $c_i \leftarrow \operatorname{softmax}(b_i)$ (softmax from Eq. 1) 5: for all capsule $j$ in layer $(l + 1)$: $s_j \leftarrow \sum_i c_{ij} \hat{u}_{j|i}$ (total input from Eq. 3) 6: for all capsule $j$ in layer $(l + 1)$: $v_j \leftarrow \operatorname{squash}(s_j)$ (squash from Eq. 4) (Sabour, Frosst and Hinton [2017]) 20
  • 66. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do 4: for all capsule $i$ in layer $l$: $c_i \leftarrow \operatorname{softmax}(b_i)$ (softmax from Eq. 1) 5: for all capsule $j$ in layer $(l + 1)$: $s_j \leftarrow \sum_i c_{ij} \hat{u}_{j|i}$ (total input from Eq. 3) 6: for all capsule $j$ in layer $(l + 1)$: $v_j \leftarrow \operatorname{squash}(s_j)$ (squash from Eq. 4) 7: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$ (Sabour, Frosst and Hinton [2017]) 20
  • 67. Routing Algorithm Algorithm Dynamic Routing between Capsules 1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$) 2: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow 0$. 3: for $r$ iterations do 4: for all capsule $i$ in layer $l$: $c_i \leftarrow \operatorname{softmax}(b_i)$ (softmax from Eq. 1) 5: for all capsule $j$ in layer $(l + 1)$: $s_j \leftarrow \sum_i c_{ij} \hat{u}_{j|i}$ (total input from Eq. 3) 6: for all capsule $j$ in layer $(l + 1)$: $v_j \leftarrow \operatorname{squash}(s_j)$ (squash from Eq. 4) 7: for all capsule $i$ in layer $l$ and capsule $j$ in layer $(l + 1)$: $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$ return $v_j$ (Sabour, Frosst and Hinton [2017]) 20
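Putting lines 2 to 7 together, here is a compact numpy sketch of the whole procedure, reusing the `routing_softmax` and `squash` helpers sketched earlier; it mirrors the pseudocode above but is a reading aid, not the authors' reference implementation:

```python
import numpy as np

def dynamic_routing(u_hat, r=3):
    """u_hat: prediction vectors, shape (num_lower, num_upper, d_out);
    r: number of routing iterations. Returns v, shape (num_upper, d_out)."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))           # line 2: b_ij <- 0
    for _ in range(r):                             # line 3
        c = routing_softmax(b)                     # line 4: Eq. 1
        s = np.einsum('ij,ijl->jl', c, u_hat)      # line 5: Eq. 3
        v = squash(s)                              # line 6: Eq. 4
        b = b + np.einsum('ijl,jl->ij', u_hat, v)  # line 7: agreement update
    return v
```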
  • 69. Average Change of Each Routing Logit $b_{ij}$ (by each routing iteration during training) (Sabour, Frosst and Hinton [2017]) 21
  • 71. Log Scale of Final Differences (Sabour, Frosst and Hinton [2017]) 22
  • 72. Training Loss of CapsNet on CIFAR10 (batch size of 128) The CapsNet with 3 routing iterations optimizes the loss faster and converges to a lower loss at the end. (Sabour, Frosst and Hinton [2017]) 23
  • 78. Encoder: CapsNet with 3 Layers (Sabour, Frosst and Hinton [2017]) 25
  • 79. Encoder: CapsNet with 3 Layers input: 28 by 28 MNIST digit image (Sabour, Frosst and Hinton [2017]) 25
  • 80. Encoder: CapsNet with 3 Layers input: 28 by 28 MNIST digit image output: 16-dimensional vector of instantiation parameters (Sabour, Frosst and Hinton [2017]) 25
  • 81. Encoder Layer 1: (Standard) Convolutional Layer (Sabour, Frosst and Hinton [2017]) 26
  • 82. Encoder Layer 1: (Standard) Convolutional Layer input: 28 × 28 image (one color channel) (Sabour, Frosst and Hinton [2017]) 26
  • 83. Encoder Layer 1: (Standard) Convolutional Layer input: 28 × 28 image (one color channel) output: 20 × 20 × 256 (Sabour, Frosst and Hinton [2017]) 26
  • 84. Encoder Layer 1: (Standard) Convolutional Layer input: 28 × 28 image (one color channel) output: 20 × 20 × 256 256 kernels with size of 9 × 9 × 1 (Sabour, Frosst and Hinton [2017]) 26
  • 85. Encoder Layer 1: (Standard) Convolutional Layer input: 28 × 28 image (one color channel) output: 20 × 20 × 256 256 kernels with size of 9 × 9 × 1 stride 1 (Sabour, Frosst and Hinton [2017]) 26
  • 86. Encoder Layer 1: (Standard) Convolutional Layer input: 28 × 28 image (one color channel) output: 20 × 20 × 256 256 kernels with size of 9 × 9 × 1 stride 1 ReLU activation (Sabour, Frosst and Hinton [2017]) 26
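For concreteness, this layer might be written as follows in the Keras API (which the CapsNet-Keras repository cited later in these slides also uses); the layer name is illustrative:

```python
from tensorflow import keras

inputs = keras.Input(shape=(28, 28, 1))                # one-channel MNIST image
conv1 = keras.layers.Conv2D(filters=256,
                            kernel_size=9,
                            strides=1,
                            activation='relu',
                            name='conv1')(inputs)      # output: 20 x 20 x 256
```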
  • 87. Encoder Layer 2: PrimaryCaps (Sabour, Frosst and Hinton [2017]) 27
  • 88. Encoder Layer 2: PrimaryCaps input: 20 × 20 × 256 basic features detected by the convolutional layer (Sabour, Frosst and Hinton [2017]) 27
  • 89. Encoder Layer 2: PrimaryCaps input: 20 × 20 × 256 basic features detected by the convolutional layer output: 6 × 6 × 8 × 32 vector (activation) outputs of primary capsules (Sabour, Frosst and Hinton [2017]) 27
  • 90. Encoder Layer 2: PrimaryCaps input: 20 × 20 × 256 basic features detected by the convolutional layer output: 6 × 6 × 8 × 32 vector (activation) outputs of primary capsules 32 primary capsules (Sabour, Frosst and Hinton [2017]) 27
  • 91. Encoder Layer 2: PrimaryCaps input: 20 × 20 × 256 basic features detected by the convolutional layer output: 6 × 6 × 8 × 32 vector (activation) outputs of primary capsules 32 primary capsules each applies eight 9 × 9 × 256 convolutional kernels to the 20 × 20 × 256 input to produce 6 × 6 × 8 output (Sabour, Frosst and Hinton [2017]) 27
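PrimaryCaps is commonly realized as one big convolution reshaped into capsules; the paper uses stride 2, which is what brings 20 × 20 down to 6 × 6. A hedged Keras sketch, continuing from the `conv1` tensor in the previous snippet:

```python
import tensorflow as tf
from tensorflow import keras

def tf_squash(s, axis=-1, eps=1e-9):
    # Eq. 4, applied along the capsule-dimension axis
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / tf.sqrt(sq_norm + eps)

# 32 capsule types x 8 dimensions = 256 convolution channels in one pass
primary = keras.layers.Conv2D(filters=32 * 8,
                              kernel_size=9,
                              strides=2,               # 20x20 -> 6x6
                              name='primarycaps_conv')(conv1)
# regroup the 6 x 6 x 256 volume into 6*6*32 = 1152 capsules of 8 dimensions
primary = keras.layers.Reshape((6 * 6 * 32, 8))(primary)
primary = keras.layers.Lambda(tf_squash)(primary)      # capsule-wise squash
```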
  • 92. Encoder Layer 3: DigitCaps (Sabour, Frosst and Hinton [2017]) 28
  • 93. Encoder Layer 3: DigitCaps input: 6 × 6 × 8 × 32 (6 × 6 × 32)-many 8-dimensional vector activations (Sabour, Frosst and Hinton [2017]) 28
  • 94. Encoder Layer 3: DigitCaps input: 6 × 6 × 8 × 32 (6 × 6 × 32)-many 8-dimensional vector activations output: 16 × 10 (Sabour, Frosst and Hinton [2017]) 28
  • 95. Encoder Layer 3: DigitCaps input: 6 × 6 × 8 × 32 (6 × 6 × 32)-many 8-dimensional vector activations output: 16 × 10 10 digit capsules (Sabour, Frosst and Hinton [2017]) 28
  • 96. Encoder Layer 3: DigitCaps input: 6 × 6 × 8 × 32 (6 × 6 × 32)-many 8-dimensional vector activations output: 16 × 10 10 digit capsules each input vector gets its own 8 × 16 weight matrix $W_{ij}$ that maps the 8-dimensional input space to the 16-dimensional capsule output space (Sabour, Frosst and Hinton [2017]) 28
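In terms of shapes, DigitCaps is Eq. 2 followed by routing; a numpy sketch of just the bookkeeping, reusing `dynamic_routing` from the routing-algorithm snippet (random values stand in for learned weights and real activations):

```python
import numpy as np

u = np.random.randn(1152, 8)                  # 6*6*32 primary-capsule outputs
W = np.random.randn(1152, 10, 8, 16) * 0.01   # one 8x16 matrix W_ij per (i, j)

u_hat = np.einsum('ijkl,ik->ijl', W, u)       # Eq. 2: predictions, (1152, 10, 16)
v = dynamic_routing(u_hat, r=3)               # 10 digit capsules, 16-D each
```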
  • 97. Margin Loss for Digit Existence https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce 29
  • 98. Margin Loss to Train the Whole Encoder In other words, each DigitCap c has loss: (Sabour, Frosst and Hinton [2017]) 30
  • 99. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. (Sabour, Frosst and Hinton [2017]) 30
  • 100. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. $m^+ = 0.9$: The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9. (Sabour, Frosst and Hinton [2017]) 30
  • 101. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. $m^+ = 0.9$: The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9. $m^- = 0.1$: The loss is 0 iff the mismatching DigitCap predicts an incorrect label with probability ≤ 0.1. (Sabour, Frosst and Hinton [2017]) 30
  • 102. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. $m^+ = 0.9$: The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9. $m^- = 0.1$: The loss is 0 iff the mismatching DigitCap predicts an incorrect label with probability ≤ 0.1. $\lambda = 0.5$ down-weights the loss for absent digit classes. It stops the initial learning from shrinking the lengths of the activity vectors. (Sabour, Frosst and Hinton [2017]) 30
  • 103. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. $m^+ = 0.9$: The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9. $m^- = 0.1$: The loss is 0 iff the mismatching DigitCap predicts an incorrect label with probability ≤ 0.1. $\lambda = 0.5$ down-weights the loss for absent digit classes. It stops the initial learning from shrinking the lengths of the activity vectors. Squares? Because there are L2 norms in the loss function? (Sabour, Frosst and Hinton [2017]) 30
  • 104. Margin Loss to Train the Whole Encoder In other words, each DigitCap $c$ has loss: $L_c = \max(0, m^+ - \|v_c\|)^2$ iff a digit of class $c$ is present, $L_c = \lambda \max(0, \|v_c\| - m^-)^2$ otherwise. $m^+ = 0.9$: The loss is 0 iff the correct DigitCap predicts the correct label with probability ≥ 0.9. $m^- = 0.1$: The loss is 0 iff the mismatching DigitCap predicts an incorrect label with probability ≤ 0.1. $\lambda = 0.5$ down-weights the loss for absent digit classes. It stops the initial learning from shrinking the lengths of the activity vectors. Squares? Because there are L2 norms in the loss function? The total loss is the sum of the losses of all digit capsules. (Sabour, Frosst and Hinton [2017]) 30
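A minimal numpy sketch of this loss, folding the two cases into one expression via the one-hot indicator $T_c$ as in the paper; the function and argument names are ours:

```python
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v_norms: lengths ||v_c|| of the DigitCaps outputs, shape (10,).
    T: one-hot indicator T_c of the classes present, shape (10,)."""
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2               # first case
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2  # second case
    return np.sum(present + absent)   # total loss: sum over all digit capsules
```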
  • 105. Margin Loss Function Value for Positive and for Negative Class For the correct DigitCap, the loss is 0 iff it predicts the correct label with probability ≥ 0.9. For the mismatching DigitCap, the loss is 0 iff it predicts an incorrect label with probability ≤ 0.1. https://medium.com/@pechyonkin/part-iv-capsnet-architecture-6a64422f7dce 31
  • 107. Decoder: Regularization of CapsNets (Sabour, Frosst and Hinton [2017]) 32
  • 108. Decoder: Regularization of CapsNets Decoder is used for regularization: decodes input from DigitCaps to recreate an image of a (28 × 28)-pixel digit (Sabour, Frosst and Hinton [2017]) 32
  • 109. Decoder: Regularization of CapsNets Decoder is used for regularization: decodes input from DigitCaps to recreate an image of a (28 × 28)-pixel digit with the Euclidean distance between the reconstruction and the input image as the loss function (Sabour, Frosst and Hinton [2017]) 32
  • 110. Decoder: Regularization of CapsNets Decoder is used for regularization: decodes input from DigitCaps to recreate an image of a (28 × 28)-pixel digit with the Euclidean distance between the reconstruction and the input image as the loss function ignores the negative classes (Sabour, Frosst and Hinton [2017]) 32
  • 111. Decoder: Regularization of CapsNets Decoder is used for regularization: decodes input from DigitCaps to recreate an image of a (28 × 28)-pixel digit with the Euclidean distance between the reconstruction and the input image as the loss function ignores the negative classes forces capsules to learn features useful for reconstruction (Sabour, Frosst and Hinton [2017]) 32
  • 112. Decoder: 3 Fully Connected Layers (Sabour, Frosst and Hinton [2017]) 33
  • 113. Decoder: 3 Fully Connected Layers Layer 4: from 16 × 10 input to 512 output, ReLU activations (Sabour, Frosst and Hinton [2017]) 33
  • 114. Decoder: 3 Fully Connected Layers Layer 4: from 16 × 10 input to 512 output, ReLU activations Layer 5: from 512 input to 1024 output, ReLU activations (Sabour, Frosst and Hinton [2017]) 33
  • 115. Decoder: 3 Fully Connected Layers Layer 4: from 16 × 10 input to 512 output, ReLU activations Layer 5: from 512 input to 1024 output, ReLU activations Layer 6: from 1024 input to 784 output, sigmoid activations (after reshaping it produces a (28 × 28)-pixel decoded image) (Sabour, Frosst and Hinton [2017]) 33
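A minimal Keras sketch of these three layers; in practice the input is the DigitCaps output (16 × 10 values, with the non-target capsules masked out) flattened to a 160-dimensional vector:

```python
from tensorflow import keras

decoder = keras.Sequential([
    keras.layers.Dense(512, activation='relu', input_shape=(16 * 10,)),  # layer 4
    keras.layers.Dense(1024, activation='relu'),                         # layer 5
    keras.layers.Dense(784, activation='sigmoid'),                       # layer 6
    keras.layers.Reshape((28, 28)),   # 784 values -> a 28 x 28 decoded image
])
```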
  • 118. MNIST Reconstructions (CapsNet, 3 routing iterations) Label: 8 5 5 5, Prediction: 8 5 3 3, Reconstruction: 8 5 5 3 (figure rows: input images and their reconstructed outputs) (Sabour, Frosst and Hinton [2017]) 35
  • 119. Dimension Perturbations perturb one of the 16 dimensions, in steps of 0.05 over the range [−0.25, 0.25]: (Sabour, Frosst and Hinton [2017]) 36
  • 120. Dimension Perturbations perturb one of the 16 dimensions, in steps of 0.05 over the range [−0.25, 0.25]: Interpretations (one per row of the reconstruction figure): “scale and thickness”, “localized part” (Sabour, Frosst and Hinton [2017]) 36
  • 121. Dimension Perturbations perturb one of the 16 dimensions, in steps of 0.05 over the range [−0.25, 0.25]: Interpretations (one per row of the reconstruction figure): “scale and thickness”, “localized part”, “stroke thickness”, “localized skew” (Sabour, Frosst and Hinton [2017]) 36
  • 122. Dimension Perturbations perturb one of the 16 dimensions, in steps of 0.05 over the range [−0.25, 0.25]: Interpretations (one per row of the reconstruction figure): “scale and thickness”, “localized part”, “stroke thickness”, “localized skew”, “width and translation”, “localized part” (Sabour, Frosst and Hinton [2017]) 36
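The sweep itself is straightforward; a hedged numpy sketch, where `decode` is a stand-in for the trained decoder above and `v` is the 16-dimensional activity vector of the chosen DigitCap:

```python
import numpy as np

deltas = np.arange(-0.25, 0.25 + 1e-9, 0.05)   # 11 offsets, steps of 0.05

def perturbation_sweep(v, dim, decode):
    """Reconstructions of v with dimension `dim` nudged across [-0.25, 0.25]."""
    images = []
    for d in deltas:
        v_tweaked = v.copy()
        v_tweaked[dim] += d
        images.append(decode(v_tweaked))       # one decoded image per offset
    return images
```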
  • 123. Dimension Perturbations: Latent Codes of 0 and 1 rows: DigitCaps dimensions columns (from left to right): dimension value shifted by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25} https://github.com/XifengGuo/CapsNet-Keras 37
  • 124. Dimension Perturbations: Latent Codes of 2 and 3 rows: DigitCaps dimensions columns (from left to right): dimension value shifted by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25} https://github.com/XifengGuo/CapsNet-Keras 38
  • 125. Dimension Perturbations: Latent Codes of 4 and 5 rows: DigitCaps dimensions columns (from left to right): dimension value shifted by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25} https://github.com/XifengGuo/CapsNet-Keras 39
  • 126. Dimension Perturbations: Latent Codes of 6 and 7 rows: DigitCaps dimensions columns (from left to right): dimension value shifted by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25} https://github.com/XifengGuo/CapsNet-Keras 40
  • 127. Dimension Perturbations: Latent Codes of 8 and 9 rows: DigitCaps dimensions columns (from left to right): dimension value shifted by {−0.25, −0.2, −0.15, −0.1, −0.05, 0, 0.05, 0.1, 0.15, 0.2, 0.25} https://github.com/XifengGuo/CapsNet-Keras 41
  • 128. MultiMNIST Reconstructions (CapsNet, 3 routing iterations) (L: true labels of the two overlapping digits, R: digits used for reconstruction) R:(2, 7) R:(6, 0) R:(6, 8) R:(7, 1) L:(2, 7) L:(6, 0) L:(6, 8) L:(7, 1) (Sabour, Frosst and Hinton [2017]) 42
  • 129. MultiMNIST Reconstructions (CapsNet, 3 routing iterations) R:(2, 7) R:(6, 0) R:(6, 8) R:(7, 1) L:(2, 7) L:(6, 0) L:(6, 8) L:(7, 1) *R:(5, 7) *R:(2, 3) R:(2, 8) R:P:(2, 7) L:(5, 0) L:(4, 3) L:(2, 8) L:(2, 8) (Sabour, Frosst and Hinton [2017]) 42
  • 130. MultiMNIST Reconstructions (CapsNet, 3 routing iterations) R:(8, 7) R:(9, 4) R:(9, 5) R:(8, 4) L:(8, 7) L:(9, 4) L:(9, 5) L:(8, 4) (Sabour, Frosst and Hinton [2017]) 43
  • 131. MultiMNIST Reconstructions (CapsNet, 3 routing iterations) R:(8, 7) R:(9, 4) R:(9, 5) R:(8, 4) L:(8, 7) L:(9, 4) L:(9, 5) L:(8, 4) *R:(0, 8) *R:(1, 6) R:(4, 9) R:P:(4, 0) L:(1, 8) L:(7, 6) L:(4, 9) L:(4, 9) (Sabour, Frosst and Hinton [2017]) 43
  • 132. Results on MNIST and MultiMNIST CapsNet classification test accuracy: (Sabour, Frosst and Hinton [2017]) 44
  • 133. Results on MNIST and MultiMNIST CapsNet classification test accuracy:

    Method   | Routing | Reconstruction | MNIST (%)    | MultiMNIST (%)
    ---------|---------|----------------|--------------|---------------
    Baseline |    -    |       -        | 0.39         | 8.1
    CapsNet  |    1    |      no        | 0.34 ± 0.032 | -
    CapsNet  |    1    |      yes       | 0.29 ± 0.011 | 7.5
    CapsNet  |    3    |      no        | 0.35 ± 0.036 | -
    CapsNet  |    3    |      yes       | 0.25 ± 0.005 | 5.2

  (The MNIST average and standard deviation results are reported from 3 trials.) (Sabour, Frosst and Hinton [2017]) 44
  • 134. Results on Other Datasets CIFAR10 10.6% test error (Sabour, Frosst and Hinton [2017]) 45
  • 135. Results on Other Datasets CIFAR10 10.6% test error ensemble of 7 models (Sabour, Frosst and Hinton [2017]) 45
  • 136. Results on Other Datasets CIFAR10 10.6% test error ensemble of 7 models 3 routing iterations (Sabour, Frosst and Hinton [2017]) 45
  • 137. Results on Other Datasets CIFAR10 10.6% test error ensemble of 7 models 3 routing iterations 24 × 24 patches of the image (Sabour, Frosst and Hinton [2017]) 45
  • 138. Results on Other Datasets CIFAR10 10.6% test error ensemble of 7 models 3 routing iterations 24 × 24 patches of the image about what standard convolutional nets achieved when they were first applied to CIFAR10 (Zeiler and Fergus [2013]) (Sabour, Frosst and Hinton [2017]) 45
  • 140. Benefits: a new building block usable in deep learning to better model hierarchical relationships https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 141. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 142. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 143. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 144. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm one of the reasons: computers were not powerful enough in the pre-GPU era https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 145. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm one of the reasons: computers were not powerful enough in the pre-GPU era capable of learning to achieve state-of-the-art performance using only a fraction of the data a CNN needs https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 146. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm one of the reasons: computers were not powerful enough in the pre-GPU era capable of learning to achieve state-of-the-art performance using only a fraction of the data a CNN needs task of telling digits apart: the human brain needs a couple dozen examples (hundreds at most). https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 147. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm one of the reasons: computers were not powerful enough in the pre-GPU era capable of learning to achieve state-of-the-art performance using only a fraction of the data a CNN needs task of telling digits apart: the human brain needs a couple dozen examples (hundreds at most). CNNs, on the other hand, typically need tens of thousands of them. https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 149. Benefits: a new building block usable in deep learning to better model hierarchical relationships representations similar to scene graphs in computer graphics until now, no algorithm to implement and train a capsule network: now, the dynamic routing algorithm one of the reasons: computers were not powerful enough in the pre-GPU era capable of learning to achieve state-of-the-art performance using only a fraction of the data a CNN needs task of telling digits apart: the human brain needs a couple dozen examples (hundreds at most). CNNs, on the other hand, typically need tens of thousands of them. Downsides: current implementations are much slower than other modern deep learning models https://medium.com/ai-theory-practice-business/ understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 46
  • 152. Results on Other Datasets smallNORB (LeCun et al. [2004]) 2.7% test error (Sabour, Frosst and Hinton [2017])
  • 153. Results on Other Datasets smallNORB (LeCun et al. [2004]) 2.7% test error 96 × 96 stereo grey-scale images resized to 48 × 48; random 32 × 32 crops during training, the central 32 × 32 patch at test time (Sabour, Frosst and Hinton [2017])
  • 154. Results on Other Datasets smallNORB (LeCun et al. [2004]) 2.7% test error 96 × 96 stereo grey-scale images resized to 48 × 48; random 32 × 32 crops during training, the central 32 × 32 patch at test time same CapsNet architecture as for MNIST (Sabour, Frosst and Hinton [2017])
  • 155. Results on Other Datasets smallNORB (LeCun et al. [2004]) 2.7% test error 96 × 96 stereo grey-scale images resized to 48 × 48; random 32 × 32 crops during training, the central 32 × 32 patch at test time same CapsNet architecture as for MNIST on par with the state-of-the-art (Ciresan et al. [2011]) (Sabour, Frosst and Hinton [2017])
  • 156. Results on Other Datasets SVHN (Netzer et al. [2011]) 4.3% test error (Sabour, Frosst and Hinton [2017])
  • 157. Results on Other Datasets SVHN (Netzer et al. [2011]) 4.3% test error only 73257 images! (Sabour, Frosst and Hinton [2017])
  • 158. Results on Other Datasets SVHN (Netzer et al. [2011]) 4.3% test error only 73257 images! the number of channels in the first convolutional layer reduced to 64 (Sabour, Frosst and Hinton [2017])
  • 159. Results on Other Datasets SVHN (Netzer et al. [2011]) 4.3% test error only 73257 images! the number of channels in the first convolutional layer reduced to 64 the primary capsule layer reduced to 16 6D capsules (Sabour, Frosst and Hinton [2017])
  • 160. Results on Other Datasets SVHN (Netzer et al. [2011]) 4.3% test error only 73257 images! the number of channels in the first convolutional layer reduced to 64 the primary capsule layer reduced to 16 6D capsules final capsule layer: 8D capsules (Sabour, Frosst and Hinton [2017])