Capsule networks

Intelligent Control and Systems Laboratory
J.hyeon Park
2018-08-07
SNU AI Study

Topics covered
• Original capsule network
Sabour 2017. "Dynamic routing between capsules."
• EM routing
Hinton 2018. "Matrix capsules with EM routing."
• Unsupervised training
Rawlinson 2018. "Sparse unsupervised capsules generalize better."
• Stable training
Zhao 2018. "Investigating Capsule Networks with Dynamic Routing for Text Classification."

Traditional CNN : Conv + Pooling
http://cs231n.github.io/convolutional-networks/

• Conv extracts features.

• Pooling abstracts features, but without preserving their spatial relationships!
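The loss of spatial information in pooling can be illustrated with a minimal NumPy sketch. The 4 × 4 feature maps and the helper `max_pool_2x2` are made up for illustration and are not from the slides; they only show that two different spatial layouts can pool to the same output.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D feature map with even side lengths."""
    h, w = x.shape
    # Split rows and columns into (block index, offset inside block), then take the max per block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two hypothetical feature maps: the same two activations,
# but placed at different positions inside their pooling windows.
a = np.zeros((4, 4))
a[0, 0] = 1.0
a[2, 3] = 1.0

b = np.zeros((4, 4))
b[1, 1] = 1.0
b[3, 2] = 1.0

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)

# The inputs differ, yet the pooled maps are identical:
# the exact position inside each window is discarded.
print(np.array_equal(a, b), np.array_equal(pooled_a, pooled_b))
```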

Drawbacks of traditional CNN
Understanding Capsule Networks — AI’s Alluring New Architecture

Jaeyun’s Blog : Capsule Network (CapsNet) – 1

Capsules
• A capsule is a vector

Capsules
• Each capsule represents an entity (nose, eye ...)
capsule1 (face outline)
capsule2 (left eye)
capsule3 (right eye)
capsule4 (nose)
capsule5 (mouth)

Capsules
• The direction of the capsule represents the properties of the entity
capsule3 (right eye): different directions correspond to various states of the eye ...

Capsules
• The norm of the capsule represents the presence of the entity
capsule3 (right eye): $\|\text{capsule3}\|$ is the probability that the eye is present
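For the norm to behave like a presence probability it has to lie between 0 and 1. Sabour 2017 achieves this with a "squash" non-linearity, which the slides do not show. The sketch below is one way to write it, with `eps` added as a numerical-stability assumption.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash non-linearity: keeps the direction of s, maps its norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)          # new norm, always below 1
    return scale * s / np.sqrt(sq_norm + eps)  # unit direction times new norm

# Hypothetical raw capsule vectors: one short, one long, same direction.
weak = squash(np.array([0.1, 0.0, 0.0]))
strong = squash(np.array([10.0, 0.0, 0.0]))

# Short vectors shrink towards 0 (entity absent), long ones approach 1 (entity present).
print(np.linalg.norm(weak), np.linalg.norm(strong))
```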

Capsules
• Lower-level capsules activate higher-level capsules according to their spatial hierarchy
(figure: two face capsule examples, one of them marked X)

Why do I study capsules?
A task is visually demonstrated by a human, and the robot learns the task

Object segmentation by a region proposal network
(we need object-centric information for a robot)

Feature extraction by an AlexNet pre-trained on ImageNet

Spatial relationships between objects determine the task features
task features ?

Capsule network
Sabour 2017. "Dynamic routing between capsules."
input: 28 × 28

Capsule network
input: 28 × 28
convolutional kernel
kernel = 9 × 9 × 256
strides = 1

Capsule network
input: 28 × 28
convolutional kernel
kernel = 9 × 9 × 256
strides = 2

Capsule network
input: 28 × 28
primary capsule $u_i$ (dim = 8, total number = 6*6*32 = 1152)
digit capsule $v_j$ (dim = 16, total number = 10)
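The capsule counts above follow from the kernel sizes and strides on the earlier slides. The arithmetic below is a small sketch that assumes convolutions without padding (not stated on the slides, but consistent with the 6*6 grid); `conv_out` is a helper name introduced here.

```python
def conv_out(size, kernel, stride):
    """Output side length of a convolution without padding."""
    return (size - kernel) // stride + 1

# 28 x 28 input, 9 x 9 kernel, stride 1.
conv1_side = conv_out(28, kernel=9, stride=1)

# 9 x 9 kernel, stride 2 on top of the first convolution.
primary_side = conv_out(conv1_side, kernel=9, stride=2)

# 256 output channels are grouped into 32 capsule types of dimension 8.
capsule_types = 256 // 8
num_primary_caps = primary_side * primary_side * capsule_types

print(conv1_side, primary_side, num_primary_caps)  # 20 6 1152
```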

Capsule network
input: 28 × 28
prediction
$\hat{u}_{j|i} = W_{ij} u_i$
Trainable parameter $W$ = [1152, 10, 8, 16]
(1) 1152 : # of PrimaryCaps
(2) 10 : # of DigitCaps
(3) 8 : dim of PrimaryCaps
(4) 16 : dim of DigitCaps
$W_{ij}$ = [8, 16] (pick the $i$, $j$ components of $W$)
primary capsule $u_i$ (dim = 8, total number = 6*6*32 = 1152)
prediction $\hat{u}_{j|i}$ (dim = 16)
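The prediction step can be sketched with a single `einsum` over a weight tensor of shape [1152, 10, 8, 16]. The random inputs and the 0.01 weight scale are placeholders for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
num_primary, num_digit, dim_primary, dim_digit = 1152, 10, 8, 16

# Placeholder primary capsules u_i and trainable weights W (random here).
u = rng.standard_normal((num_primary, dim_primary))
W = 0.01 * rng.standard_normal((num_primary, num_digit, dim_primary, dim_digit))

# For every pair (i, j): u_hat[i, j] = u[i] @ W[i, j], a 16-dim prediction
# of digit capsule j made by primary capsule i.
u_hat = np.einsum("ip,ijpd->ijd", u, W)

print(u_hat.shape)  # (1152, 10, 16)
```

Each primary capsule therefore produces ten 16-dimensional predictions, one per digit capsule.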

Capsule network
input: 28 × 28
dynamic routing
$v_j = \text{dynamic routing}(\hat{u}_{j|i})$
primary capsule $u_i$ (dim = 8, total number = 6*6*32 = 1152)
digit capsule $v_j$ (dim = 16)

Capsule network
• Dynamic routing
Charles Martin, Capsule Networks (SlideShare)
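Since the routing figure is not reproduced here, the following is a minimal NumPy sketch of routing-by-agreement as described in Sabour 2017: coupling coefficients are a softmax over routing logits, and the logits grow when a prediction agrees with the resulting output capsule. Three iterations, the input scale, and the `eps` term are assumptions for illustration.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Keep direction, map the norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: [num_in, num_out, dim_out] predictions.
    Returns output capsules v [num_out, dim_out] and couplings c [num_in, num_out]."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)                      # each input distributes itself over outputs
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum of predictions
        v = squash(s)                               # output capsule
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # agreement (dot product) raises the logit
    return v, c

rng = np.random.default_rng(0)
u_hat = 0.1 * rng.standard_normal((1152, 10, 16))  # placeholder predictions
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)  # (10, 16) (1152, 10)
```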

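Capsule network
Loss = margin loss + reconstruction loss
• margin loss : $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$
• reconstruction loss : squared error between the input image and the decoder's reconstruction from the digit capsule of the correct class (scaled down so that it does not dominate the margin loss)
$T_k$ = 1 if the label is $k$, otherwise 0
$m^+$ : target capsule length if activated, $m^-$ : target capsule length if not activated
$\lambda$ : down-weighting of the loss for absent classes (Sabour 2017 uses $m^+ = 0.9$, $m^- = 0.1$, $\lambda = 0.5$)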
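The margin loss of Sabour 2017, $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$, can be sketched as follows. The defaults 0.9, 0.1 and 0.5 are the values reported in that paper; the function name and the toy capsule lengths are illustrative assumptions.

```python
import numpy as np

def margin_loss(v_norm, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v_norm: [batch, classes] capsule lengths, labels: [batch] integer classes."""
    T = np.eye(v_norm.shape[1])[labels]                            # one-hot T_k
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2            # correct capsule too short
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2  # wrong capsule too long
    return np.sum(present + absent, axis=1).mean()

labels = np.array([0])
good = margin_loss(np.array([[0.95, 0.05, 0.05]]), labels)  # confident and correct
bad = margin_loss(np.array([[0.05, 0.95, 0.05]]), labels)   # confident and wrong
print(good, bad)
```

The first case incurs no loss because every capsule length is already on the right side of its margin; the second is penalised both for the short correct capsule and for the long wrong one.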

EM routing
Hinton 2018. "Matrix capsules with EM routing."
Each capsule has a pose matrix ($M$: 4 × 4) and an activation ($a$: scalar)
prediction (vote) : $V_{ij} = M_i W_{ij}$ ($W_{ij}$ : 4 × 4 trainable parameters connecting each pair of capsules $i$ and $j$)
routing of $V_{ij}$ : EM routing
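The vote computation $V_{ij} = M_i W_{ij}$ is a batched 4 × 4 matrix product. In the sketch below the capsule counts (8 lower, 4 higher) and the random values are hypothetical; flattening each vote to 16 components gives the index $h = 1, \dots, 16$ used by the routing.

```python
import numpy as np

rng = np.random.default_rng(0)
num_in, num_out = 8, 4  # hypothetical numbers of lower and higher capsules

M = rng.standard_normal((num_in, 4, 4))            # pose matrix of each lower capsule
W = rng.standard_normal((num_in, num_out, 4, 4))   # trainable transform per (i, j) pair

# V[i, j] = M[i] @ W[i, j]: vote of lower capsule i for the pose of higher capsule j.
V = np.einsum("iab,ijbc->ijac", M, W)

# Flatten each 4 x 4 vote into 16 components for routing.
votes = V.reshape(num_in, num_out, 16)
print(V.shape, votes.shape)  # (8, 4, 4, 4) (8, 4, 16)
```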

EM routing
(gif)
Jonathan Hui : Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks)

EM routing
• Each higher capsule $j$ is a Gaussian cluster over the 4 × 4 pose components
= $\mu_j^h, \sigma_j^h$ ($h = 1, \dots, 16$)
• For each Gaussian component $h$, compute the probability of $V_{ij}^h$ belonging to capsule $j$'s Gaussian model
$$p_{i|j}^h = \frac{1}{\sqrt{2\pi (\sigma_j^h)^2}} \exp\left(-\frac{(V_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2}\right)$$

EM routing
• cost : the lower the cost, the more likely a capsule will be activated
$$\mathrm{cost}_{ij}^h = -\ln p_{i|j}^h$$
$$\mathrm{cost}_j^h = \sum_i R_{ij} \, \mathrm{cost}_{ij}^h$$
where $R_{ij}$ : assignment probability (the amount of data assigned to $j$)
• activation :
$$a_j = \mathrm{sigmoid}\left(\lambda \left(b_j - \sum_h \mathrm{cost}_j^h\right)\right)$$

EM routing
• E-step :
determine $R_{ij}$
• M-step :
recalculate $\mu_j$, $\sigma_j$, $a_j$ to reduce the cost

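EM routing
• E-step :
$$p_j = \frac{1}{\sqrt{\prod_{h=1}^{H} 2\pi (\sigma_j^h)^2}} \exp\left(-\sum_{h=1}^{H} \frac{(V_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2}\right)$$
$$R_{ij} = \frac{a_j p_j}{\sum_k a_k p_k}$$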

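EM routing
• M-step
$$R_{ij} \leftarrow R_{ij} \cdot a_i$$
$$\mu_j^h = \frac{\sum_i R_{ij} V_{ij}^h}{\sum_i R_{ij}}$$
$$(\sigma_j^h)^2 = \frac{\sum_i R_{ij} (V_{ij}^h - \mu_j^h)^2}{\sum_i R_{ij}}$$
$$a_j = \mathrm{sigmoid}\left(\lambda \left(b_j - \sum_h \mathrm{cost}_j^h\right)\right)$$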
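Putting the E-step and M-step together, one possible NumPy sketch of the loop is shown below. It follows the formulas as written on these slides (with the cost taken as $-\ln p$ weighted by $R_{ij}$), not necessarily every detail of Hinton 2018. The uniform initialisation of $R$, the three iterations, the `eps` terms, the clipping inside the sigmoid, and the values `b=0.0`, `lam=0.01` are assumptions made for illustration.

```python
import numpy as np

def em_routing(V, a_in, b=0.0, lam=1.0, num_iters=3, eps=1e-9):
    """V: [num_in, num_out, H] votes, a_in: [num_in] lower-capsule activations.
    Returns mu, sigma_sq [num_out, H], a_out [num_out], R [num_in, num_out]."""
    num_in, num_out, H = V.shape
    R = np.full((num_in, num_out), 1.0 / num_out)   # start with uniform assignments
    for _ in range(num_iters):
        # M-step: weight assignments by lower activations, then refit each Gaussian.
        Rw = R * a_in[:, None]
        R_sum = Rw.sum(axis=0) + eps
        mu = np.einsum("ij,ijh->jh", Rw, V) / R_sum[:, None]
        sigma_sq = np.einsum("ij,ijh->jh", Rw, (V - mu[None]) ** 2) / R_sum[:, None] + eps
        # Per-component log-probability ln p_{i|j}^h and cost_j^h = sum_i R_ij * (-ln p_{i|j}^h).
        log_p_h = (-0.5 * np.log(2 * np.pi * sigma_sq)[None]
                   - (V - mu[None]) ** 2 / (2 * sigma_sq[None]))
        cost = np.einsum("ij,ijh->jh", Rw, -log_p_h)
        z = np.clip(lam * (b - cost.sum(axis=1)), -50.0, 50.0)  # clip to avoid overflow
        a_out = 1.0 / (1.0 + np.exp(-z))
        # E-step: R_ij = a_j p_j / sum_k a_k p_k, computed in log space for stability.
        log_ap = np.log(a_out + eps)[None] + log_p_h.sum(axis=2)
        log_ap -= log_ap.max(axis=1, keepdims=True)
        ap = np.exp(log_ap)
        R = ap / ap.sum(axis=1, keepdims=True)
    return mu, sigma_sq, a_out, R

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 2, 16))   # placeholder votes: 6 lower, 2 higher capsules
a_in = np.ones(6)
mu, sigma_sq, a_out, R = em_routing(V, a_in, lam=0.01)
print(mu.shape, a_out.shape, R.shape)  # (2, 16) (2,) (6, 2)
```

Working in log space avoids multiplying sixteen small probabilities together, which would otherwise underflow quickly.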

Unsupervised training
Rawlinson 2018. "Sparse unsupervised capsules generalize better."

• Add sparsity to capsules
$\psi_{jk}$ : weight connecting capsules
$g_j$ : boosting value
$r_{jk}$ : activation ranking of the $j$-th capsule
$m_{jk}$ : normalized ranking
$v$ : original capsule
$v'$ : sparsity-added capsule

• Count the # of activations for each capsule
$r_{jk}$ : activation ranking of the $j$-th capsule for the $k$-th data sample
$K$ : batch size, $J$ : number of capsules
$\epsilon_j$ : count of activations of the $j$-th capsule
$\mu_j$ : moving average of $\epsilon_j$

• Boost capsules based on the count
$d$ : boosting step size
$g_j$ : boosting value
$\mu_{min}, \mu_{max}$ : target frequency range of activation

Stable training
Zhao 2018. "Investigating Capsule Networks with Dynamic Routing for Text Classification."
