Capsule networks
Intelligent Control and Systems Laboratory
J.hyeon Park
2018-08-07
SNU AI Study
Covering range
• Original capsule network
Sabour 2017. "Dynamic routing between capsules."
• EM-routing
Hinton 2018. "Matrix capsules with EM routing."
• Unsupervised training
Rawlinson 2018. "Sparse unsupervised capsules generalize better."
• Stable training
Zhao 2018. "Investigating Capsule Networks with Dynamic Routing for Text Classification."
Traditional CNN : Conv + Pooling
http://cs231n.github.io/convolutional-networks/
• Conv extracts features.
• Pooling abstracts features.
• Pooling discards the spatial relationships between features!
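A minimal NumPy sketch of the last point: two feature maps that hold the same activations at different positions inside each pooling window give identical pooled outputs. The helper `max_pool_2x2` and the toy arrays are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D feature map (even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Same activations, placed at different positions inside each 2x2 window.
a = np.array([[1, 0, 0, 2],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [3, 0, 0, 4]])
b = np.array([[0, 0, 0, 0],
              [0, 1, 2, 0],
              [0, 3, 4, 0],
              [0, 0, 0, 0]])

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
print(pooled_a)
print(pooled_b)  # same as pooled_a: the layout inside each window is lost
```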
Drawbacks of traditional CNN
Understanding Capsule Networks — AI’s Alluring New Architecture
Jaeyun’s Blog : Capsule Network (CapsNet) – 1
Capsules
• A capsule is a vector
• Each capsule represents an entity (nose, eye ...)
  capsule1 (face outline), capsule2 (left eye), capsule3 (right eye), capsule4 (nose), capsule5 (mouth)
• The direction of the capsule represents the properties of the entity
  capsule3 (right eye): different directions correspond to various states of the eye
• The norm of the capsule represents the presence of the entity
  $\|capsule_3\|$ is the probability that the right eye is present
• The lower capsules activate the higher capsules according to their spatial hierarchy
  (figure: one arrangement of the parts activates the face capsule; the other, marked X, does not)
Why I study capsules
• A task is visually demonstrated by a human, and the robot learns the task
• Objects are segmented by a region proposal network
  (we need object-centric information for a robot)
• Features are extracted by an AlexNet pre-trained on ImageNet
• Spatial relationships between objects determine the task features
• task features ?
Capsule network
Sabour 2017. "Dynamic routing between capsules."
• input image: 28 × 28
• first convolution: kernel = 9 × 9 × 256, strides = 1
• second convolution: kernel = 9 × 9 × 256, strides = 2
• primary capsule $u_i$ (dim = 8, total number = 6*6*32 = 1152)
• digit capsule $v_j$ (dim = 16, total number = 10)
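The count 6*6*32 = 1152 follows from the two convolutions. A short sketch of the arithmetic, assuming no padding (as in Sabour 2017) and that the 256 output channels of the second convolution are grouped into 8-dimensional capsules; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width of a convolution without padding."""
    return (size - kernel) // stride + 1

after_conv1 = conv_out(28, kernel=9, stride=1)           # 28 -> 20
after_conv2 = conv_out(after_conv1, kernel=9, stride=2)  # 20 -> 6
capsule_dim = 8
capsule_channels = 256 // capsule_dim                    # 32 capsule channels
num_primary = after_conv2 * after_conv2 * capsule_channels
print(after_conv1, after_conv2, num_primary)             # 20 6 1152
```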
Capsule network
• prediction: $\hat{u}_{j|i} = W_{ij} u_i$
• training parameter: $W$ = [1152, 10, 8, 16]
  (1) 1152 : # of PrimaryCaps
  (2) 10 : # of DigitCaps
  (3) 8 : dim of PrimaryCaps
  (4) 16 : dim of DigitCaps
• $W_{ij}$ = [8, 16] (pick the $i$, $j$ components of $W$)
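A minimal NumPy sketch of the prediction step with the shapes listed on the slide. Because $W_{ij}$ is stored as [8, 16], the product is written in row-vector form, `u[i] @ W[i, j]`. The random values and the 0.01 scale are placeholders, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
num_primary, num_digit, dim_primary, dim_digit = 1152, 10, 8, 16

u = rng.normal(size=(num_primary, dim_primary))  # primary capsules u_i
W = rng.normal(size=(num_primary, num_digit, dim_primary, dim_digit)) * 0.01

# u_hat[i, j] = u[i] @ W[i, j]: one 16-D prediction per (primary, digit) pair
u_hat = np.einsum("ip,ijpd->ijd", u, W)
print(u_hat.shape)  # (1152, 10, 16)
```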
Capsule network
• dynamic routing: $v_j = \text{dynamic routing}(\hat{u}_{j|i})$
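Capsule network
• Dynamic routing
(diagram: primary capsule $i$ sends its predictions $\hat{u}_{1|i}, \dots, \hat{u}_{10|i}$ to the digit capsules $v_1, \dots, v_{10}$, weighted by the coefficients $c_{i1}, \dots, c_{i10}$)
$v_j = \mathrm{squash}\left(\sum_i c_{ij}\, \hat{u}_{j|i}\right)$
where $\mathrm{squash}(x) = \dfrac{\|x\|^2}{1 + \|x\|^2} \dfrac{x}{\|x\|}$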
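A small sketch of the squash function: it keeps the direction of $x$ and maps its length into [0, 1), so short vectors shrink toward zero and long vectors approach unit length. The `eps` term is an added assumption to avoid division by zero.

```python
import numpy as np

def squash(x, axis=-1, eps=1e-9):
    """squash(x) = (|x|^2 / (1 + |x|^2)) * (x / |x|), applied along `axis`."""
    sq_norm = np.sum(x * x, axis=axis, keepdims=True)
    norm = np.sqrt(sq_norm + eps)
    return (sq_norm / (1.0 + sq_norm)) * (x / norm)

short = squash(np.array([0.1, 0.0]))
long = squash(np.array([10.0, 0.0]))
print(np.linalg.norm(short), np.linalg.norm(long))  # about 0.0099 and 0.99
```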
Capsule network
• Dynamic routing
$\sum_j c_{ij} = 1$
increase $c_{ij}$ if $\hat{u}_{j|i}$ has a similar direction to $v_j$
(figure source: Charles Martin, Capsule Networks (slide share))
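The routing rule above can be sketched in NumPy as follows. The routing logits `b`, the softmax over $j$, and the default of 3 iterations follow the procedure in Sabour 2017 rather than anything printed on the slide; the input values are random placeholders.

```python
import numpy as np

def squash(x, axis=-1, eps=1e-9):
    sq_norm = np.sum(x * x, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * (x / np.sqrt(sq_norm + eps))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: [num_lower, num_upper, dim]. Returns v [num_upper, dim] and c [num_lower, num_upper]."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))           # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # sum_j c_ij = 1
        s = np.einsum("ij,ijd->jd", c, u_hat)      # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # raise c_ij where u_hat_{j|i} agrees with v_j
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(1152, 10, 16)) * 0.1
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)
```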
Capsule network
Loss = margin loss + reconstruction loss
• margin loss :
• reconstruction loss :
$T_k$ = 1 if the label is $k$, otherwise 0
$m^+$ : target capsule length if activated
$m^-$ : target capsule length if not activated
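The margin-loss formula itself did not survive on the slide. In Sabour 2017 it is $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$, summed over the classes $k$. The sketch below assumes the paper's values $m^+ = 0.9$, $m^- = 0.1$, $\lambda = 0.5$; the function name and the toy capsules are illustrative.

```python
import numpy as np

def margin_loss(v, label, m_plus=0.9, m_minus=0.1, lam=0.5):
    """v: digit capsules [num_classes, dim]; label: index of the true class."""
    lengths = np.linalg.norm(v, axis=-1)
    T = np.zeros_like(lengths)
    T[label] = 1.0
    present = T * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, lengths - m_minus) ** 2
    return float(np.sum(present + absent))

v = np.zeros((10, 16))
v[3, 0] = 0.95                         # only capsule 3 is long
loss_correct = margin_loss(v, label=3)
loss_wrong = margin_loss(v, label=5)
print(loss_correct, loss_wrong)
```

The loss is zero when only the labeled capsule is long, and grows both when the labeled capsule is short and when another capsule is long.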
EM routing
Hinton 2018. "Matrix capsules with EM routing."
• Each capsule has a pose matrix $M$ (4 × 4) and an activation $a$ (scalar)
• prediction : $V_{ij} = M_i W_{ij}$ ($W_{ij}$ : 4 × 4 trainable parameters connecting capsule $i$ to capsule $j$)
• routing of $V_{ij}$ : EM-routing
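A minimal sketch of the prediction step for matrix capsules. The layer sizes (6 lower and 3 higher capsules) and the random values are hypothetical; only the 4 × 4 shapes come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
num_lower, num_upper = 6, 3                        # hypothetical layer sizes

M = rng.normal(size=(num_lower, 4, 4))             # M_i of each lower capsule
W = rng.normal(size=(num_lower, num_upper, 4, 4))  # trainable W_ij

V = np.einsum("iab,ijbc->ijac", M, W)              # V_ij = M_i W_ij (matrix product)
V_flat = V.reshape(num_lower, num_upper, 16)       # 16 components h per prediction
print(V.shape, V_flat.shape)
```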
Recall : dynamic routing
EM routing
(animation source: Jonathan Hui, Understanding Matrix capsules with EM Routing (Based on Hinton's Capsule Networks))
EM routing
• 4 × 4 Gaussian clusters = $\mu^h, \sigma^h$ ($h = 1, \dots, 16$): one Gaussian component per entry of the 4 × 4 matrix
• For each Gaussian component $h$, compute the probability of $V_{ij}^h$ belonging to capsule $j$'s Gaussian model:
$p_{i|j}^h = \dfrac{1}{\sqrt{2\pi (\sigma_j^h)^2}} \exp\left(-\dfrac{(V_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2}\right)$
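A short sketch of the per-component Gaussian density, assuming the 4 × 4 matrices are flattened to 16 components and the variance $(\sigma_j^h)^2$ is passed as `sigma_sq`. The function name and toy values are illustrative.

```python
import numpy as np

def gaussian_prob(V, mu, sigma_sq):
    """V: [num_lower, num_upper, 16]; mu, sigma_sq: [num_upper, 16].
    Returns p[i, j, h], the density of V_ij^h under capsule j's component h."""
    return np.exp(-(V - mu) ** 2 / (2.0 * sigma_sq)) / np.sqrt(2.0 * np.pi * sigma_sq)

mu = np.zeros((2, 16))
sigma_sq = np.ones((2, 16))
V = np.zeros((3, 2, 16))
V[:, 1, :] = 2.0                       # values for capsule 1 lie two sigmas from its mean
p = gaussian_prob(V, mu, sigma_sq)
print(p[0, 0, 0], p[0, 1, 0])          # density at the mean is larger than two sigmas away
```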
EM routing
• cost : the lower the cost, the more likely a capsule will be activated
$cost_{ij}^h = -\ln p_{i|j}^h$
$cost_j^h = \sum_i R_{ij}\, cost_{ij}^h$
where $R_{ij}$ : assignment probability (the amount of data assigned to $j$)
• activation :
$a_j = \mathrm{sigmoid}\left(\lambda \left(b_j - \sum_h cost_j^h\right)\right)$
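A sketch of the cost and activation as written on the slide. $-\ln p$ is expanded analytically to avoid taking the log of a tiny number; the values of `b`, `lam`, and the toy inputs are placeholders.

```python
import numpy as np

def activation_from_cost(V, R, mu, sigma_sq, b, lam):
    """V: [I, J, 16]; R: [I, J]; mu, sigma_sq: [J, 16]; b: scalar or [J]."""
    # cost_ij^h = -ln p_{i|j}^h for a 1-D Gaussian
    cost_ijh = (V - mu) ** 2 / (2.0 * sigma_sq) + 0.5 * np.log(2.0 * np.pi * sigma_sq)
    cost_jh = np.einsum("ij,ijh->jh", R, cost_ijh)   # cost_j^h = sum_i R_ij cost_ij^h
    a = 1.0 / (1.0 + np.exp(-lam * (b - cost_jh.sum(axis=1))))
    return cost_jh, a

num_lower, num_upper = 4, 2
V = np.zeros((num_lower, num_upper, 16))
V[:, 1, :] = 3.0                          # values for capsule 1 are far from its mean
mu = np.zeros((num_upper, 16))
sigma_sq = np.ones((num_upper, 16))
R = np.full((num_lower, num_upper), 0.5)
cost_jh, a = activation_from_cost(V, R, mu, sigma_sq, b=0.0, lam=0.01)
print(a)  # capsule 0 (lower cost) gets the higher activation
```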
EM routing
• E-step : determine $R_{ij}$
• M-step : recalculate $\mu_j$, $\sigma_j$, $a_j$ to reduce cost
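EM routing
• E-step :
$p_j = \dfrac{1}{\sqrt{\prod_{h=1}^{H} 2\pi (\sigma_j^h)^2}} \exp\left(-\sum_{h=1}^{H} \dfrac{(V_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2}\right)$
$R_{ij} = \dfrac{a_j\, p_j}{\sum_k a_k\, p_k}$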
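A sketch of the E-step. It works in log space, which is an added numerical-stability choice rather than something from the slide; the toy means and inputs are illustrative.

```python
import numpy as np

def e_step(V, a, mu, sigma_sq):
    """V: [I, J, 16]; a: [J]; mu, sigma_sq: [J, 16]. Returns R [I, J] with rows summing to 1."""
    # ln p_j for each lower capsule i: sum over the H = 16 components
    log_p = -np.sum((V - mu) ** 2 / (2.0 * sigma_sq)
                    + 0.5 * np.log(2.0 * np.pi * sigma_sq), axis=-1)
    log_ap = np.log(a + 1e-12) + log_p
    log_ap = log_ap - log_ap.max(axis=1, keepdims=True)  # avoid underflow in exp
    ap = np.exp(log_ap)
    return ap / ap.sum(axis=1, keepdims=True)            # R_ij = a_j p_j / sum_k a_k p_k

mu = np.stack([np.zeros(16), np.full(16, 5.0)])  # two higher capsules
sigma_sq = np.ones((2, 16))
a = np.array([0.5, 0.5])
V = np.full((3, 2, 16), 0.1)                     # close to mu_0, far from mu_1
R = e_step(V, a, mu, sigma_sq)
print(R)
```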
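EM routing
• M-step
$R_{ij} \leftarrow R_{ij} \cdot a_i$
$\mu_j^h = \dfrac{\sum_i R_{ij} V_{ij}^h}{\sum_i R_{ij}}$
$(\sigma_j^h)^2 = \dfrac{\sum_i R_{ij} (V_{ij}^h - \mu_j^h)^2}{\sum_i R_{ij}}$
$cost_{ij}^h = -\ln p_{i|j}^h$
$a_j = \mathrm{sigmoid}\left(\lambda \left(b_j - \sum_h cost_j^h\right)\right)$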
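A sketch of the M-step following the slide's formulas. The `eps` terms are added assumptions to keep the divisions and logarithms finite, and `b`, `lam`, and the random inputs are placeholders.

```python
import numpy as np

def m_step(V, R, a_lower, b, lam, eps=1e-9):
    """V: [I, J, 16]; R: [I, J]; a_lower: [I] activations a_i of the lower capsules."""
    R = R * a_lower[:, None]                               # R_ij <- R_ij * a_i
    R_sum = R.sum(axis=0)[:, None] + eps                   # sum_i R_ij, shape [J, 1]
    mu = np.einsum("ij,ijh->jh", R, V) / R_sum
    sigma_sq = np.einsum("ij,ijh->jh", R, (V - mu) ** 2) / R_sum + eps
    cost_ijh = (V - mu) ** 2 / (2.0 * sigma_sq) + 0.5 * np.log(2.0 * np.pi * sigma_sq)
    cost_jh = np.einsum("ij,ijh->jh", R, cost_ijh)
    a = 1.0 / (1.0 + np.exp(-lam * (b - cost_jh.sum(axis=1))))
    return mu, sigma_sq, a

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 2, 16))
R = np.full((5, 2), 0.5)
mu, sigma_sq, a = m_step(V, R, a_lower=np.ones(5), b=0.0, lam=0.01)
print(mu.shape, sigma_sq.shape, a)
```

With uniform assignments and unit lower activations, `mu` and `sigma_sq` reduce to the plain mean and variance over the lower capsules.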
EM routing
spread loss :
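The spread-loss formula did not survive on the slide. In Hinton 2018 it is $L = \sum_{i \neq t} \max(0, m - (a_t - a_i))^2$, where $a_t$ is the activation of the target class and $m$ is a margin that the paper increases from 0.2 to 0.9 during training. A minimal sketch with a fixed margin of 0.2 (an assumption for illustration):

```python
import numpy as np

def spread_loss(a, target, margin=0.2):
    """a: class activations [num_classes]; target: index of the true class."""
    per_class = np.maximum(0.0, margin - (a[target] - a)) ** 2
    per_class[target] = 0.0            # the sum runs over i != t
    return float(per_class.sum())

a = np.array([0.9, 0.1, 0.8])
loss = spread_loss(a, target=0)
print(loss)  # only class 2 is within the margin of the target
```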
Unsupervised training
Rawlinson 2018. "Sparse unsupervised capsules generalize better."
• Add sparsity to capsule
$\psi_{jk}$ : weight connecting capsules
$g_j$ : boosting value
$r_{jk}$ : activation ranking of the j-th capsule
$m_{jk}$ : normalized ranking
$v$ : original capsule
$v'$ : sparsity-added capsule
Unsupervised training
• Count the # of activations for each capsule
$r_{jk}$ : activation ranking of the j-th capsule for the k-th data sample
$K$ : batch size
$J$ : number of capsules
$\epsilon_j$ : count of activations of the j-th capsule
$\mu_j$ : moving average of $\epsilon_j$
Unsupervised training
• Boost capsule based on the count
$d$ : boosting step size
$g_j$ : boosting value
$\mu_{min}$, $\mu_{max}$ : target frequency of activation
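The slide's formulas for counting and boosting did not survive, so the following is only one plausible reading of the definitions above, not the update rule from Rawlinson 2018. It assumes a capsule counts as "activated" when it is top-ranked by length for a sample, that $\epsilon_j$ is normalized by the batch size, and that $g_j$ moves by $d$ whenever the moving average leaves the range $[\mu_{min}, \mu_{max}]$. All default values and the function name are hypothetical.

```python
import numpy as np

def update_boost(norms, g, mu, d=0.1, mu_min=0.05, mu_max=0.15, momentum=0.9):
    """norms: capsule lengths [K, J] for one batch; g, mu: per-capsule arrays [J]."""
    K, J = norms.shape
    winners = np.argmax(norms, axis=1)                # top-ranked capsule per sample (assumption)
    eps_j = np.bincount(winners, minlength=J) / K     # activation frequency in this batch
    mu = momentum * mu + (1.0 - momentum) * eps_j     # moving average of eps_j
    g = g + d * (mu < mu_min) - d * (mu > mu_max)     # boost rare capsules, damp frequent ones
    return g, mu

norms = np.tile(np.array([0.9, 0.2, 0.1, 0.3]), (8, 1))  # capsule 0 always wins
g, mu = np.ones(4), np.zeros(4)
for _ in range(20):
    g, mu = update_boost(norms, g, mu)
print(g, mu)
```

How $g_j$ then enters the ranking to produce the sparsity-added capsule $v'$ is not recoverable from the slide and is left out.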
Stable training
Zhao 2018. "Investigating Capsule Networks with Dynamic Routing for Text Classification."
Thank you!
Q&A
