Capsule Networks
Aurélien Géron, November 2017
https://youtu.be/pPN8d0E3900
Aurélien Géron, 2017
NIPS 2017 Paper
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
October 2017: https://arxiv.org/abs/1710.09829
Computer Graphics
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Instantiation parameters → Rendering → Image
Inverse Graphics
Image → Inverse rendering → Instantiation parameters
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Capsules
Image → Inverse rendering → Capsule activations
Activation vector:
Length = estimated probability of presence
Orientation = object’s estimated pose parameters
Capsules = Convolutional Layers + Reshape + Squash
$\mathrm{squash}(u) = \frac{\|u\|^2}{1 + \|u\|^2} \, \frac{u}{\|u\|}$
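The reshape and squash steps can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the 6x6 grid, the random feature maps, and the small `eps` term (added to avoid dividing by zero) are assumptions made for the example.

```python
import numpy as np

def squash(u, axis=-1, eps=1e-9):
    """Scale each vector along `axis` to a length in [0, 1), keeping its direction."""
    sq_norm = np.sum(np.square(u), axis=axis, keepdims=True)
    norm = np.sqrt(sq_norm + eps)  # eps (an assumption) avoids division by zero
    return (sq_norm / (1.0 + sq_norm)) * (u / norm)

# Hypothetical output of the convolutional layers: a 6x6 grid with 18 feature maps.
rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(6, 6, 18))

# Reshape into 2 capsules of 9 dimensions at every location, then squash.
capsules = squash(feature_maps.reshape(6, 6, 2, 9))
lengths = np.linalg.norm(capsules, axis=-1)  # one estimated probability per capsule
```

Reshaping to `(6, 6, 3, 6)` instead would give 3 capsules of 6 dimensions per location.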
Equivariance
A hierarchy of parts
Rectangle
x=20
y=30
angle=16°
Triangle
x=24
y=25
angle=-65°
Boat
x=22
y=28
angle=16°
A hierarchy of parts
Rectangle
x=20
y=30
angle=-5°
Triangle
x=26
y=31
angle=137°
House
x=22
y=28
angle=-5°
Primary Capsules
Predict Next Layer’s Output
One transformation matrix W_i,j per part/whole pair (i, j).
û_j|i = W_i,j u_i
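As a rough sketch of this prediction step, the snippet below computes every û_j|i at once. The capsule counts, the dimensions, and the random values standing in for learned matrices are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_primary, n_next = 2, 2      # e.g. rectangle/triangle -> house/boat
dim_primary, dim_next = 4, 6  # hypothetical capsule dimensions

# u[i]: activation vector of primary capsule i.
u = rng.normal(size=(n_primary, dim_primary))
# W[i, j]: transformation matrix for the part/whole pair (i, j).
# Random here; in a real network these are learned during training.
W = rng.normal(size=(n_primary, n_next, dim_next, dim_primary))

# u_hat[i, j] = W[i, j] @ u[i]: capsule i's prediction for capsule j's output.
u_hat = np.einsum("ijkl,il->ijk", W, u)
```

The `einsum` call is just a batched matrix-vector product, one per (i, j) pair.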
Compute Next Layer’s Output
Predicted Outputs
Primary Capsules
Routing by Agreement
Strong agreement!
The rectangle and triangle capsules should be routed to the boat capsule.
Clusters of Agreement
Mean
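The idea of repeatedly re-weighting predictions around their mean can be sketched as below. This is an illustration only: it uses the Euclidean distance with a Gaussian-style weighting, which is an assumption made for the example, whereas capsule networks use the scalar product. The prediction values are made up.

```python
import numpy as np

def weighted_mean_clustering(predictions, n_iterations=3):
    """Iteratively re-weight predictions by their closeness to the weighted mean."""
    weights = np.full(len(predictions), 1.0 / len(predictions))
    for _ in range(n_iterations):
        mean = weights @ predictions                      # weighted mean vector
        distances = np.linalg.norm(predictions - mean, axis=1)
        weights = np.exp(-distances ** 2)                 # far from the mean -> small weight
        weights /= weights.sum()
    return weights @ predictions, weights

# Hypothetical predictions: a tight cluster near (1, 1) plus two outliers.
predictions = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.1],
                        [-2.0, 3.0], [3.0, -2.5]])
mean, weights = weighted_mean_clustering(predictions)
```

After a few iterations the outliers carry almost no weight, so the mean sits inside the cluster.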
Routing Weights
Predicted Outputs
Primary Capsules
b_i,j = 0 for all i, j
c_i = softmax(b_i)
Routing weights: 0.5, 0.5, 0.5, 0.5
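A small sketch of this initialization, assuming two primary capsules and two next-layer capsules as in the example. The `softmax` helper is written out here for self-containment.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# b[i, j]: raw routing weight from primary capsule i to next-layer capsule j.
b = np.zeros((2, 2))     # 2 primary capsules, 2 next-layer capsules
c = softmax(b, axis=1)   # each primary capsule's routing weights sum to 1
```

With all raw weights at zero, every primary capsule splits its output evenly, hence 0.5 for each predicted output.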
Compute Next Layer’s Output
Predicted Outputs
Primary Capsules
Routing weights: 0.5, 0.5, 0.5, 0.5
s_j = weighted sum
v_j = squash(s_j)
Actual outputs of the next layer capsules (round #1)
Update Routing Weights
Predicted Outputs
Primary Capsules
Actual outputs of the next layer capsules (round #1)
Agreement: b_i,j += û_j|i · v_j (large)
Disagreement: b_i,j += û_j|i · v_j (small)
Compute Next Layer’s Output
Predicted Outputs
Primary Capsules
Routing weights: 0.2, 0.1, 0.8, 0.9
s_j = weighted sum
v_j = squash(s_j)
Actual outputs of the next layer capsules (round #2)
Handling Crowded Scenes
Is this an upside-down house?
House
Boat
Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away).
Classification
CapsNet → || ℓ2 || → Estimated Class Probability
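Reading class probabilities off the final capsule layer might look like the sketch below. The three classes, the four dimensions, and the vector values are hypothetical.

```python
import numpy as np

# Hypothetical outputs of the final capsule layer: 3 classes, 4 dimensions each.
v = np.array([[0.1, 0.0, 0.0, 0.0],
              [0.0, 0.6, 0.0, 0.7],
              [0.2, 0.1, 0.0, 0.0]])

# The L2 norm (length) of each output vector is the estimated class probability.
class_probs = np.linalg.norm(v, axis=-1)
predicted_class = int(np.argmax(class_probs))
```

Since each length is an independent estimate, several classes can be long at once. This is what allows multiple classes to be detected in one image.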
Training
To allow multiple classes, minimize margin loss:
$L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$
$T_k = 1$ iff class k is present
In the paper: $m^- = 0.1$, $m^+ = 0.9$, $\lambda = 0.5$
Translated to English: “If an object of class k is present, then $\|v_k\|$ should be no less than 0.9. If not, then $\|v_k\|$ should be no more than 0.1.”
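A minimal sketch of the margin loss, following the squared form given in the paper and its values for m+, m- and λ. The function name and the example lengths are assumptions for illustration.

```python
import numpy as np

def margin_loss(v_norm, T, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss summed over classes, for capsule lengths `v_norm` and targets `T`."""
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return np.sum(present + absent, axis=-1)

T = np.array([0.0, 1.0, 0.0])            # only class 1 is present
good = np.array([0.05, 0.95, 0.08])      # lengths that satisfy both margins
bad = np.array([0.60, 0.30, 0.05])       # wrong class is long, right class is short

loss_good = margin_loss(good, T)
loss_bad = margin_loss(bad, T)
```

The first case costs nothing because every length is on the right side of its margin. The second is penalized both for the short correct capsule and for the long incorrect one.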
Regularization by Reconstruction
Decoder (feedforward neural network) → Reconstruction
Loss = margin loss + α × reconstruction loss
The reconstruction loss is the squared difference between the reconstructed image and the input image.
In the paper, α = 0.0005.
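Combining the two terms can be sketched as below. This assumes the squared difference is summed over all pixels; the 28x28 images and the margin value are made up for the example.

```python
import numpy as np

def total_loss(margin, image, reconstruction, alpha=0.0005):
    """Margin loss plus the down-weighted sum of squared pixel differences."""
    reconstruction_loss = np.sum((reconstruction - image) ** 2)
    return margin + alpha * reconstruction_loss

image = np.zeros((28, 28))
reconstruction = np.full((28, 28), 0.5)   # hypothetical decoder output
loss = total_loss(margin=0.1, image=image, reconstruction=reconstruction)
```

The small α keeps the reconstruction term from dominating the margin loss during training.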
A CapsNet for MNIST
(Figure 1 from the paper)
A CapsNet for MNIST – Decoder
(Figure 2 from the paper)
Interpretable Activation Vectors
(Figure 4 from the paper)
Pros
● Reaches high accuracy on MNIST, and promising on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)
Cons
● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing by agreement algorithm)
● A CapsNet cannot see two very close identical objects
○ This is called “crowding”, and it has been observed as well in human vision
Implementations
● Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
● TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
● PyTorch: https://github.com/gram-ai/capsule-networks
Amazon: https://goo.gl/IoWYKD
Twitter: @aureliengeron
github.com/ageron


Introduction to Capsule Networks (CapsNets)

Editor's Notes

  1. This presentation will tell you all about Capsule Networks, a hot new architecture for neural nets. Geoffrey Hinton had the idea of Capsule Networks several years ago, and he published a paper in 2011 that introduced many of the key ideas, but he had a hard time making them work properly, until now.
  2. A few weeks ago, in October 2017, a paper called “Dynamic Routing Between Capsules” was published by Sara Sabour, Nicholas Frosst and of course Geoffrey Hinton. They managed to reach state-of-the-art performance on the MNIST dataset, and demonstrated considerably better results than convolutional neural nets on highly overlapping digits. So what are capsule networks exactly?
  3. Well, in computer graphics, you start with an abstract representation of a scene, for example a rectangle at position x=20 and y=30, rotated by 16°, and so on. Each object type has various instantiation parameters. Then you call some rendering function, and boom, you get an image.
  4. Inverse graphics is just the reverse process. You start with an image, and you try to find what objects it contains, and what their instantiation parameters are. A capsule network is basically a neural network that tries to perform inverse graphics.
  5. It is composed of many capsules. A capsule is any function that tries to predict the presence and the instantiation parameters of a particular object at a given location. For example, the network above contains 50 capsules. The arrows represent the output vectors of these capsules. The black arrows correspond to capsules that try to find rectangles, while the blue arrows represent the output of capsules looking for triangles. The length of an activation vector represents the estimated probability that the object the capsule is looking for is indeed present. You can see that most arrows are tiny, meaning the capsules didn’t detect anything, but two arrows are quite long. This means that the capsules at these locations are pretty confident that they found what they were looking for, in this case a rectangle and a triangle.
  6. Next, the orientation of the activation vector encodes the instantiation parameters of the object, for example in this case the object’s rotation, but it could be also its thickness, how stretched or skewed it is, its exact position (there might be slight translations), and so on. For simplicity, I’ll just focus on the rotation parameter, but in a real capsule network, the activation vectors may have 5, 10 dimensions or more.
  7. In practice, a good way to implement this is to first apply a couple of convolutional layers, just like in a regular convolutional neural net. This will output an array containing a bunch of feature maps. You can then reshape this array to get a set of vectors for each location. For example, if the convolutional layers output an array containing 18 feature maps (2 times 9), you can easily reshape this array to get 2 vectors of 9 dimensions each, for every location. You could also get 3 vectors of 6 dimensions each, and so on. The result would look like the capsule network represented here, with two vectors at each location. The last step is to ensure that no vector is longer than 1: since the vector’s length is meant to represent a probability, it cannot be greater than 1. To do this, we apply a squashing function. It preserves the vector’s orientation, but it squashes it to ensure that its length is between 0 and 1.
  8. One key feature of Capsule Networks is that they preserve detailed information about the object’s location and its pose, throughout the network. For example, if I rotate the image slightly...
  9. ...notice that the activation vectors also change slightly. Right? This is called equivariance. In a regular convolutional neural net, there are generally several pooling layers, and unfortunately these pooling layers tend to lose information, such as the precise location and pose of the objects. It’s really not a big deal if you just want to classify the whole image, but it makes it challenging to perform accurate image segmentation or object detection (which require precise location and pose). The fact that capsules are equivariant makes them very promising for these applications.
  10. All right, so now let’s see how capsule networks can handle objects that are composed of a hierarchy of parts. For example, consider a boat centered at position x=22 and y=28, and rotated by 16°. This boat is composed of parts. In this case one rectangle and one triangle.
  11. So this is how it would be rendered. Now we want to do the reverse, we want inverse graphics, so we want to go from the image to this whole hierarchy of parts with their instantiation parameters.
  12. Similarly, we could also draw a house, using the same parts, a rectangle and a triangle, but this time organized in a different way. So the trick will be to try to go from this image containing a rectangle and a triangle, and figure out, not only that the rectangle and triangle are at this location and this orientation, but also that they are part of a boat, not a house. So let’s figure out how it would do this.
  13. The first step we have already seen: we run a couple of convolutional layers, we reshape the output to get vectors, and we squash them. This gives us the output of the primary capsules. We’ve got the first layer already. The next step is where most of the magic and complexity of capsule networks takes place. Every capsule in the first layer tries to predict the output of every capsule in the next layer.
  14. For example, let’s consider the capsule that detected the rectangle. I’ll call it the rectangle-capsule.
  15. Let’s suppose that there are just two capsules in the next layer, the house-capsule and the boat-capsule. Since the rectangle-capsule detected a rectangle rotated by 16°, it predicts that the house-capsule will detect a house rotated by 16°, that makes sense, and the boat-capsule will detect a boat rotated by 16° as well. That’s what would be consistent with the orientation of the rectangle.
  16. So, to make this prediction, the rectangle-capsule simply multiplies a transformation matrix W_i,j by its own activation vector u_i. During training, the network will gradually learn a transformation matrix for each pair of capsules in the first and second layer. In other words, it will learn all the part-whole relationships, for example the angle between the wall and the roof of a house, and so on.
  17. Now let’s see what the triangle-capsule predicts.
  18. This time, it’s a bit more interesting: given the rotation angle of the triangle, it predicts that the house-capsule will detect an upside-down house, and that the boat-capsule will detect a boat rotated by 16°. These are the positions that would be consistent with the rotation angle of the triangle.
  19. Now we have a bunch of predicted outputs, what do we do with them?
  20. As you can see, the rectangle-capsule and the triangle-capsule strongly agree on what the boat-capsule will output. In other words, they agree that a boat positioned in this way would explain their own positions and rotations. And they totally disagree on what the house-capsule will output. Therefore, it makes sense to assume that the rectangle and triangle are part of a boat, not a house.
  21. Now that we know that the rectangle and triangle are part of a boat, the outputs of the rectangle capsule and the triangle capsule really concern only the boat capsule. There’s no need to send these outputs to any other capsule: this would just add noise. They should be sent only to the boat capsule. This is called routing by agreement. There are several benefits. First, since capsule outputs are only routed to the appropriate capsule in the next layer, these capsules will get a cleaner input signal and will more accurately determine the pose of the object. Second, by looking at the paths of the activations, you can easily navigate the hierarchy of parts, and know exactly which part belongs to which object (like, the rectangle belongs to the boat, or the triangle belongs to the boat, and so on). Lastly, routing by agreement helps parse crowded scenes with overlapping objects (we will see this in a few slides). But first, let’s look at how routing by agreement is implemented in Capsule Networks.
  22. Here, I have represented the various poses of the boat, as predicted by the lower-level capsules. For example, one of these circles may represent what the rectangle-capsule thinks about the most likely pose of the boat, and another circle may represent what the triangle-capsule thinks, and if we suppose that there are many other low-level capsules, then we might get a cloud of prediction vectors, for the boat capsule, like this. In this example, there are two pose parameters: one represents the rotation angle, and the other represents the size of the boat. As I mentioned earlier, pose parameters may capture many different kinds of visual features, like skew, thickness, and so on. Or precise location. So the first thing we do, is we compute the mean of all these predictions.
  23. This gives us this vector. The next step is to measure the distance between each predicted vector and the mean vector. I will use the Euclidean distance here, but capsule networks actually use the scalar product. Basically, we want to measure how much each predicted vector agrees with the mean predicted vector. Using this agreement measure, we can update the weight of every predicted vector accordingly.
  24. Note that the predicted vectors that are far from the mean now have a very small weight, and the ones closest to the mean have a much stronger weight. I’ve represented them in black. Now we can just compute the mean once again (or I should say, the weighted mean).
  25. And you’ll notice that it moves slightly towards the center of the cluster. So next, we can once again update the weights.
  26. And now most of the vectors within the cluster have turned black. And again, we can update the mean.
  27. And we can repeat this process a few times. In practice 3 to 5 iterations are generally sufficient. This might remind you of the k-means clustering algorithm, if you know it. Okay, so this is how we find clusters of agreement. Now let’s see how the whole algorithm works in a bit more detail.
  28. First, for every predicted output, we start by setting a raw routing weight b_i,j equal to 0.
  29. Next, we apply the softmax function to these raw weights, for each primary capsule. This gives the actual routing weights for each predicted output, in this example 0.5 each.
  30. Next we compute a weighted sum of the predictions, for each capsule in the next layer.
  31. This might give vectors longer than 1, so as usual we apply the squash function.
  32. And voilà! We now have the actual outputs of the house-capsule and boat-capsule. But this is not the final output, it’s just the end of the first round, the first iteration.
  33. Now we can see which predictions were most accurate. For example, the rectangle-capsule made a great prediction for the boat-capsule’s output. It really matches it pretty closely.
  34. This is estimated by computing the scalar product of the predicted output vector û_j|i and the actual output vector v_j. This scalar product is simply added to the predicted output’s raw routing weight, b_i,j. So the weight of this particular predicted output is increased.
  35. When there is a strong agreement, this scalar product is large, so good predictions will have a higher weight.
  36. On the other hand, the rectangle-capsule made a pretty bad prediction for the house-capsule’s output, so the scalar product in this case will be quite small, and the raw routing weight of this predicted vector will not grow much.
  37. Next, we update the routing weights by computing the softmax of the raw weights, once again. And as you can see, the rectangle-capsule’s predicted vector for the boat-capsule now has a weight of 0.8, while its predicted vector for the house-capsule dropped down to 0.2. So most of its output is now going to go to the boat capsule, not the house capsule.
  38. Once again we compute the weighted sum of all the predicted output vectors for each capsule in the next layer, that is the house-capsule and the boat-capsule. And this time, the house-capsule gets so little input that its output is a tiny vector. On the other hand the boat-capsule gets so much input that it outputs a vector much longer than 1. So again we squash it.
  39. And that’s the end of round #2.
  40. And as you can see, in just a couple of iterations, we have already ruled out the house and clearly chosen the boat. After perhaps one or two more rounds, we can stop and proceed to the next capsule layer in exactly the same way.
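The routing rounds walked through in slides 28 to 40 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names and the 2-dimensional prediction vectors for the rectangle and triangle capsules are made up for the example, and a real implementation would work on batched tensors.

```python
import math

def squash(s):
    """Shrink a vector so its length is ||s||^2 / (1 + ||s||^2), keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """u_hat[i][j] is the output predicted by lower capsule i for upper capsule j."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # raw routing weights start at 0
    for _ in range(iterations):
        c = [softmax(row) for row in b]            # routing weights per lower capsule
        v = []
        for j in range(n_upper):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            v.append(squash(s))                    # weighted sum, then squash
        for i in range(n_lower):
            for j in range(n_upper):
                # Agreement: scalar product of the prediction and the actual output.
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c

# Hypothetical predictions: index 0 = house-capsule, index 1 = boat-capsule.
u_hat = [
    [[0.8, 0.2], [0.9, 0.1]],    # rectangle-capsule
    [[-0.7, 0.3], [0.9, 0.1]],   # triangle-capsule
]
v, c = route(u_hat)
house_length = math.hypot(*v[0])
boat_length = math.hypot(*v[1])
```

Because the two predictions for the boat agree while those for the house nearly cancel out, the boat-capsule ends up with the longer output vector and both lower capsules route most of their output to it.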
  41. So as I mentioned earlier, routing by agreement is really great to handle crowded scenes, such as the one represented in this image.
  42. There is a bit of ambiguity in this image, as you can see. One way to interpret it is to see an upside-down house in the middle. However, if this were the case, then there would be no explanation for the bottom rectangle or the top triangle, no reason for them to be where they are.
  43. The best way to interpret the image is that there is a house at the top and a boat at the bottom. And routing by agreement will tend to choose this solution, since it makes all the capsules perfectly happy, each of them making perfect predictions for the capsules in the next layer. The ambiguity is explained away. Okay, so what can you do with a capsule network now that you know how it works?
  44. Well for one, you can create a nice image classifier of course. Just have one capsule per class in the top layer and that’s almost all there is to it. All you need to add is a layer that computes the length of the top-layer activation vectors, and this gives you the estimated class probabilities. You could then just train the network by minimizing the cross-entropy loss, as in a regular classification neural network, and you would be done.
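As a small illustration of the length layer described in slide 44, the following sketch computes the length of each top-layer activation vector and picks the longest one as the predicted class. The function names and the sample vectors are assumptions made for the example.

```python
import math

def capsule_lengths(vectors):
    """Length of each top-layer activation vector (one vector per class)."""
    return [math.sqrt(sum(x * x for x in v)) for v in vectors]

def predict_class(vectors):
    """Index of the class whose capsule outputs the longest vector."""
    lengths = capsule_lengths(vectors)
    return max(range(len(lengths)), key=lengths.__getitem__)

# Hypothetical outputs of three class capsules (2 dimensions each, for brevity).
top_layer = [[0.1, 0.0], [0.6, 0.6], [0.0, 0.2]]
predicted = predict_class(top_layer)
```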
  45. However, in the paper they use a margin loss that makes it possible to detect multiple classes in the image.
  46. So without going into too much detail, this margin loss is such that if an object of class k is present in the image, then the corresponding top-level capsule should output a vector whose length is at least 0.9. It should be long. Conversely, if an object of class k is not present in the image, then the capsule should output a short vector, one whose length is at most 0.1. The total loss is the sum of the losses for all classes.
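A hedged sketch of this margin loss, following the formulation in the paper, where the penalty is the squared amount by which the vector length misses the 0.9 or 0.1 margin. The down-weighting factor of 0.5 for absent classes comes from the paper rather than from this talk, and the function and parameter names are chosen for this example.

```python
def margin_loss(lengths, present, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum of per-class margin losses.

    lengths: length of each top-layer capsule's output vector.
    present: True where an object of that class is in the image.
    lam:     down-weighting of the loss for absent classes (0.5 in the paper).
    """
    total = 0.0
    for length, is_present in zip(lengths, present):
        if is_present:
            total += max(0.0, m_plus - length) ** 2         # should be long
        else:
            total += lam * max(0.0, length - m_minus) ** 2  # should be short
    return total

no_penalty = margin_loss([0.95, 0.05], [True, False])
penalized = margin_loss([0.5, 0.6], [True, False])
```

In the second call, the present class contributes (0.9 - 0.5)² = 0.16 and the absent class contributes 0.5 × (0.6 - 0.1)² = 0.125.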
  47. In the paper, they also add a decoder network on top of the capsule network. It’s just 3 fully connected layers with a sigmoid activation function in the output layer. It learns to reconstruct the input image by minimizing the squared difference between the reconstructed image and the input image.
  48. The full loss is the margin loss we discussed earlier, plus the reconstruction loss (scaled down considerably so as to ensure that the margin loss dominates training). The benefit of applying this reconstruction loss is that it forces the network to preserve all the information required to reconstruct the image, up to the top layer of the capsule network, its output layer. This constraint acts a bit like a regularizer: it reduces the risk of overfitting and helps generalize to new examples. And that’s it! You know how a capsule network works, and how to train it. Let’s look a little bit at some of the figures in the paper, which I find interesting.
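The combination of the two losses can be sketched as below. The scaling factor of 0.0005 is the value reported in the paper, not one stated in this talk, and the function names are assumptions made for the example.

```python
def reconstruction_loss(reconstructed, original):
    """Sum of squared differences between pixel intensities."""
    return sum((r - o) ** 2 for r, o in zip(reconstructed, original))

def total_loss(margin, reconstruction, alpha=0.0005):
    """Margin loss plus a heavily scaled-down reconstruction loss."""
    return margin + alpha * reconstruction

# Hypothetical 3-pixel "images", just to show the arithmetic.
rec = reconstruction_loss([0.0, 1.0, 0.5], [0.0, 0.0, 0.5])
loss = total_loss(0.285, rec)
```

With such a small factor, even a reconstruction error of 1.0 adds only 0.0005 to the total, so the margin loss dominates training.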
  49. This is figure 1 from the paper, showing a full capsule network for MNIST. You can see the first two regular convolutional layers, whose output is reshaped and squashed to get the activation vectors of the primary capsules. These primary capsules are organized in a 6 by 6 grid, with 32 primary capsules in each cell of this grid, and each primary capsule outputs an 8-dimensional vector. This first layer of capsules is fully connected to the 10 output capsules, which output 16-dimensional vectors. The length of these vectors is used to compute the margin loss, as explained earlier.
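The sizes mentioned for this architecture can be checked with a little arithmetic. The 9 by 9 kernels and the strides of 1 and 2 are taken from the paper rather than from this talk, and the helper name is chosen for this example.

```python
def conv_output_size(size, kernel, stride):
    """Output width of a convolution without padding."""
    return (size - kernel) // stride + 1

conv1 = conv_output_size(28, kernel=9, stride=1)     # first convolutional layer
grid = conv_output_size(conv1, kernel=9, stride=2)   # second one, feeding the primary capsules

n_primary = grid * grid * 32             # 32 primary capsules per grid cell
n_output, in_dim, out_dim = 10, 8, 16    # 10 output capsules, 8-D inputs, 16-D outputs

# One transformation matrix W_i,j per (primary capsule, output capsule) pair.
n_routing_weights = n_primary * n_output * in_dim * out_dim
```

This gives a 20 by 20 feature map after the first convolution, the 6 by 6 grid mentioned above, 1,152 primary capsules, and about 1.5 million weights in the transformation matrices alone.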
  50. Now this is figure 2 from the paper. It shows the decoder sitting on top of the CapsNet. It is composed of 2 fully connected ReLU layers plus a fully connected sigmoid layer which outputs 784 numbers that correspond to the pixel intensities of the reconstructed image (which is a 28 by 28 pixel image). The squared difference between this reconstructed image and the input image gives the reconstruction loss.
  51. Right, and this is figure 4 from the paper. One nice thing about capsule networks is that the activation vectors are often interpretable. For example, this image shows the reconstructions that you get when you gradually modify one of the 16 dimensions of the top layer capsules’ output. You can see that the first dimension seems to represent scale and thickness. The fourth dimension represents a localized skew. The fifth represents the width of the digit plus a slight translation to get the exact position. So as you can see, it’s rather clear what most of these parameters do.
  52. Okay, to conclude, let’s summarize the pros and cons. Capsule networks have reached state-of-the-art accuracy on MNIST. On CIFAR10, they got a bit over 10% error, which is far from state of the art, but it’s similar to what was first obtained with other techniques before years of effort were put into them, so it’s still a good start. Capsule networks require less training data. They offer equivariance, which means that position and pose information are preserved. And this is very promising for image segmentation and object detection. The routing by agreement algorithm is great for crowded scenes. The routing tree also maps the hierarchy of object parts, so every part is assigned to a whole. And it’s rather robust to rotations, translations and other affine transformations. The activation vectors are somewhat interpretable. And finally, obviously, it’s Hinton’s idea, so don’t bet against it.
  53. However, there are a few cons: first, as I mentioned the results are not yet state of the art on CIFAR10, even though it’s a good start. Plus, it’s still unclear whether capsule networks can scale to larger images, such as the ImageNet dataset. What will the accuracy be? Capsule networks are also quite slow to train, in large part because of the routing by agreement algorithm which has an inner loop, as you saw earlier. Finally, there is only one capsule of any given type in a given location, so it’s impossible for a capsule network to detect two objects of the same type if they are too close to one another. This is called crowding, and it has been observed in human vision as well, so it’s probably not a show-stopper.
  54. All right! I highly recommend you take a look at the code of a CapsNet implementation, such as the ones listed here (I’ll leave the links in the video description below). If you take your time, you should have no problem understanding everything the code is doing. The main difficulty in implementing CapsNets is that they contain an inner loop for the routing by agreement algorithm. Implementing loops in Keras and TensorFlow can be a little bit trickier than in PyTorch, but it can be done. If you don’t have a particular preference, then I would say that the PyTorch code is the easiest to understand.
  55. And that’s all I had, I hope you enjoyed this presentation. If you did, please visit my YouTube channel, like, share, comment, subscribe, etc. It’s my first real YouTube video, and if people find it useful, I might make some more. If you want to learn more about Machine Learning, Deep Learning and Deep Reinforcement Learning, you may want to read my O’Reilly book Hands-On Machine Learning with Scikit-Learn and TensorFlow. It covers a ton of topics, with many code examples that you will find on my GitHub account, so I’ll leave the links in the video description. That’s all for today, have fun and see you next time!