calculation | consulting
capsule networks
charles@calculationconsulting.com
Capsule networks by Hinton
Where ConvNets come from: LeNet 5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,
Gradient-based learning applied to document recognition,
Proc. IEEE 86(11): 2278–2324, 1998.
Convolutions usually w/ max pooling
we get gross spatial invariance by ignoring
exactly where a feature occurs
“A vision system needs to use the same
knowledge at all locations in the image” Hinton
ConvNet: share weights + max pooling
Hierarchical model of the visual system
HMax model, Riesenhuber and Poggio (1999)
dotted line selects max pooled features from lower layer
Hierarchical model of the visual system
Pooling proposed by Hubel and Wiesel in 1962
A. Receptive field (RF) of simple cell
(green) formed by pooling over
(center-surround) cells (yellow) in
the same orientation row
B. RF of complex cell (green) formed by
pooling over simple cells.
here: (crude) translation invariance
Hierarchical model of the visual system
ConvNets resemble hierarchical models (but notice the hyper-column)
HMax model, Riesenhuber and Poggio (1999)
Hinton: why is max pooling bad?
(If) the brain embeds things in rectangular space, then
Translation is easy; Rotation is hard
Experiment: time for the mind to process a rotation ~ amount of rotation
Conv Nets:
Crude translation invariance
No explicit pose (orientation) information
Cannot distinguish left from right
(actually some people have stopped using pooling)
A vision system needs to use the same knowledge at all locations in the image
2 streams hypothesis: what and where
Ventral: what objects are
Dorsal: where objects are in space
How do we know? Neurological disorders
Simultanagnosia: can only see one object at a time
idea dates back to 1968
lots of other evidence as well
https://www.youtube.com/watch?v=mCoYOFzSS9A
Cortical Microcolumns
Capsules may encode orientation, scale, velocity, color, …
Column through cortical layers of the brain
80-120 neurons (2X long in V1)
share the same receptive field
part of Hubel and Wiesel, Nobel Prize 1981
also see recent review: https://www.sciencedirect.com/science/article/pii/S0166223615001484
Canonical object based frames of reference:
Hinton 1981
Hinton has been thinking about this a long time
A kind of inverse computer graphics
Capsule networks: inverse computer graphics
computer graphics: rendering engine
capsule network: inverse graphics
matrix of pose information
Hinton proposes that our brain does a kind-of inverse computer graphics transformation.
Invariance vs Equivariance
Max pooling provides spatial Invariance, but Hinton argues we need spatial Equivariance.
so use vectors and Affine transformations
Invariance: similar results if
image is shifted or rotated
Equivariance: output transforms predictably
under symmetry transformations (S, A, …)
Group homomorphism: f(g*x) = g*f(x)
Geometric: i.e. triangle
centers invariant under Similarity (S)
centroid invariant under Affine (A)
Statistics:
mean: invariant under change of units
median: more generally invariant; a better statistic
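A minimal NumPy sketch of the distinction above, assuming a 1-D signal and a circular shift as the symmetry transformation. The helper `conv1d_circular` is a hypothetical name used only for illustration: the convolution output shifts with the input (equivariance), while a global max pool gives the same value either way (invariance).

```python
import numpy as np

def conv1d_circular(x, kernel):
    """Circular 1-D cross-correlation: out[i] = sum_k kernel[k] * x[(i + k) % n]."""
    n = len(x)
    return np.array([
        sum(kernel[k] * x[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ])

x = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0])
kernel = np.array([1.0, -1.0])   # illustrative edge-like filter
shift = 2

features = conv1d_circular(x, kernel)
features_of_shifted = conv1d_circular(np.roll(x, shift), kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
equivariant = np.allclose(features_of_shifted, np.roll(features, shift))

# Invariance: global max pooling returns the same value for both inputs,
# so the information about where the feature occurred is discarded.
invariant = np.isclose(features.max(), features_of_shifted.max())
```

Both flags come out true here; the point is that only the equivariant feature map still knows where the feature is.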
Segmenting highly overlapping objects
Explaining away: Even if two hidden causes are independent, they can become
dependent when we observe an effect that they can both influence. Hinton
Capsule networks: architecture
+ unsupervised | reconstruction loss
supervised | max norm loss
Sabour, Frosst, and Hinton, Dynamic Routing Between Capsules (2017)
Capsule networks by Hinton
conv2D
Keep first convolutional layer, but replace max pooling with …
Capsule networks by Hinton
conv2D
Reshape conv2d into primary capsule vectors (red), and
replace max pooling with routing-by-agreement algo
Capsule networks by Hinton
“Active capsules at one level (red) make predictions, via transformation matrices,
for the instantiation parameters of higher-level capsules (blue).
When multiple predictions agree, a higher level capsule (blue) becomes active”
conv2D
Primary layer: Conv2D reshaped
keras implementation: https://github.com/XifengGuo/CapsNet-Keras
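A rough sketch of the reshape step in plain NumPy, not the code from the linked Keras repository. The sizes (a 6x6x256 feature map split into 8-dimensional capsules, giving 1152 primary capsules) follow the MNIST setup in the 2017 paper; the function name `to_primary_capsules` is an assumption.

```python
import numpy as np

def to_primary_capsules(conv_out, capsule_dim=8):
    """Reshape a conv feature map (batch, H, W, C) into
    (batch, num_capsules, capsule_dim) capsule vectors."""
    batch, h, w, c = conv_out.shape
    if c % capsule_dim != 0:
        raise ValueError("channel count must be divisible by capsule_dim")
    num_capsules = h * w * (c // capsule_dim)
    return conv_out.reshape(batch, num_capsules, capsule_dim)

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(2, 6, 6, 256))   # stand-in for a Conv2D output
primary = to_primary_capsules(conv_out)      # shape (2, 1152, 8)
```

Each group of 8 adjacent channels at one grid position becomes one capsule vector; a squashing nonlinearity would normally follow.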
Capsule networks: encodes poses
Capsules can represent objects w/ different poses (3D orientations)
Latest results (matrix capsules, below) reduce the number of test errors on smallNORB by 45% compared to the previous state of the art
Capsules capture visual features
“A capsule is a group of neurons whose outputs represent different properties of the same entity.”
Capsules encode SIFT-like features
Perturbing an image causes specific capsules to activate
Place-coding vs Rate-coding
Place-coding:
convNet w/out pooling
low level features for
small receptive fields
when a part moves, it may
get a new capsule
position maps to active
capsules (u) in primary layer
Rate-coding:
traditional neurological way of coding (1926)
stimulus info encoded in rate of firing
(as opposed to magnitude, population, timing, …)
when a part rotates or moves,
the capsule values change
maps to real-values of capsule output vectors (v)
rates encoded in vector values
aside: are ReLUs a kind of rate coding?
Hierarchy of parts: coupled layers
A higher level entity is present if the lower / primary layer capsules
agree on their predictions for its pose.
Routing algo: some pose prose
An effective way to implement the “explaining away”
that is needed for segmenting highly overlapping objects.
Like an Attention mechanism: The competition … is between the higher-level
capsules that a lower-level capsule might send its vote to.
stuff Hinton says…
A capsule is activated only if the transformed poses coming from the layer
below match each other. This is a more effective way to capture covariance
and leads to models with many fewer parameters that generalize better.
…a powerful segmentation principle that allows knowledge of familiar shapes to
drive segmentation, rather than just using low-level cues such as proximity or
agreement in color or velocity.
Data-specific dynamic routes
squash
softmax
“c_ij are determined by an iterative dynamic routing process”
weighted sum weighted mean prediction
Capsule vs traditional neuron
https://github.com/naturomics/CapsNet-Tensorflow
Capsule: affine transformation
Primary rectangle and triangle capsules (prediction vectors) routed to
boat and house capsules (parent layer), and then routes pruned
“CapsNet is moderately robust to small affine transformations of the training data”
Capsule: squashing function
https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
length of the capsule vector ~ probability that the entity represented by the capsule is present
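A small NumPy sketch of the squashing nonlinearity as written in the 2017 paper, v = (|s|^2 / (1 + |s|^2)) * s / |s|. The `eps` term is an added assumption to avoid dividing by zero for an all-zero input.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

short = squash(np.array([0.1, 0.0]))    # short input -> length near 0
long = squash(np.array([10.0, 0.0]))    # long input  -> length near 1
```

Short vectors are pushed toward zero length and long vectors toward unit length, which is what lets the length be read as a probability.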
Routing by agreement
Algo selects data-specific routes b_ij by matching
primary outputs and squashed (secondary) outputs
first paper uses vector overlap / cosine distance to find cluster centers: ok, but cannot tell great from good
second paper (matrix capsules) uses a Free Energy cost function
Routing algorithm
How can we implement this in backprop?
fixed point equation
Routing algo: EM fixed point equation
in forward pass of Backprop
(like an EM step)
must terminate to take dW
dot product ~ log likelihood (Energy*)
*Similar to fixed point equation for TAP Free Energy in the EMF RBM
**and in the later matrix capsule paper, a Free Energy is used explicitly
Routing algo: fixed point unwound (3 steps)
Similar to a 3 layer FCN w/shared weights W
initial routing logits: b_ij = 0
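The unrolled loop can be sketched in NumPy as below. This follows the routing procedure in the 2017 paper (softmax over the logits, weighted sum, squash, agreement update) for a single example; the function names and the toy shapes are assumptions, and a framework version would run the same steps inside the forward pass so that gradients flow to W.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps) * s

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """Routing by agreement for one example.

    u_hat: prediction vectors, shape (num_in, num_out, dim), i.e. W_ij applied
           to each primary capsule output u_i.
    Returns parent outputs v (num_out, dim) and coupling coefficients c
    (num_in, num_out) from the last iteration.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits start at 0
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each child splits its vote over parents
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # squashed parent outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement: dot product
    return v, c

# Toy case: four children agree about parent 0 and disagree about parent 1.
u_hat = np.zeros((4, 2, 2))
u_hat[:, 0, :] = [1.0, 0.0]
u_hat[:, 1, :] = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
v, c = route(u_hat)
```

In the toy case the coupling shifts toward parent 0, whose predictions agree, and parent 1 stays near zero length.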
Routing algorithm: keras Layers
https://keras.io/layers/writing-your-own-keras-layers/
Routing algo: keras
Routing algo: matrix capsules
cluster score = [ log p(x_i | mixture) - log p(x_i | uniform) ]
cosine distance —> Free Energy cost:
EM to find mean, variance, and mixing proportion of Gaussians
“data-points that form a tight cluster from the perspective of one capsule
may be widely scattered from the perspective of another capsule”
p(x | mixture)
ih
Matrix capsules: after 3 EM iterations
recent results from matrix capsule paper (more later)
Capsule networks: architecture
+ unsupervised | reconstruction loss
supervised | multi-label max-norm loss
each digit capsule ~ single digit (for MNIST data)
|v| ~ Prob(digit)
image size
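The 2017 paper writes the supervised term as a margin loss on the capsule lengths; assuming that is what the slide calls the max-norm loss, a NumPy sketch looks like this. The constants m+ = 0.9, m- = 0.1 and lambda = 0.5 are the paper's values; the function name is an assumption.

```python
import numpy as np

def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Per-example margin loss.

    lengths: capsule lengths |v_k|, shape (..., num_classes)
    targets: one-hot (or multi-hot) labels T_k, same shape
    """
    present = targets * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return np.sum(present + absent, axis=-1)

targets = np.array([1.0, 0.0, 0.0])
good = margin_loss(np.array([0.95, 0.05, 0.05]), targets)   # confident and correct
bad = margin_loss(np.array([0.10, 0.90, 0.05]), targets)    # confident and wrong
```

Because each class has its own term, several digit capsules can be long at once, which is what makes the loss usable for multi-label (overlapping digit) targets.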
From max pool to max |vector|
mask selects (squashed) max vector (by length)
- does not throw away position information
- inputs vector into Fully Connected Net
- reconstructs the image from the vector
- similar to a variational auto-encoder
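The masking step can be sketched as follows, assuming capsule outputs of shape (batch, num_capsules, dim). This version picks the longest capsule, as on the slide; during training the paper masks with the true label instead. The name `mask_longest` is hypothetical.

```python
import numpy as np

def mask_longest(v):
    """Zero every capsule except the longest one, then flatten.

    v: capsule outputs, shape (batch, num_capsules, dim).
    Returns (flat, winner): flat has shape (batch, num_capsules * dim) and is
    what would be fed to the fully connected decoder; winner is the index of
    the longest capsule per example.
    """
    lengths = np.linalg.norm(v, axis=-1)          # (batch, num_capsules)
    winner = np.argmax(lengths, axis=-1)          # (batch,)
    mask = np.zeros_like(lengths)
    mask[np.arange(v.shape[0]), winner] = 1.0
    flat = (v * mask[..., None]).reshape(v.shape[0], -1)
    return flat, winner

v = np.array([[[0.1, 0.0], [0.6, 0.3], [0.0, 0.2]]])
flat, winner = mask_longest(v)
```

Unlike max pooling, the selected vector keeps all of its components, so the pose information reaches the decoder.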
Reconstruction error: a regularizer
Reconstruction: overlapping images
individual (8, 6) reconstructed
after removing a specific capsule
and does not reconstruct absent (0, 1)
trained on overlapping
MNIST images
like (8,1) (6,7)
does have trouble with close images (like humans)
https://www.youtube.com/watch?v=gq-7HgzfDBM&t=62s
Matrix capsules: Nov 2017
capsule vectors —> matrices
cosine distance —> Free Energy cost function (Gaussian mixtures)
+ convolutions between layers + lots more details … for another video
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 

Capsule Networks

  • 1. calculation | consulting capsule networks (TM) c|c (TM) charles@calculationconsulting.com
  • 3. Capsule networks by Hinton
  • 4. Capsule networks by Hinton
  • 5. Where ConvNets come from: LeNet-5. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
  • 6. Convolutions, usually with max pooling: we get gross spatial invariance by ignoring exactly where a feature occurs. “A vision system needs to use the same knowledge at all locations in the image” (Hinton). ConvNet: shared weights + max pooling.
  • 7. Hierarchical model of the visual system. HMax model, Riesenhuber and Poggio (1999). The dotted line selects max-pooled features from the lower layer.
  • 8. Hierarchical model of the visual system. Pooling proposed by Hubel and Wiesel in 1962. A. Receptive field (RF) of a simple cell (green), formed by pooling over (center-surround) cells (yellow) in the same orientation row. B. RF of a complex cell (green), formed by pooling over simple cells. Here: (crude) translation invariance.
  • 9. Hierarchical model of the visual system. ConvNets resemble hierarchical models (but notice the hyper-column). HMax model, Riesenhuber and Poggio (1999).
  • 10. Hinton: why is max pooling bad? (If) the brain embeds things in rectangular space, then translation is easy and rotation is hard. Experiment: the time the mind takes to process a rotation ~ the amount of rotation. ConvNets: crude translation invariance; no explicit pose (orientation) information; cannot distinguish left from right (actually, some people have stopped using pooling). A vision system needs to use the same knowledge at all locations in the image.
  • 11. Two-streams hypothesis: what and where. Ventral: what objects are. Dorsal: where objects are in space. How do we know? Neurological disorders, e.g. simultanagnosia: can only see one object at a time. The idea dates back to 1968; there is lots of other evidence as well. https://www.youtube.com/watch?v=mCoYOFzSS9A
  • 12. Cortical microcolumns. Capsules may encode orientation, scale, velocity, color, … A column through the cortical layers of the brain: 80-120 neurons (2X long in V1) that share the same receptive field. Part of the work of Hubel and Wiesel, Nobel Prize 1981. Also see a recent review: https://www.sciencedirect.com/science/article/pii/S0166223615001484
  • 13. Canonical object-based frames of reference: Hinton 1981. Hinton has been thinking about this for a long time. A kind of inverse computer graphics.
  • 14. Capsule networks: inverse computer graphics. Computer graphics: rendering engine. Capsule network: inverse graphics, a matrix of pose information. Hinton proposes that our brain does a kind of inverse computer graphics transformation.
  • 15. Invariance vs equivariance. Max pooling provides spatial invariance, but Hinton argues we need spatial equivariance, so use vectors and affine transformations. Invariance: similar results if the image is shifted or rotated. Equivariance: invariance under a symmetry transformation (S, A, …). Group homomorphism: f(g*x) = g*f(x) = f(x)*g^-1. Geometric: e.g. triangle centers are invariant under similarity (S); the centroid is invariant under affine (A) transformations. Statistics: the mean is invariant under a change of units; the median is more generally invariant, a better statistic.
  • 16. Segmenting highly overlapping objects. Explaining away: even if two hidden causes are independent, they can become dependent when we observe an effect that they can both influence (Hinton).
  • 17. Capsule networks: architecture + unsupervised | reconstruction loss; supervised | max norm loss. Hinton et al., Dynamic Routing Between Capsules (2017).
  • 18. Capsule networks by Hinton. conv2D: keep the first convolutional layer, but replace max pooling with …
  • 19. Capsule networks by Hinton. conv2D: reshape the conv2d output into primary capsule vectors (red), and replace max pooling with the routing-by-agreement algorithm.
  • 20. Capsule networks by Hinton (conv2D). “Active capsules at one level (red) make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules (blue). When multiple predictions agree, a higher level capsule (blue) becomes active.”
  • 21. Primary layer: Conv2D reshaped. Keras implementation: https://github.com/XifengGuo/CapsNet-Keras
  • 22. Capsule networks: encode poses. Capsules can represent objects with different poses (3D orientations). Latest results (matrix capsules, below) reduce the number of test errors on smallNORB by 45% relative to the previous best.
  • 23. Capsules capture visual features. “A capsule is a group of neurons whose outputs represent different properties of the same entity.” Capsules encode SIFT-like features. Perturbing an image causes specific capsules to activate.
  • 24. Place-coding vs rate-coding. Place-coding: ConvNet without pooling; low-level features for small receptive fields; when a part moves, it may get a new capsule; position maps to active capsules (u) in the primary layer. Rate-coding: the traditional neurological way of coding (1926); stimulus info is encoded in the rate of firing (as opposed to magnitude, population, timing, …); when a part rotates or moves, the capsule values change; maps to the real values of the capsule output vectors (v); rates are encoded in vector values. Aside: are ReLUs a kind of rate coding?
  • 25. Hierarchy of parts: coupled layers. A higher-level entity is present if the lower / primary layer capsules agree on their predictions for its pose.
  • 26. Routing algo: some pose prose (stuff Hinton says…). “An effective way to implement the ‘explaining away’ that is needed for segmenting highly overlapping objects.” Like an attention mechanism: “The competition … is between the higher-level capsules that a lower-level capsule might send its vote to.” “A capsule is activated only if the transformed poses coming from the layer below match each other. This is a more effective way to capture covariance and leads to models with many fewer parameters that generalize better.” “…a powerful segmentation principle that allows knowledge of familiar shapes to drive segmentation, rather than just using low-level cues such as proximity or agreement in color or velocity.”
  • 27. Data-specific dynamic routes. “c_ij are determined by an iterative dynamic routing process.” Equation labels: squash, softmax, weighted sum, weighted mean, prediction.
  • 28. Capsule vs traditional neuron. https://github.com/naturomics/CapsNet-Tensorflow
  • 29. Capsule: affine transformation. Primary rectangle and triangle capsules (prediction vectors) are routed to boat and house capsules (parent layer), and then the routes are pruned. “CapsNet is moderately robust to small affine transformations of the training data.”
  • 30. Capsule: squashing function. The length of the capsule vector ~ the probability that the entity represented by the capsule is present. https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-ii-how-capsules-work-153b6ade9f66
  • 31. Routing by agreement. The algo selects data-specific routes b_ij by matching primary outputs and squashed (secondary) outputs. The first paper uses vector overlap / cosine distance to find cluster centers: OK, but cannot tell great from good. The second paper (matrix capsules) uses a Free Energy cost function.
  • 32. Routing algorithm. How can we implement it in backprop? Fixed point equation.
  • 33. Routing algo: EM. Fixed point equation in the forward pass of backprop (like an EM step); must terminate to take dW. Dot product ~ log likelihood (Energy*). *Similar to the fixed point equation for the TAP Free Energy in the EMF RBM. **And in the later matrix capsule paper, a Free Energy is used explicitly.
  • 34. Routing algo: fixed point unwound (3 steps). Similar to a 3-layer FCN with shared weights. W = 0
  • 35. Routing algorithm: keras Layers. https://keras.io/layers/writing-your-own-keras-layers/
  • 36. Routing algo: keras
  • 37. Routing algo: matrix capsules. cosine distance -> Free Energy cost: EM to find the mean, variance, and mixing proportion of Gaussians. cluster score = [log p(x_i | mixture) - log p(x_i | uniform)]. “data-points that form a tight cluster from the perspective of one capsule may be widely scattered from the perspective of another capsule” p(x_i | mixture_h)
  • 38. Matrix capsules: after 3 EM iterations. Recent results from the matrix capsule paper (more later).
  • 39. Capsule networks: architecture + unsupervised | reconstruction loss; supervised | multi-label max-norm loss. Each digit capsule ~ a single digit for MNIST data; |v| ~ Prob(digit). Image size.
  • 40. From max pool to max |vector|. The mask selects the (squashed) max vector (by length): does not throw away position information; inputs the vector into a fully connected net; reconstructs the image from the vector; similar to a variational auto-encoder.
  • 41. From max pool to max |vector|
  • 42. Reconstruction error: a regularizer
  • 43. Reconstruction: overlapping images. Trained on overlapping MNIST images like (8,1) and (6,7). Individual digits (8, 6) are reconstructed after removing a specific capsule, and absent digits (0, 1) are not reconstructed. Does have trouble with close images (like humans). https://www.youtube.com/watch?v=gq-7HgzfDBM&t=62s
  • 44. Matrix capsules: Nov 2017. Capsule vectors -> matrices; cosine distance -> Free Energy cost function (Gaussian mixtures); + convolutions between layers; + lots more details … for another video.
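
The squashing function (slide 30) and the routing-by-agreement loop unwound for 3 steps (slides 27 and 31-34) can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from either linked repository: the names `squash`, `softmax`, `route`, and `u_hat` are assumptions, and the prediction vectors `u_hat` are taken as given rather than computed from learned transformation matrices.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat: array of shape (num_lower, num_upper, dim); u_hat[i, j] is the
    prediction that lower capsule i makes for upper capsule j.
    Returns the upper-capsule outputs v (num_upper, dim) and the coupling
    coefficients c (num_lower, num_upper) used in the last iteration.
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))           # routing logits b_ij
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each lower capsule splits its vote
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # length ~ probability of the entity
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # agreement: dot product with output
    return v, c

# Toy example: six lower capsules agree on upper capsule 0,
# while their predictions for upper capsule 1 cancel out.
u_hat = np.zeros((6, 2, 4))
u_hat[:, 0, 0] = 1.0
u_hat[:, 1, 1] = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
v, c = route(u_hat)
print(np.linalg.norm(v, axis=1))  # capsule 0 is long, capsule 1 is near zero
print(c[:, 0])                    # votes shift toward capsule 0
```

In the toy example the votes concentrate on the capsule whose predictions agree, which is the sense in which the routes are data-specific: the coupling coefficients are recomputed in the forward pass for every input rather than learned as weights.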