Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Introduction to Capsule Networks (CapsNets)

5,400 views

Published on

CapsNets are a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning.
You can view this presentation on YouTube at: https://youtu.be/pPN8d0E3900

NIPS 2017 Paper:
* Dynamic Routing Between Capsules,
* by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
* https://arxiv.org/abs/1710.09829

The 2011 paper:
* Transforming Autoencoders
* by Geoffrey E. Hinton, Alex Krizhevsky and Sida D. Wang
* https://goo.gl/ARSWM6

CapsNet implementations:
* Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
* TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
* PyTorch: https://github.com/gram-ai/capsule-networks

Book:
Hands-On Machine with Scikit-Learn and TensorFlow
O'Reilly, 2017
Amazon: https://goo.gl/IoWYKD

Github: https://github.com/ageron
Twitter: https://twitter.com/aureliengeron

Published in: Data & Analytics
  • In slide 6, only two capsules are fired (are longer and have proper orientation) and the rest are correctly shown as smaller arrows (i.e. not fired/inactive); but they have several different orientations. I do not understand the interpretation behind the orientation of the inactive arrows. The question is that if they have not detected any shape, how come they possess some orientation (i.e. other features of the shape). If my assumption is correct, I would draw all of them as small pale arrows with the same orientation (e.g. upward); otherwise I would appreciate if you could elaborate more on the orientation of the inactive arrows in the picture.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hi there! Essay Help For Students | Discount 10% for your first order! - Check our website! https://vk.cc/80SakO
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Thank you very much!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • In slide 46 (at 15:47 in the video), in the margin loss equation, the max should be squared, but not the norm: L_k = T_k max(0, m+ − ||v_k||)² + λ (1 − T_k) max(0, ||v_k|| − m−)². Therefore, on slide 47 (at 16:08 in the video), the network should output a vector whose length (not squared length) is longer than 0.9 for digits that are present, or smaller than 0.1 for digits that are absent.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Introduction to Capsule Networks (CapsNets)

  1. 1. Capsule Networks Aurélien Géron, November 2017 https://youtu.be/pPN8d0E3900
  2. 2. Aurélien Géron, 2017 NIPS 2017 Paper Dynamic Routing Between Capsules by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton October 2017: https://arxiv.org/abs/1710.09829
  3. 3. Aurélien Géron, 2017 Computer Graphics Rectangle x=20 y=30 angle=16° Triangle x=24 y=25 angle=-65° Instantiation parameters ImageRendering
  4. 4. Aurélien Géron, 2017 Inverse Graphics Instantiation parameters ImageInverse rendering Rectangle x=20 y=30 angle=16° Triangle x=24 y=25 angle=-65°
  5. 5. Aurélien Géron, 2017 Capsules Capsule activations ImageInverse rendering = =
  6. 6. Aurélien Géron, 2017 Activation vector: Capsules Length = estimated probability of presence Orientation = object’s estimated pose parameters = =
  7. 7. Aurélien Géron, 2017 Squash(u) = Capsules = = Convolutional Layers + Reshape + Squash ||u||2 1 + ||u||2 u ||u||
  8. 8. Aurélien Géron, 2017 Equivariance = =
  9. 9. Aurélien Géron, 2017 Equivariance = =
  10. 10. Aurélien Géron, 2017 A hierarchy of parts Boat x=22 y=28 angle=16°
  11. 11. Aurélien Géron, 2017 A hierarchy of parts Rectangle x=20 y=30 angle=16° Triangle x=24 y=25 angle=-65° Boat x=22 y=28 angle=16°
  12. 12. Aurélien Géron, 2017 A hierarchy of parts Rectangle x=20 y=30 angle=-5° Triangle x=26 y=31 angle=137° House x=22 y=28 angle=-5°
  13. 13. Aurélien Géron, 2017 Primary Capsules = = Primary Capsules
  14. 14. Aurélien Géron, 2017 Predict Next Layer’s Output = = Primary Capsules
  15. 15. Aurélien Géron, 2017 Predict Next Layer’s Output = = Primary Capsules
  16. 16. Aurélien Géron, 2017 Predict Next Layer’s Output = = One transformation matrix Wi,j per part/whole pair (i, j). ûj|i = Wi,j ui Primary Capsules
  17. 17. Aurélien Géron, 2017 Predict Next Layer’s Output = = Primary Capsules
  18. 18. Aurélien Géron, 2017 Predict Next Layer’s Output = = Primary Capsules
  19. 19. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs Primary Capsules
  20. 20. Aurélien Géron, 2017 Routing by Agreement = = Predicted Outputs Primary Capsules Strong agreement!
  21. 21. Aurélien Géron, 2017 The rectangle and triangle capsules should be routed to the boat capsules. Routing by Agreement = = Predicted Outputs Primary Capsules Strong agreement!
  22. 22. Aurélien Géron, 2017 Clusters of Agreement
  23. 23. Aurélien Géron, 2017 Clusters of Agreement Mean
  24. 24. Aurélien Géron, 2017 Clusters of Agreement Mean
  25. 25. Aurélien Géron, 2017 Clusters of Agreement Mean
  26. 26. Aurélien Géron, 2017 Clusters of Agreement Mean
  27. 27. Aurélien Géron, 2017 Clusters of Agreement Mean
  28. 28. Aurélien Géron, 2017 Routing Weights = = Predicted Outputs Primary Capsules bi,j=0 for all i, j
  29. 29. Aurélien Géron, 2017 Routing Weights = = Predicted Outputs Primary Capsules 0.5 0.5 0.5 0.5 bi,j=0 for all i, j ci = softmax(bi)
  30. 30. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs sj = weighted sum Primary Capsules 0.5 0.5 0.5 0.5
  31. 31. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs Primary Capsules 0.5 0.5 0.5 0.5 sj = weighted sum vj = squash(sj)
  32. 32. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #1) Compute Next Layer’s Output = = Predicted Outputs Primary Capsules 0.5 0.5 0.5 0.5 sj = weighted sum vj = squash(sj)
  33. 33. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #1) Update Routing Weights = = Predicted Outputs Primary Capsules Agreement
  34. 34. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #1) Update Routing Weights = = Predicted Outputs Primary Capsules Agreement bi,j += ûj|i . vj
  35. 35. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #1) Update Routing Weights = = Predicted Outputs Primary Capsules Agreement bi,j += ûj|i . vj Large
  36. 36. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #1) Update Routing Weights = = Predicted Outputs Primary Capsules Disagreement bi,j += ûj|i . vj Small
  37. 37. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs Primary Capsules 0.2 0.1 0.8 0.9
  38. 38. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs sj = weighted sum Primary Capsules 0.2 0.1 0.8 0.9
  39. 39. Aurélien Géron, 2017 Compute Next Layer’s Output = = Predicted Outputs Primary Capsules sj = weighted sum vj = squash(sj)0.2 0.1 0.8 0.9
  40. 40. Aurélien Géron, 2017 Actual outputs of the next layer capsules (round #2) Compute Next Layer’s Output = = Predicted Outputs Primary Capsules 0.2 0.1 0.8 0.9
  41. 41. Aurélien Géron, 2017 Handling Crowded Scenes = = = =
  42. 42. Aurélien Géron, 2017 Handling Crowded Scenes = = = = Is this an upside down house?
  43. 43. Aurélien Géron, 2017 Handling Crowded Scenes = = = = House Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away). Boat
  44. 44. Aurélien Géron, 2017 Classification CapsNet || ℓ2 || Estimated Class Probability
  45. 45. Aurélien Géron, 2017 Training || ℓ2 || Estimated Class Probability To allow multiple classes, minimize margin loss: Lk = Tk max(0, m+ - ||vk||2) + λ (1 - Tk) max(0, ||vk||2 - m-) Tk = 1 iff class k is present In the paper: m- = 0.1 m+ = 0.9 λ = 0.5
  46. 46. Aurélien Géron, 2017 Training Translated to English: “If an object of class k is present, then ||vk||2 should be no less than 0.9. If not, then ||vk||2 should be no more than 0.1.” || ℓ2 || Estimated Class Probability To allow multiple classes, minimize margin loss: Lk = Tk max(0, m+ - ||vk||2) + λ (1 - Tk) max(0, ||vk||2 - m-) Tk = 1 iff class k is present In the paper: m- = 0.1 m+ = 0.9 λ = 0.5
  47. 47. Aurélien Géron, 2017 Regularization by Reconstruction || ℓ2 || Feedforward Neural Network Decoder Reconstruction
  48. 48. Aurélien Géron, 2017 Regularization by Reconstruction || ℓ2 || Feedforward Neural Network Decoder Reconstruction Loss = margin loss + α reconstruction loss The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
  49. 49. Aurélien Géron, 2017 A CapsNet for MNIST (Figure 1 from the paper)
  50. 50. Aurélien Géron, 2017 A CapsNet for MNIST – Decoder (Figure 2 from the paper)
  51. 51. Aurélien Géron, 2017 Interpretable Activation Vectors (Figure 4 from the paper)
  52. 52. Aurélien Géron, 2017 Pros ● Reaches high accuracy on MNIST, and promising on CIFAR10 ● Requires less training data ● Position and pose information are preserved (equivariance) ● This is promising for image segmentation and object detection ● Routing by agreement is great for overlapping objects (explaining away) ● Capsule activations nicely map the hierarchy of parts ● Offers robustness to affine transformations ● Activation vectors are easier to interpret (rotation, thickness, skew…) ● It’s Hinton! ;-)
  53. 53. Aurélien Géron, 2017 ● Not state of the art on CIFAR10 (but it’s a good start) ● Not tested yet on larger images (e.g., ImageNet): will it work well? ● Slow training, due to the inner loop (in the routing by agreement algorithm) ● A CapsNet cannot see two very close identical objects ○ This is called “crowding”, and it has been observed as well in human vision Cons
  54. 54. Aurélien Géron, 2017 Implementations ● Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet- Keras ● TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow ● PyTorch: https://github.com/gram-ai/capsule-networks
  55. 55. Amazon: https://goo.gl/IoWYKD Twitter: @aureliengeron github.com/ageron

×