
[PR12] Capsule Networks - Jaejun Yoo

Introduction to CapsNet (or Capsule Network)
video: https://youtu.be/_YT_8CT2w_Q
Paper: Dynamic Routing Between Capsules


  1. Capsule Networks (understanding it together with PR12). Jaejun Yoo, Ph.D. Candidate @KAIST. PR12, 17th Dec 2017.
  2. Today’s contents: Dynamic Routing Between Capsules by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton. Oct. 2017: https://arxiv.org/abs/1710.09829 (NIPS 2017 paper).
  3. Convolutional Neural Networks. What is the problem with CNNs? Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc 1) If images are rotated, tilted, or otherwise differently oriented, CNNs perform poorly. 2) Each CNN layer understands the image at only a slightly coarser level than the last (slow growth of the receptive field). Common workarounds: DATA AUGMENTATION, MAX POOLING.
  4. Convolutional Neural Networks. What is the problem with CNNs? “Pooling helps in creating positional invariance.” But this invariance also triggers false positives for images that contain the components of a ship in the wrong order.
  5. Convolutional Neural Networks. What is the problem with CNNs? This invariance triggering false positives was never the intention of the pooling layer!
  6. Convolutional Neural Networks. What we need: EQUIVARIANCE (not invariance). “Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly, so that the spatial positioning inside an image is not lost.”
  7. Capsules. “A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.” An 8D capsule can encode e.g. hue, position, size, orientation, deformation, texture, etc.
  8. Capsules: 8D vector. Inverse rendering. Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets
  9. Capsules: 8D vector.
  10. Capsules: 8D vector.
  11. Capsules: 8D vector. Equivariance of Capsules.
  12. Capsules: 8D vector. Equivariance of Capsules.
  13. Routing by Agreement. Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418
  14. Primary Capsules. (Slides 14–46: Aurélien Géron, 2017)
  15. Predict Next Layer’s Output.
  16. Predict Next Layer’s Output.
  17. Predict Next Layer’s Output: one transformation matrix Wi,j per part/whole pair (i, j), ûj|i = Wi,j ui.
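The prediction step on slide 17 (ûj|i = Wi,j ui) can be sketched in NumPy; the shapes here (4 primary capsules of dimension 8 predicting 2 next-layer capsules of dimension 16) are toy assumptions, not the paper's MNIST configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration):
# 4 primary capsules (dim 8) predict 2 next-layer capsules (dim 16).
n_in, d_in, n_out, d_out = 4, 8, 2, 16

u = rng.standard_normal((n_in, d_in))                 # primary capsule outputs u_i
W = rng.standard_normal((n_in, n_out, d_out, d_in))   # one W_{i,j} per (i, j) pair

# û_{j|i} = W_{i,j} u_i for every part/whole pair (i, j)
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (4, 2, 16): one prediction vector per (i, j)
```

Each lower-level capsule i thus emits its own guess of what every higher-level capsule j should output.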
  18. Predict Next Layer’s Output.
  19. Predict Next Layer’s Output.
  20. Compute Next Layer’s Output: predicted outputs.
  21. Routing by Agreement: strong agreement!
  22. Routing by Agreement: the rectangle and triangle capsules should be routed to the boat capsule. Strong agreement!
  23. Routing Weights: bi,j = 0 for all i, j.
  24. Routing Weights: ci = softmax(bi), so the initial weights are 0.5, 0.5, 0.5, 0.5.
  25. Compute Next Layer’s Output: sj = weighted sum.
  26. Compute Next Layer’s Output: sj = weighted sum, vj = squash(sj).
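The squash non-linearity from slide 26 can be written directly from the paper's definition: it preserves a vector's direction while mapping its length into [0, 1), so the length can act as a probability. A minimal NumPy sketch:

```python
import numpy as np

def squash(s, eps=1e-8):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Short vectors shrink toward 0; long vectors approach unit length."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

print(np.linalg.norm(squash(np.array([10.0, 0.0]))))  # ≈ 0.99 (long input → near 1)
print(np.linalg.norm(squash(np.array([0.1, 0.0]))))   # ≈ 0.0099 (short input → near 0)
```

The `eps` term is a common numerical-stability guard (an implementation choice, not in the paper's formula) so a zero vector does not divide by zero.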
  27. Compute Next Layer’s Output: actual outputs of the next-layer capsules (round #1). sj = weighted sum, vj = squash(sj).
  28. Update Routing Weights: agreement.
  29. Update Routing Weights: agreement, bi,j += ûj|i · vj.
  30. Update Routing Weights: agreement → bi,j += ûj|i · vj is large.
  31. Update Routing Weights: disagreement → bi,j += ûj|i · vj is small.
  32. Compute Next Layer’s Output: updated routing weights 0.2, 0.1, 0.8, 0.9.
  33. Compute Next Layer’s Output: sj = weighted sum.
  34. Compute Next Layer’s Output: sj = weighted sum, vj = squash(sj).
  35. Compute Next Layer’s Output: actual outputs of the next-layer capsules (round #2).
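The full round-by-round procedure on slides 23–35 (initialize bi,j = 0, softmax to get ci, weighted sum, squash, then reward agreeing predictions) can be sketched as one loop. The 3 routing iterations match the paper; the input shapes are toy assumptions.

```python
import numpy as np

def squash(s, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Routing by agreement over predictions u_hat of shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                        # b_ij = 0 for all i, j
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)          # s_j = sum_i c_ij * û_j|i
        v = squash(s)                                  # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)      # b_ij += û_j|i · v_j
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((4, 2, 16))  # toy predictions
v = route(u_hat)
print(v.shape)  # (2, 16): one output vector per next-layer capsule
```

Predictions that agree with the consensus vj get larger logits bi,j, so the next softmax routes more of their output to that capsule, which is why the weights drift from 0.5 toward values like 0.8 and 0.9.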
  36. Handling Crowded Scenes.
  37. Handling Crowded Scenes: is this an upside-down house?
  38. Handling Crowded Scenes: house and boat. Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away).
  39. Classification: CapsNet → || ℓ2 || → estimated class probability.
  40. Training. || ℓ2 || gives the estimated class probability. To allow multiple classes, minimize the margin loss: Lk = Tk max(0, m+ − ||vk||)² + λ (1 − Tk) max(0, ||vk|| − m−)², where Tk = 1 iff class k is present. In the paper: m+ = 0.9, m− = 0.1, λ = 0.5.
  41. Training. Translated to English: “If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1.”
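The margin loss on slides 40–41 can be sketched directly from the formula, with the paper's constants as defaults; `v_norms` and `targets` are assumed names for the capsule lengths ||vk|| and the indicators Tk.

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lam * (1 - T_k) max(0, ||v_k|| - m-)^2,
    summed over classes k. Paper constants: m+ = 0.9, m- = 0.1, lam = 0.5."""
    pos = targets * np.maximum(0.0, m_pos - v_norms) ** 2         # present classes
    neg = lam * (1 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2  # absent classes
    return np.sum(pos + neg)

# Present class with a long vector, absent class with a short one: zero loss.
print(margin_loss(np.array([0.95, 0.05]), np.array([1, 0])))  # 0.0
```

Because each class k gets its own independent term, several classes can be "present" at once, which is what makes overlapping-digit scenes like MultiMNIST trainable.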
  42. Regularization by Reconstruction: || ℓ2 || → feedforward neural network decoder → reconstruction.
  43. Regularization by Reconstruction. Loss = margin loss + α · reconstruction loss. The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
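The combined objective on slide 43 is then a one-liner; the 28×28 image and the function name `total_loss` are illustrative assumptions, while α = 0.0005 is the paper's value.

```python
import numpy as np

def total_loss(margin, image, reconstruction, alpha=0.0005):
    """Loss = margin loss + alpha * sum of squared pixel differences."""
    recon = np.sum((reconstruction - image) ** 2)
    return margin + alpha * recon

img = np.zeros((28, 28))           # toy "input image"
rec = np.full((28, 28), 0.1)       # toy "reconstruction", off by 0.1 per pixel
print(total_loss(0.2, img, rec))   # 0.2 + 0.0005 * 784 * 0.01 ≈ 0.20392
```

The tiny α keeps the reconstruction term from dominating the margin loss, so it acts as a regularizer that nudges capsule vectors to retain enough pose information to redraw the input.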
  44. A CapsNet for MNIST (Figure 1 from the paper).
  45. A CapsNet for MNIST – Decoder (Figure 2 from the paper).
  46. Interpretable Activation Vectors (Figure 4 from the paper).
  47. Pros ● Reaches high accuracy on MNIST, with promising results on CIFAR10 ● Requires less training data ● Position and pose information are preserved (equivariance), which is promising for image segmentation and object detection ● Routing by agreement is great for overlapping objects (explaining away) ● Capsule activations nicely map the hierarchy of parts ● Offers robustness to affine transformations ● Activation vectors are easier to interpret (rotation, thickness, skew…) ● It’s Hinton! ;-)
  48. Cons ● Not state of the art on CIFAR10 (but it’s a good start) ● Not yet tested on larger images (e.g., ImageNet): will it work well? ● Slow training, due to the inner loop in the routing-by-agreement algorithm ● A CapsNet cannot see two very close identical objects ○ This is called “crowding”, and it has been observed in human vision as well
  49. Results: what the individual dimensions of a capsule represent.
  50. Results: MultiMNIST, segmenting highly overlapping digits.
  51. Questions Remaining: do capsules really work the way real neurons do? Perceptual illusions: Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 9(4), 483-484.
  52. References:
     • https://arxiv.org/abs/1710.09829 (paper)
     • https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/
     • https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
     • https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
     • https://www.youtube.com/watch?v=pPN8d0E3900 (video)
     • https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (slides)
