Tech Talk
on
CAPSULE NETWORKS
January, 2018
What is the current State of the Art in Image
Classification & Object Recognition?
● Image classification is a central problem in machine learning
● A plain multi-layer perceptron connected directly to every pixel of an image
would not have gotten us this far.
● Besides quickly becoming intractable, operating directly on pixels is inefficient
because pixels are spatially correlated. So we first need to extract features that
are:
--- meaningful, and
--- low-dimensional
● That is where convolutional neural networks come into play!
● Convolutional networks are the current state-of-the-art algorithm.
● The basic idea: show an algorithm labeled images (e.g., photos of dogs labeled
"dog") and it will eventually start to abstract the features that most likely
indicate the presence of an actual dog.
● We can then use this model to classify new, unlabeled images.
● First, an input image is fed to the network.
● Filters of a given size scan the image and perform convolutions.
● The resulting feature maps go through an activation function. The output then
goes through a succession of pooling and further convolution operations.
● The feature maps shrink in spatial dimension as the network gets deeper.
● At the end, high-level features are flattened and fed to fully connected layers,
which will eventually yield class probabilities through a softmax layer.
● During training time, the network learns how to recognize the features that make
a sample belong to a given class through backpropagation.
● ConvNets appear as a way to construct features that we would have had to
handcraft ourselves otherwise.
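The pipeline described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: a single random 3x3 filter, ReLU as the activation, 2x2 max pooling, and a random fully connected layer with 3 classes. None of these sizes or helper names come from the talk, and nothing is trained here.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one filter over a 2-D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation function applied to the convolution output."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; trailing odd rows/columns are dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # toy grayscale input (assumed size)
kernel = rng.standard_normal((3, 3))       # one untrained filter

features = max_pool2x2(relu(conv2d_valid(image, kernel)))  # 8x8 -> 6x6 -> 3x3
flat = features.ravel()                                    # flatten to 9 values
weights = rng.standard_normal((3, flat.size))              # fully connected layer, 3 classes
probs = softmax(weights @ flat)                            # class probabilities
```

In a real ConvNet the filter and the fully connected weights would be learned through backpropagation rather than drawn at random.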
For a CNN, the mere presence of parts such as eyes, a nose, and a mouth can be a
very strong indicator that there is a face in the image, regardless of how those
parts are arranged.
Why Convolutional Networks are Doomed
1. Sub-sampling loses the precise spatial relationships between higher-level
parts such as a nose and a mouth. These precise spatial relationships are needed
for identity recognition.
2. They cannot extrapolate their understanding of geometrical relationships to
radically new viewpoints.
EQUIVARIANCE Vs. INVARIANCE
● Sub-sampling tries to make the neural activities invariant to small changes in
viewpoint.
● It is better to aim for equivariance: changes in viewpoint lead to corresponding
changes in neural activities.
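A minimal NumPy sketch of the difference, assuming 2x2 max pooling as the sub-sampling step. The toy 4x4 images and helper names are illustrative and not from the talk. A single active pixel is shifted by one position: the pooled output cannot tell the two images apart (invariance), while the recorded position changes along with the input (equivariance).

```python
import numpy as np

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling on an array with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def position_of_max(x):
    """Where the strongest activation sits: a crude stand-in for pose information."""
    idx = np.unravel_index(np.argmax(x), x.shape)
    return tuple(int(i) for i in idx)

a = np.zeros((4, 4))
a[0, 0] = 1.0            # feature detected at the top-left pixel
b = np.zeros((4, 4))
b[1, 1] = 1.0            # same feature, shifted by one pixel diagonally

# Invariance: pooling throws away the shift entirely.
same_after_pooling = np.array_equal(max_pool2x2(a), max_pool2x2(b))

# Equivariance: the position changes in step with the input.
pos_a = position_of_max(a)
pos_b = position_of_max(b)
```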
What is the right representation of the images?
● Computer graphics deals with constructing a visual image from some internal
hierarchical representation of geometric data.
● Note that the structure of this representation needs to take into account relative
positions of objects.
● That internal representation is stored in computer’s memory as arrays of
geometrical objects and matrices that represent relative positions and orientation
of these objects.
● Then, special software takes that representation and converts it into an image on
the screen. It is called rendering.
● Hinton argues that brains, in fact, do the opposite of rendering. He calls it
inverse graphics: from the visual information received by the eyes, the brain
deconstructs a hierarchical representation of the world around us.
● He argues that in order to do classification and object recognition correctly, it
is important to preserve the hierarchical pose relationships between object parts.
This is the key intuition for understanding why capsule theory is so important:
it explicitly models the relative relationships between objects, and represents
them numerically as a 4x4 pose matrix.
In simple words, a Capsule network is a neural network that tries to perform Inverse
Graphics
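The rendering and inverse-graphics directions can be illustrated with pose matrices. The sketch below assumes a pose is a 4x4 homogeneous transform (rotation about the z axis plus a translation). The face/nose example, the numbers, and the helper name `pose` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pose(angle_deg, tx, ty, tz=0.0):
    """4x4 homogeneous transform: rotation about the z axis plus a translation."""
    a = np.deg2rad(angle_deg)
    m = np.eye(4)
    m[:2, :2] = [[np.cos(a), -np.sin(a)],
                 [np.sin(a),  np.cos(a)]]
    m[:3, 3] = [tx, ty, tz]
    return m

# Rendering direction: whole -> part.
face_in_camera = pose(30.0, 5.0, 2.0)      # where the face is, seen from the camera
nose_in_face = pose(0.0, 0.0, -1.0)        # fixed part-whole relationship
nose_in_camera = face_in_camera @ nose_in_face

# Inverse graphics: from the observed part pose, recover the pose of the whole.
face_recovered = nose_in_camera @ np.linalg.inv(nose_in_face)

# A new viewpoint changes every observed pose, but not the part-whole relationship.
new_view = pose(90.0, 0.0, 0.0)
face_new = new_view @ face_in_camera
nose_new = new_view @ nose_in_camera
relation_new = np.linalg.inv(face_new) @ nose_new
```

The part-whole matrix `nose_in_face` is the viewpoint-independent knowledge. Only the observed poses change when the camera moves.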
What is a Capsule?
● A capsule is any function that tries to predict the presence and the instantiation
parameters of a particular object at a given location.
● Max pooling loses valuable information and also does not encode relative spatial
relationships between features.
● We should use capsules instead, because they will encapsulate all important
information about the state of the features they are detecting in a form of a vector
(as opposed to a scalar that a neuron outputs).
● Capsules encode the probability that a feature is detected as the length of their
output vector. The state of the detected feature is encoded as the direction in
which that vector points (the "instantiation parameters").
● So when the detected feature moves around the image or its state otherwise
changes, the probability stays the same (the length of the vector does not change)
but the vector's orientation changes, thus achieving equivariance.
● In the diagram below, the network contains 50 capsules; the arrows represent the
output vectors of these capsules.
● The black arrows correspond to capsules that try to find rectangles, while
the blue arrows represent the output of the capsules looking for triangles.
● The length of each vector represents the estimated probability, while its
orientation represents the object's estimated pose parameters.
● The vector will rotate in its space, representing the changing state of the detected
object, but its length will remain fixed, because the capsule is still sure it has
detected a face.
● This is what Hinton refers to as activities of equivariance: neuronal activities will
change when an object “moves over the manifold of possible appearances” in the
picture. At the same time, the probabilities of detection remain constant, which is
the form of invariance that we should aim at, and not the type offered by CNNs
with max pooling.
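The "length as probability, direction as state" idea can be sketched as follows. The talk does not give a formula. The `squash` nonlinearity used here is the one from the "Dynamic Routing Between Capsules" paper (Sabour, Frosst, Hinton), and the 2-D vectors and the rotation are assumptions made purely for illustration.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq = np.sum(s * s, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def rotate2d(v, angle_deg):
    """Rotate a 2-D vector: a stand-in for a change in the detected object's state."""
    a = np.deg2rad(angle_deg)
    r = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return r @ v

raw = np.array([3.0, 4.0])                  # raw capsule activation, norm 5
out = squash(raw)                           # capsule output
out_rotated = squash(rotate2d(raw, 40.0))   # same feature in a different state

length = float(np.linalg.norm(out))                  # detection probability, 25/26
length_rotated = float(np.linalg.norm(out_rotated))  # unchanged by the rotation
```

The direction of `out_rotated` differs from that of `out` (equivariance), while both lengths are the same (invariant detection probability).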
A hierarchy of parts
Primary Capsules
Ex: a capsule that detected a rectangle. During training, the network learns a
transformation matrix for each pair of lower-level and higher-level capsules.
Ex: a capsule that detected a triangle.
Routing by Agreement
● Since the outputs of both capsules agree on the boat's orientation, it is
safe to assume that both the triangle and the rectangle are parts of a boat.
● Thus the outputs of these capsules should be routed to the boat capsule. This
helps reduce both the training time and the noise in the final output. This is
called routing by agreement.
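A rough sketch of routing by agreement, following the iterative procedure in the "Dynamic Routing Between Capsules" paper rather than anything shown on the slides. The prediction vectors below are made up: two lower capsules (rectangle, triangle) each predict an output for two higher capsules (boat, house). In a real network these predictions would come from the learned transformation matrices.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Keep direction, map length into [0, 1)."""
    sq = np.sum(s * s, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, iterations=3):
    """u_hat[i, j] is lower capsule i's predicted output vector for higher capsule j."""
    n_lower, n_higher, _dim = u_hat.shape
    b = np.zeros((n_lower, n_higher))              # routing logits, start uniform
    for _ in range(iterations):
        c = softmax(b, axis=1)                     # each lower capsule splits its output
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # higher-capsule outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # reward predictions that agree with v
    return c, v

# Hypothetical predictions: index 0 = boat, index 1 = house.
u_hat = np.array([
    [[1.0, 0.0], [0.0,  1.0]],   # rectangle capsule
    [[1.0, 0.1], [0.0, -1.0]],   # triangle capsule
])
coupling, outputs = route(u_hat)
boat_length = float(np.linalg.norm(outputs[0]))
house_length = float(np.linalg.norm(outputs[1]))
```

Both lower capsules nearly agree on the boat pose, so their coupling coefficients shift toward the boat capsule and its output grows longer. Their house predictions point in opposite directions and largely cancel out.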
CapsNet Architecture for MNIST
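The layer sizes can be checked with a few lines of arithmetic. The numbers below are assumptions taken from the architecture described in the "Dynamic Routing Between Capsules" paper, not from this talk: a 28x28 input, a 9x9 convolution with stride 1, a 9x9 primary-capsule convolution with stride 2 producing 32 channels of 8-D capsules, and 10 digit capsules of 16 dimensions each.

```python
def conv_out(size, kernel, stride):
    """Output width of a convolution without padding."""
    return (size - kernel) // stride + 1

conv1_size = conv_out(28, 9, 1)                       # first conv layer: 20x20
primary_size = conv_out(conv1_size, 9, 2)             # primary capsule grid: 6x6
n_primary_capsules = primary_size * primary_size * 32 # 6 * 6 * 32 capsules, 8-D each

# One 8x16 transformation matrix per (primary capsule, digit capsule) pair.
n_routing_weights = n_primary_capsules * 10 * 8 * 16
```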
PROS
● Reaches high accuracy on MNIST, and promising on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● This is promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew...)
● It’s Hinton! ;-)
CONS
● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop (in the routing by agreement algorithm)
● A CapsNet cannot distinguish two identical objects that are very close together.
This is called "crowding", and it has been observed in human vision as well.
Implementations
● Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
● TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
● PyTorch: https://github.com/gram-ai/capsule-networks

More Related Content

What's hot

[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of FunctionsJaeJun Yoo
 
Introduction to ambient GAN
Introduction to ambient GANIntroduction to ambient GAN
Introduction to ambient GANJaeJun Yoo
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsJaeJun Yoo
 
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...sipij
 
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...adeij1
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
Matching Network
Matching NetworkMatching Network
Matching NetworkSuwhanBaek
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolutionPrudhvi Raj
 
proposal_pura
proposal_puraproposal_pura
proposal_puraErick Lin
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)Fellowship at Vodafone FutureLab
 
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)Deep Learning JP
 
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...MLAI2
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningMLAI2
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkItachi SK
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)SungminYou
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniquesJaeJun Yoo
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...IOSR Journals
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnSumeraHangi
 

What's hot (20)

[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions
 
Introduction to ambient GAN
Introduction to ambient GANIntroduction to ambient GAN
Introduction to ambient GAN
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
 
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Matching Network
Matching NetworkMatching Network
Matching Network
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
proposal_pura
proposal_puraproposal_pura
proposal_pura
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)
[DL輪読会]Learning Visible Connectivity Dynamics for Cloth Smoothing (CoRL2021)
 
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Unce...
 
Online Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual LearningOnline Coreset Selection for Rehearsal-based Continual Learning
Online Coreset Selection for Rehearsal-based Continual Learning
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
 
Pixel Recursive Super Resolution. Google Brain
 Pixel Recursive Super Resolution.  Google Brain Pixel Recursive Super Resolution.  Google Brain
Pixel Recursive Super Resolution. Google Brain
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 

Similar to Tech talk

Dynamic Routing Between Capsules.pdf
Dynamic Routing Between Capsules.pdfDynamic Routing Between Capsules.pdf
Dynamic Routing Between Capsules.pdfAbeerPareek1
 
Capsule Network Performance with Autonomous Navigation
Capsule Network Performance with Autonomous Navigation Capsule Network Performance with Autonomous Navigation
Capsule Network Performance with Autonomous Navigation gerogepatton
 
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONCAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONijaia
 
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONCAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONgerogepatton
 
Unit II & III_uncovered topics.doc notes
Unit II & III_uncovered topics.doc notesUnit II & III_uncovered topics.doc notes
Unit II & III_uncovered topics.doc notessmithashetty24
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognitionvatsal199567
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Trackingijsrd.com
 
Convolution neural networks
Convolution neural networksConvolution neural networks
Convolution neural networksFares Hasan
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsSangmin Woo
 
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGE
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGEINPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGE
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGEpaperpublications3
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29
 
Facial Expression Recognition via Python
Facial Expression Recognition via PythonFacial Expression Recognition via Python
Facial Expression Recognition via PythonSaurav Gupta
 
Predicting Facial Expression using Neural Network
Predicting Facial Expression using Neural Network Predicting Facial Expression using Neural Network
Predicting Facial Expression using Neural Network Santanu Paul
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewQuantUniversity
 
CariGANs : Unpaired Photo-to-Caricature Translation
CariGANs : Unpaired Photo-to-Caricature TranslationCariGANs : Unpaired Photo-to-Caricature Translation
CariGANs : Unpaired Photo-to-Caricature TranslationRazorthink
 
A Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionA Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionvivatechijri
 

Similar to Tech talk (20)

Dynamic Routing Between Capsules.pdf
Dynamic Routing Between Capsules.pdfDynamic Routing Between Capsules.pdf
Dynamic Routing Between Capsules.pdf
 
Capsule Network Performance with Autonomous Navigation
Capsule Network Performance with Autonomous Navigation Capsule Network Performance with Autonomous Navigation
Capsule Network Performance with Autonomous Navigation
 
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONCAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
 
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATIONCAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
CAPSULE NETWORK PERFORMANCE WITH AUTONOMOUS NAVIGATION
 
Rupesh-ibPRIA final
Rupesh-ibPRIA finalRupesh-ibPRIA final
Rupesh-ibPRIA final
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Unit II & III_uncovered topics.doc notes
Unit II & III_uncovered topics.doc notesUnit II & III_uncovered topics.doc notes
Unit II & III_uncovered topics.doc notes
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
 
Image captioning
Image captioningImage captioning
Image captioning
 
Convolution neural networks
Convolution neural networksConvolution neural networks
Convolution neural networks
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
 
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGE
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGEINPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGE
INPAINTING FOR LAZY RANDOM WALKS SEGMENTED IMAGE
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Facial Expression Recognition
Facial Expression RecognitionFacial Expression Recognition
Facial Expression Recognition
 
Facial Expression Recognition via Python
Facial Expression Recognition via PythonFacial Expression Recognition via Python
Facial Expression Recognition via Python
 
Predicting Facial Expression using Neural Network
Predicting Facial Expression using Neural Network Predicting Facial Expression using Neural Network
Predicting Facial Expression using Neural Network
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
CariGANs : Unpaired Photo-to-Caricature Translation
CariGANs : Unpaired Photo-to-Caricature TranslationCariGANs : Unpaired Photo-to-Caricature Translation
CariGANs : Unpaired Photo-to-Caricature Translation
 
A Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionA Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detection
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

Tech talk

  • 2. What is the current State of the Art in Image Classification & Object Recognition? ● Image classification is a central problem in machine learning. ● We would not get far by connecting a raw multi-layer perceptron to every pixel of an image. ● Besides quickly becoming intractable, this direct approach is inefficient because neighboring pixels are spatially correlated. So we first need to extract features that are meaningful and low-dimensional. ● That is where convolutional neural networks come in. ● Convolutional networks are the current state-of-the-art approach. ● The basic idea: show an algorithm labeled images (e.g., photos of dogs labeled "dog") and it will eventually start to abstract features that are likely to indicate the presence of an actual dog. ● We can then use this model to classify new, unlabeled images.
  • 3. ● First, an input image is fed to the network. ● Filters of a given size scan the image and perform convolutions. ● The resulting features then go through an activation function. The output then goes through a succession of pooling and further convolution operations. ● Features are reduced in dimension as the network gets deeper. ● At the end, high-level features are flattened and fed to fully connected layers, which yield class probabilities through a softmax layer. ● During training, the network learns through backpropagation how to recognize the features that make a sample belong to a given class. ● ConvNets are thus a way to construct features that we would otherwise have had to handcraft ourselves.
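The pipeline described on this slide (convolution, activation, pooling, flatten, fully connected layer, softmax) can be sketched in a few lines of plain Python. This is only a toy illustration, not how a real framework implements it: the 5x5 image, the single 2x2 edge filter, and the hand-picked weights are all assumptions made up for the example, and no training (backpropagation) is shown.

```python
import math

def conv2d(image, kernel):
    """Slide one filter over a single-channel image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Activation function: keep positive responses, zero out the rest."""
    return [[max(0.0, x) for x in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; reduces each dimension by `size`."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    """conv -> ReLU -> pool -> flatten -> fully connected -> softmax."""
    features = max_pool(relu(conv2d(image, kernel)))
    flat = [x for row in features for x in row]
    logits = [sum(w * x for w, x in zip(ws, flat)) + b
              for ws, b in zip(weights, biases)]
    return features, softmax(logits)

# Hypothetical input: a 5x5 image with a vertical dark-to-bright edge.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]            # responds to dark-to-bright vertical edges
weights = [[0.5, 0.0, 0.5, 0.0],       # made-up weights for class 0
           [0.0, 0.5, 0.0, 0.5]]       # made-up weights for class 1
biases = [0.0, 0.0]

features, probs = forward(image, kernel, weights, biases)
print(features)  # pooled feature map, here 2x2
print(probs)     # two class probabilities
```

In a trained network the filter values and the fully connected weights would be learned rather than written by hand.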
  • 4. For a CNN, the mere presence of these objects (for example, a nose and a mouth) can be a very strong indicator that there is a face in the image.
  • 5. Why Convolutional Networks are Doomed 1. Subsampling loses the precise spatial relationships between higher-level parts such as a nose and a mouth. These precise spatial relationships are needed for identity recognition.
  • 6. 2. They cannot extrapolate their understanding of geometrical relationships to radically new viewpoints.
  • 7. EQUIVARIANCE vs. INVARIANCE ● Subsampling tries to make the neural activities invariant to small changes in viewpoint. ● It is better to aim for equivariance: changes in viewpoint lead to corresponding changes in neural activities.
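The difference between invariance and equivariance can be shown with a toy 1D example. The sketch below is an illustration only: `max_pool_with_position` is a hypothetical helper invented for this example (it is not a capsule), used to show what it means for an output to change in step with the input.

```python
def max_pool_1d(signal, size=2):
    """Plain max pooling: keeps the strongest response, discards where it was."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

def max_pool_with_position(signal, size=2):
    """Hypothetical variant that also keeps the offset of the maximum."""
    out = []
    for i in range(0, len(signal) - size + 1, size):
        window = signal[i:i + size]
        value = max(window)
        out.append((value, window.index(value)))
    return out

a = [0, 9, 0, 0]
b = [9, 0, 0, 0]   # the same feature, shifted one position to the left

# Invariance: the pooled outputs are identical, so the shift is lost.
print(max_pool_1d(a), max_pool_1d(b))

# Equivariance: the detection strength is the same, but the recorded
# offset changes along with the input.
print(max_pool_with_position(a), max_pool_with_position(b))
```

Capsules aim for the second behavior: the "is it there" signal stays stable while the "how is it posed" signal follows the input.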
  • 8. What is the right representation of the images? ● Computer graphics deals with constructing a visual image from some internal hierarchical representation of geometric data. ● Note that the structure of this representation needs to take into account the relative positions of objects. ● That internal representation is stored in the computer’s memory as arrays of geometrical objects and matrices that represent the relative positions and orientations of these objects. ● Then, special software takes that representation and converts it into an image on the screen. This process is called rendering.
  • 9. ● Hinton argues that brains, in fact, do the opposite of rendering. He calls it inverse graphics: from the visual information received by the eyes, they deconstruct a hierarchical representation of the world around us. ● He argues that in order to do classification and object recognition correctly, it is important to preserve hierarchical pose relationships between object parts. This is the key intuition that will allow us to understand why capsule theory is so important. It incorporates relative relationships between objects, represented numerically as a 4×4 pose matrix.
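The rendering versus inverse graphics idea can be made concrete with pose matrices. The sketch below is a simplification under stated assumptions: it uses 3x3 homogeneous matrices for 2D rigid transforms instead of the larger matrices used for 3D scenes, and the face and nose poses are made-up numbers. It shows that composing an object's pose with a part's relative pose gives the part's pose in the image (rendering direction), and that the object's pose can be recovered from the part's pose when the relative pose is known (inverse direction).

```python
import math

def pose(theta, tx, ty):
    """2D rigid transform (rotation by theta, then translation) in homogeneous form."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse_rigid(m):
    """Inverse of a rigid transform [R t; 0 1], which is [R^T, -R^T t; 0 1]."""
    r00, r01, tx = m[0]
    r10, r11, ty = m[1]
    return [[r00, r10, -(r00 * tx + r10 * ty)],
            [r01, r11, -(r01 * tx + r11 * ty)],
            [0.0, 0.0, 1.0]]

# Hypothetical scene: a face rotated by 30 degrees, with the nose one unit
# below the face center in the face's own coordinate frame.
face_pose = pose(math.radians(30), 5.0, 2.0)
nose_in_face = pose(0.0, 0.0, -1.0)

# Rendering direction: where does the nose end up in the image?
nose_pose = matmul(face_pose, nose_in_face)

# Inverse direction: given the nose's pose and the known part-whole
# relationship, recover the pose of the face.
face_recovered = matmul(nose_pose, inverse_rigid(nose_in_face))
print(face_recovered)
```

If several parts each predict the same pose for the whole, that is strong evidence the whole is really present, which is the intuition the later slides build on.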
  • 10. In simple words, a Capsule network is a neural network that tries to perform Inverse Graphics
  • 11. What is a Capsule? ● A capsule is any function that tries to predict the presence and the instantiation parameters of a particular object at a given location. ● Max pooling loses valuable information and does not encode relative spatial relationships between features. ● We should use capsules instead, because they encapsulate all important information about the state of the features they are detecting in the form of a vector (as opposed to the scalar that a neuron outputs). ● Capsules encode the probability of detection of a feature as the length of their output vector, and the state of the detected feature as the direction in which that vector points (the “instantiation parameters”). ● So when a detected feature moves around the image or its state somehow changes, the probability stays the same (the length of the vector does not change), but its orientation changes (thus achieving equivariance).
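For the length of a vector to act as a probability, it has to stay between 0 and 1. The slides do not say how this is done; the sketch below assumes the "squash" nonlinearity from the original CapsNet paper (Sabour, Frosst and Hinton, 2017), which shrinks a vector's length into that range while keeping its direction. The small `eps` term is an addition for numerical safety, not part of the formula.

```python
import math

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def squash(s, eps=1e-9):
    """Scale s so its length is |s|^2 / (1 + |s|^2); direction is unchanged."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

strong = squash([3.0, 4.0])     # long input  -> length close to 1
weak = squash([0.03, 0.04])     # short input -> length close to 0
rotated = squash([4.0, -3.0])   # same length as the first input, new direction

print(length(strong), length(weak), length(rotated))
```

The first and third vectors have the same length after squashing (same detection probability) but point in different directions (different instantiation parameters), which is the behavior the slide describes.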
  • 12. ● In the diagram below, the network contains 50 capsules; the arrows represent the output vectors of these capsules. ● The black arrows correspond to capsules that try to find rectangles, while the blue arrows represent the outputs of capsules looking for triangles. ● The length of each vector represents the estimated probability, while its orientation represents the object’s estimated pose parameters. ● The vector will rotate in its space, representing the changing state of the detected object, but its length will remain fixed, because the capsule is still sure it has detected the object. ● This is what Hinton refers to as activities of equivariance: neuronal activities will change when an object “moves over the manifold of possible appearances” in the picture. At the same time, the probabilities of detection remain constant, which is the form of invariance that we should aim for, and not the type offered by CNNs with max pooling.
  • 13. A hierarchy of parts
  • 14. Primary Capsules. Ex: a capsule that detected a rectangle. Ex: a capsule that detected a triangle. During training, a transformation matrix is learned for each pair of capsules.
  • 15. Routing by Agreement ● Since the outputs of both capsules agree on the boat's orientation, it is safe to assume that both the triangle and the rectangle are part of a boat. ● Thus the outputs of these capsules should be routed to the boat capsule. This helps reduce both the training time and the noise in the final output. This is called routing by agreement.
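Routing by agreement can be sketched as a short iterative loop. The version below follows the dynamic routing procedure from the original CapsNet paper as far as I understand it; it is a plain-Python sketch, not code from any of the implementations listed in this talk. The prediction vectors in `u_hat` are made-up numbers: in a real network each one would come from multiplying a lower capsule's output by the learned transformation matrix for that pair of capsules. The "boat" and "house" labels are likewise only for illustration.

```python
import math

def squash(s, eps=1e-9):
    """Shrink a vector's length into [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def length(v):
    return math.sqrt(sum(x * x for x in v))

def route(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction for upper capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]       # routing logits
    for _ in range(iterations):
        # Each lower capsule splits its output among the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement: raise the logit where a prediction matches the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c

# Hypothetical predictions from two lower capsules (rectangle, triangle)
# for two upper capsules (index 0 = boat, index 1 = house).
u_hat = [
    [[1.0, 0.1], [0.8, 0.0]],    # rectangle: boat prediction, house prediction
    [[0.9, 0.2], [-0.7, 0.3]],   # triangle: agrees on the boat, not the house
]
v, c = route(u_hat)
print([length(x) for x in v])    # boat output is longer than house output
print(c)                         # both parts route mostly to the boat
```

The inner loop over `iterations` is the routing loop that the CONS slide blames for slow training.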
  • 19. PROS ● Reaches high accuracy on MNIST, and promising on CIFAR10 ● Requires less training data ● Position and pose information are preserved (equivariance) ● This is promising for image segmentation and object detection ● Routing by agreement is great for overlapping objects ● Capsule activations nicely map the hierarchy of parts ● Offers robustness to affine transformations ● Activation vectors are easier to interpret (rotation, thickness, skew...) ● It’s Hinton! ;-)
  • 20. CONS ● Not state of the art on CIFAR10 (but it’s a good start) ● Not tested yet on larger images (e.g., ImageNet): will it work well? ● Slow training, due to the inner loop (in the routing by agreement algorithm) ● A CapsNet cannot see two very close identical objects. This is called “crowding”, and it has been observed as well in human vision
  • 21. Implementations ● Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras ● TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow ● PyTorch: https://github.com/gram-ai/capsule-networks