SlideShare a Scribd company logo
1 of 43
Download to read offline
Convolutional Neural
Networks
Anantharaman Palacode Narayana Iyer
narayana dot Anantharaman at gmail dot com
5 Aug 2017
References
β€œA dramatic moment in the meteoric rise of
deep learning came when a convolutional
network won this challenge for the first time
and by a wide margin, bringing down the
state-of-the-art top-5 error rate from 26.1% to
15.3% (Krizhevsky et al., 2012), meaning that
the convolutional network produces a ranked list
of possible categories for each image and the
correct category appeared in the first five entries
of this list for all but 15.3% of the test examples.
Since then, these competitions are consistently
won by deep convolutional nets, and as of this
writing, advances in deep learning have brought
the latest top-5 error rate in this contest down to
3.6%” – Ref: Deep Learning Book by Y Bengio
et al
What is a convolutional neural network?
Convolutional networks are simply
neural networks that use
convolution in place of general
matrix multiplication in at least
one of their layers.
β€’ Convolution is a mathematical
operation having a linear form
Types of inputs
β€’ Inputs have a structure
β€’ Color images are three dimensional and so have a volume
β€’ Time domain speech signals are 1-d while the frequency domain representations (e.g. MFCC
vectors) take a 2d form. They can also be looked at as a time sequence.
β€’ Medical images (such as CT/MR/etc) are multidimensional
β€’ Videos have the additional temporal dimension compared to stationary images
β€’ Speech signals can be modelled as 2 dimensional
β€’ Variable length sequences and time series data are again multidimensional
β€’ Hence it makes sense to model them as tensors instead of vectors.
β€’ The classifier then needs to accept a tensor as input and perform the necessary
machine learning task. In the case of an image, this tensor represents a volume.
CNNs are everywhere
β€’ Image retrieval
β€’ Detection
β€’ Self driving cars
β€’ Semantic segmentation
β€’ Face recognition (FB tagging)
β€’ Pose estimation
β€’ Detect diseases
β€’ Speech Recognition
β€’ Text processing
β€’ Analysing satellite data
Copyright 2016 JNResearch, All Rights Reserved
CNNs for applications that involve images
β€’ Why CNNs are more suitable to process images?
β€’ Pixels in an image correlate to each other. However, nearby pixels correlate
stronger and distant pixels don’t influence much
β€’ Local features are important: Local Receptive Fields
β€’ Affine transformations: The class of an image doesn’t change with translation. We
can build a feature detector that can look for a particular feature (e.g. an edge)
anywhere in the image plane by moving across. A convolutional layer may have
several such filters constituting the depth dimension of the layer.
Fully connected layers
β€’ Fully connected layers (such as the hidden layers of a traditional neural network)
are agnostic to the structure of the input
β€’ They take inputs as vectors and generate an output vector
β€’ There is no requirement to share parameters unless forced upon in specific architectures.
This blows up the number of parameters as the input and/or output dimensions increase.
β€’ Suppose we are to perform classification on an image of 100x100x3 dimensions.
β€’ If we implement using a feed forward neural network that has an input, hidden
and an output layer, where: hidden units (nh) = 1000, output classes = 10 :
β€’ Input layer = 10k pixels * 3 = 30k, weight matrix for hidden to input layer = 1k * 30k = 30 M
and output layer matrix size = 10 * 1000 = 10k
β€’ We may handle this is by extracting the features using pre processing and
presenting a lower dimensional input to the Neural Network. But this requires
expert engineered features and hence domain knowledge
Convolution
πΆπ‘œπ‘›π‘£π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑖𝑛 1 π·π‘–π‘šπ‘’π‘›π‘ π‘–π‘œπ‘›:
𝑦 𝑛 =
π‘˜=βˆ’βˆž
π‘˜=∞
π‘₯ π‘˜ β„Ž[𝑛 βˆ’ π‘˜]
πΆπ‘œπ‘›π‘£π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑖𝑛 2 π·π‘–π‘šπ‘’π‘›π‘ π‘–π‘œπ‘›π‘ :
𝑦 𝑛1, 𝑛2 =
π‘˜1=βˆ’βˆž
π‘˜1=βˆ’βˆž
π‘˜2=βˆ’βˆž
π‘˜2=∞
π‘₯ π‘˜1, π‘˜2 β„Ž[ 𝑛1 βˆ’ π‘˜1 , 𝑛2 βˆ’ π‘˜2 ]
CNNs
Types of layers in a CNN:
β€’ Convolution Layer
β€’ Pooling Layer
β€’ Fully Connected Layer
Convolution Layer
β€’ A layer in a regular neural
network take vector as input
and output a vector.
β€’ A convolution layer takes a
tensor (3d volume for RGB
images) as input and
generates a tensor as output
Fig Credit: Lex Fridman, MIT, 6.S094
Slide Credit: Lex Fridman, MIT, 6.S094
Local Receptive Fields
β€’ Filter (Kernel) is applied on the input image
like a moving window along width and height
β€’ The depth of a filter matches that of the input.
β€’ For each position of the filter, the dot product
of filter and the input are computed
(Activation)
β€’ The 2d arrangement of these activations is
called an activation map.
β€’ The number of such filters constitute the
depth of the convolution layer
Fig Credit: Lex Fridman, MIT, 6.S094
Convolution Operation between filter and image
β€’ The convolution layer
computes dot products
between the filter and a
piece of image as it slides
along the image
β€’ The step size of slide is
called stride
β€’ Without any padding, the
convolution process
decreases the spatial
dimensions of the output
Fig Credit: A Karpathy, CS231n
Activation Maps
β€’ Example:
β€’ Consider an image 32 x 32 x 3 and a 5 x 5 x 3 filter.
β€’ The convolution happens between a 5 x 5 x 3 chunk of the image with the filter: 𝑀 𝑇 π‘₯ + 𝑏
β€’ In this example we get 75 dimensional vector and a bias term
β€’ In this example, with a stride of 1, we get 28 x 28 x 1 activation for 1 filter without padding
β€’ If we have 6 filters, we would get 28 x 28 x 6 without padding
β€’ In the above example we have an activation map of 28 x 28 per filter.
β€’ Activation maps are feature inputs to the subsequent layer of the network
β€’ Without any padding, the 2D surface area of the activation map is smaller than
the input surface area for a stride of >= 1
Copyright 2016 JNResearch, All Rights Reserved
Stacking Convolution Layers
Fig Credit: A Karpathy, CS231n
Feature Representation as a hierarchy
Padding
β€’ The spatial (x, y) extent of the output produced by the convolutional layer is less
than the respective dimensions of the input (except for the special case of 1 x 1
filter with a stride 1).
β€’ As we add more layers and use larger strides, the output surface dimensions keep
reducing and this may impact the accuracy.
β€’ Often, we may want to preserve the spatial extent during the initial layers and
downsample them at a later time.
β€’ Padding the input with suitable values (padding with zero is common) helps to
preserve the spatial size
Zero Padding the border
Fig Credit: A Karpathy, CS231n
Hyperparameters of the convolution layer
β€’ Filter Size
β€’ # Filters
β€’ Stride
β€’ Padding
Fig Credit: A Karpathy, CS231n
Pooling Layer
β€’ Pooling is a downsampling
operation
β€’ The rationale is that the β€œmeaning”
embedded in a piece of image can
be captured using a small subset of
β€œimportant” pixels
β€’ Max pooling and average pooling
are the two most common
operations
β€’ Pooling layer doesn’t have any
trainable parameters
Fig Credit: A Karpathy, CS231n
Max Pooling Illustration
Popular Network Architectures
Current trend: Deeper Models
β€’ CNNs consistently outperform other
approaches for the core tasks of CV
β€’ Deeper models work better
β€’ Increasing the number of parameters in layers
of CNN without increasing their depth is not
effective at increasing test set performance.
β€’ Shallow models overfit at around 20 million
parameters while deep ones can benefit from
having over 60 million.
β€’ Key insight: Model performs better when it is
architected to reflect composition of simpler
functions than a single complex function. This
may also be explained off viewing the
computation as a chain of dependencies
VGG Net
VGG net
ResNet
Core Tasks of Computer Vision
Core CV Task Task Description Output Metrics
Classification Given an image, assign a label Class Label Accuracy
Localization Determine the bounding box containing
the object in the given image
Box given by (x1, y1,
x2, y2)
Ratio of intersection to
the union (Overlap)
between the ground truth
and bounding box
Object
Detection
Given an image, detect all the objects and
their locations in the image
For each object:
(Label, Box)
Mean Avg Best Overlap
(MABO,) mean Average
Precision (mAP)
Semantic
Segmentation
Given an image, assign each pixel to a
class label, so that we can look at the
image as a set of labelled segments
A set of image
segments
Classification metrics,
Intersection by Union
overlap
Instance
Segmentation
Same as semantic segmentation, but each
instance of a segment class is determined
uniquely
A set of image
segments
Object Localization
β€’ Given an image containing an object
of interest, determine the bounding
box for the object
β€’ Classify the object
Slide Credit: A Karpathy, CS231n
Slide Credit: A Karpathy, CS231n
Datasets for evaluation
β€’ Imagenet challenges provide a platform for
researchers to benchmark their novel
algorithms
β€’ PASCAL VOC 2010 is great for small scale
experiments. About 1.3 GB download size.
β€’ MS COCO datasets are available for tasks
like Image Captioning. Download size is
huge but selective download is possible.

More Related Content

What's hot

Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
Β 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
Β 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
Β 
Cnn method
Cnn methodCnn method
Cnn methodAmirSajedi1
Β 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
Β 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
Β 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
Β 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNNNoura Hussein
Β 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning Asma-AH
Β 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
Β 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
Β 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural NetworkVignesh Suresh
Β 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNNPradnya Saval
Β 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
Β 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkMojammilHusain
Β 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural NetworksAshray Bhandare
Β 

What's hot (20)

Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
Β 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
Β 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
Β 
Cnn method
Cnn methodCnn method
Cnn method
Β 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
Β 
Cnn
CnnCnn
Cnn
Β 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
Β 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Β 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNN
Β 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning
Β 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
Β 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
Β 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
Β 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
Β 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
Β 
Cnn
CnnCnn
Cnn
Β 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
Β 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
Β 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
Β 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
Β 

Similar to Overview of Convolutional Neural Networks

Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxJawadHaider36
Β 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1ananth
Β 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
Β 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
Β 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
Β 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
Β 
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...IRJET Journal
Β 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution Mohammed Ashour
Β 
00463517b1e90c1e63000000
00463517b1e90c1e6300000000463517b1e90c1e63000000
00463517b1e90c1e63000000Ivonne Liu
Β 
Convolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep LearningConvolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep Learningalihassaah1994
Β 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
Β 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
Β 
Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docxjaffarbikat
Β 

Similar to Overview of Convolutional Neural Networks (20)

CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
Β 
CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
Β 
Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptx
Β 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
Β 
Mnist report
Mnist reportMnist report
Mnist report
Β 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
Β 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Β 
DL.pdf
DL.pdfDL.pdf
DL.pdf
Β 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
Β 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Β 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
Β 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
Β 
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Β 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
Β 
00463517b1e90c1e63000000
00463517b1e90c1e6300000000463517b1e90c1e63000000
00463517b1e90c1e63000000
Β 
Convolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep LearningConvolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep Learning
Β 
CNN.pptx
CNN.pptxCNN.pptx
CNN.pptx
Β 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Β 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
Β 
Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docx
Β 

More from ananth

Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsananth
Β 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
Β 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networksananth
Β 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
Β 
An Overview of NaΓ―ve Bayes Classifier
An Overview of NaΓ―ve Bayes Classifier An Overview of NaΓ―ve Bayes Classifier
An Overview of NaΓ―ve Bayes Classifier ananth
Β 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligenceananth
Β 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligenceananth
Β 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligenceananth
Β 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
Β 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognitionananth
Β 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processingananth
Β 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
Β 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
Β 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learningananth
Β 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overviewananth
Β 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)ananth
Β 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distanceananth
Β 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
Β 
L05 word representation
L05 word representationL05 word representation
L05 word representationananth
Β 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpananth
Β 

More from ananth (20)

Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
Β 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
Β 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
Β 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
Β 
An Overview of NaΓ―ve Bayes Classifier
An Overview of NaΓ―ve Bayes Classifier An Overview of NaΓ―ve Bayes Classifier
An Overview of NaΓ―ve Bayes Classifier
Β 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligence
Β 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligence
Β 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
Β 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
Β 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
Β 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
Β 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
Β 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
Β 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
Β 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
Β 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
Β 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
Β 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
Β 
L05 word representation
L05 word representationL05 word representation
L05 word representation
Β 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
Β 

Recently uploaded

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
Β 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
Β 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
Β 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
Β 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
Β 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
Β 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
Β 
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”soniya singh
Β 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
Β 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
Β 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
Β 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
Β 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
Β 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
Β 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
Β 
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
Β 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
Β 
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
Β 

Recently uploaded (20)

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Β 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Β 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
Β 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
Β 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
Β 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
Β 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
Β 
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”
Model Call Girl in Narela Delhi reach out to us at πŸ”8264348440πŸ”
Β 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
Β 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
Β 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
Β 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
Β 
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCRβ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
β˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
Β 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
Β 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Β 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
Β 
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
Β 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
Β 
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Β 
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Serviceyoung call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
Β 

Overview of Convolutional Neural Networks

  • 1. Convolutional Neural Networks Anantharaman Palacode Narayana Iyer narayana dot Anantharaman at gmail dot com 5 Aug 2017
  • 3. β€œA dramatic moment in the meteoric rise of deep learning came when a convolutional network won this challenge for the first time and by a wide margin, bringing down the state-of-the-art top-5 error rate from 26.1% to 15.3% (Krizhevsky et al., 2012), meaning that the convolutional network produces a ranked list of possible categories for each image and the correct category appeared in the first five entries of this list for all but 15.3% of the test examples. Since then, these competitions are consistently won by deep convolutional nets, and as of this writing, advances in deep learning have brought the latest top-5 error rate in this contest down to 3.6%” – Ref: Deep Learning Book by Y Bengio et al
  • 4. What is a convolutional neural network? Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. β€’ Convolution is a mathematical operation having a linear form
  • 5. Types of inputs β€’ Inputs have a structure β€’ Color images are three dimensional and so have a volume β€’ Time domain speech signals are 1-d while the frequency domain representations (e.g. MFCC vectors) take a 2d form. They can also be looked at as a time sequence. β€’ Medical images (such as CT/MR/etc) are multidimensional β€’ Videos have the additional temporal dimension compared to stationary images β€’ Speech signals can be modelled as 2 dimensional β€’ Variable length sequences and time series data are again multidimensional β€’ Hence it makes sense to model them as tensors instead of vectors. β€’ The classifier then needs to accept a tensor as input and perform the necessary machine learning task. In the case of an image, this tensor represents a volume.
  • 6. CNNs are everywhere β€’ Image retrieval β€’ Detection β€’ Self driving cars β€’ Semantic segmentation β€’ Face recognition (FB tagging) β€’ Pose estimation β€’ Detect diseases β€’ Speech Recognition β€’ Text processing β€’ Analysing satellite data Copyright 2016 JNResearch, All Rights Reserved
  • 7. CNNs for applications that involve images β€’ Why CNNs are more suitable to process images? β€’ Pixels in an image correlate to each other. However, nearby pixels correlate stronger and distant pixels don’t influence much β€’ Local features are important: Local Receptive Fields β€’ Affine transformations: The class of an image doesn’t change with translation. We can build a feature detector that can look for a particular feature (e.g. an edge) anywhere in the image plane by moving across. A convolutional layer may have several such filters constituting the depth dimension of the layer.
  • 8. Fully connected layers β€’ Fully connected layers (such as the hidden layers of a traditional neural network) are agnostic to the structure of the input β€’ They take inputs as vectors and generate an output vector β€’ There is no requirement to share parameters unless forced upon in specific architectures. This blows up the number of parameters as the input and/or output dimensions increase. β€’ Suppose we are to perform classification on an image of 100x100x3 dimensions. β€’ If we implement using a feed forward neural network that has an input, hidden and an output layer, where: hidden units (nh) = 1000, output classes = 10 : β€’ Input layer = 10k pixels * 3 = 30k, weight matrix for hidden to input layer = 1k * 30k = 30 M and output layer matrix size = 10 * 1000 = 10k β€’ We may handle this is by extracting the features using pre processing and presenting a lower dimensional input to the Neural Network. But this requires expert engineered features and hence domain knowledge
  • 9. Convolution πΆπ‘œπ‘›π‘£π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑖𝑛 1 π·π‘–π‘šπ‘’π‘›π‘ π‘–π‘œπ‘›: 𝑦 𝑛 = π‘˜=βˆ’βˆž π‘˜=∞ π‘₯ π‘˜ β„Ž[𝑛 βˆ’ π‘˜] πΆπ‘œπ‘›π‘£π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑖𝑛 2 π·π‘–π‘šπ‘’π‘›π‘ π‘–π‘œπ‘›π‘ : 𝑦 𝑛1, 𝑛2 = π‘˜1=βˆ’βˆž π‘˜1=βˆ’βˆž π‘˜2=βˆ’βˆž π‘˜2=∞ π‘₯ π‘˜1, π‘˜2 β„Ž[ 𝑛1 βˆ’ π‘˜1 , 𝑛2 βˆ’ π‘˜2 ]
  • 10.
  • 11. CNNs Types of layers in a CNN: β€’ Convolution Layer β€’ Pooling Layer β€’ Fully Connected Layer
  • 12. Convolution Layer β€’ A layer in a regular neural network take vector as input and output a vector. β€’ A convolution layer takes a tensor (3d volume for RGB images) as input and generates a tensor as output Fig Credit: Lex Fridman, MIT, 6.S094
  • 13. Slide Credit: Lex Fridman, MIT, 6.S094
  • 14. Local Receptive Fields β€’ Filter (Kernel) is applied on the input image like a moving window along width and height β€’ The depth of a filter matches that of the input. β€’ For each position of the filter, the dot product of filter and the input are computed (Activation) β€’ The 2d arrangement of these activations is called an activation map. β€’ The number of such filters constitute the depth of the convolution layer Fig Credit: Lex Fridman, MIT, 6.S094
  • 15. Convolution Operation between filter and image β€’ The convolution layer computes dot products between the filter and a piece of image as it slides along the image β€’ The step size of slide is called stride β€’ Without any padding, the convolution process decreases the spatial dimensions of the output Fig Credit: A Karpathy, CS231n
  • 16. Activation Maps β€’ Example: β€’ Consider an image 32 x 32 x 3 and a 5 x 5 x 3 filter. β€’ The convolution happens between a 5 x 5 x 3 chunk of the image with the filter: 𝑀 𝑇 π‘₯ + 𝑏 β€’ In this example we get 75 dimensional vector and a bias term β€’ In this example, with a stride of 1, we get 28 x 28 x 1 activation for 1 filter without padding β€’ If we have 6 filters, we would get 28 x 28 x 6 without padding β€’ In the above example we have an activation map of 28 x 28 per filter. β€’ Activation maps are feature inputs to the subsequent layer of the network β€’ Without any padding, the 2D surface area of the activation map is smaller than the input surface area for a stride of >= 1 Copyright 2016 JNResearch, All Rights Reserved
  • 17. Stacking Convolution Layers Fig Credit: A Karpathy, CS231n
  • 19. Padding β€’ The spatial (x, y) extent of the output produced by the convolutional layer is less than the respective dimensions of the input (except for the special case of 1 x 1 filter with a stride 1). β€’ As we add more layers and use larger strides, the output surface dimensions keep reducing and this may impact the accuracy. β€’ Often, we may want to preserve the spatial extent during the initial layers and downsample them at a later time. β€’ Padding the input with suitable values (padding with zero is common) helps to preserve the spatial size
  • 20. Zero Padding the border Fig Credit: A Karpathy, CS231n
  • 21. Hyperparameters of the convolution layer β€’ Filter Size β€’ # Filters β€’ Stride β€’ Padding Fig Credit: A Karpathy, CS231n
  • 22. Pooling Layer β€’ Pooling is a downsampling operation β€’ The rationale is that the β€œmeaning” embedded in a piece of image can be captured using a small subset of β€œimportant” pixels β€’ Max pooling and average pooling are the two most common operations β€’ Pooling layer doesn’t have any trainable parameters Fig Credit: A Karpathy, CS231n
  • 25. Current trend: Deeper Models β€’ CNNs consistently outperform other approaches for the core tasks of CV β€’ Deeper models work better β€’ Increasing the number of parameters in layers of CNN without increasing their depth is not effective at increasing test set performance. β€’ Shallow models overfit at around 20 million parameters while deep ones can benefit from having over 60 million. β€’ Key insight: Model performs better when it is architected to reflect composition of simpler functions than a single complex function. This may also be explained off viewing the computation as a chain of dependencies
  • 26.
  • 30. Core Tasks of Computer Vision Core CV Task Task Description Output Metrics Classification Given an image, assign a label Class Label Accuracy Localization Determine the bounding box containing the object in the given image Box given by (x1, y1, x2, y2) Ratio of intersection to the union (Overlap) between the ground truth and bounding box Object Detection Given an image, detect all the objects and their locations in the image For each object: (Label, Box) Mean Avg Best Overlap (MABO,) mean Average Precision (mAP) Semantic Segmentation Given an image, assign each pixel to a class label, so that we can look at the image as a set of labelled segments A set of image segments Classification metrics, Intersection by Union overlap Instance Segmentation Same as semantic segmentation, but each instance of a segment class is determined uniquely A set of image segments
  • 31. Object Localization β€’ Given an image containing an object of interest, determine the bounding box for the object β€’ Classify the object
  • 32. Slide Credit: A Karpathy, CS231n
  • 33.
  • 34. Slide Credit: A Karpathy, CS231n
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43. Datasets for evaluation β€’ Imagenet challenges provide a platform for researchers to benchmark their novel algorithms β€’ PASCAL VOC 2010 is great for small scale experiments. About 1.3 GB download size. β€’ MS COCO datasets are available for tasks like Image Captioning. Download size is huge but selective download is possible.