MACHINE LEARNING – CONVOLUTIONAL NEURAL NETWORK
Basic Structure of CNN
• Input Layer: Accepts input images as pixel data.
• Convolutional Layer: Applies filters to extract features.
• ReLU Layer: Introduces non-linearity to the network.
• Pooling Layer: Reduces spatial dimensions of feature maps.
• Fully Connected Layer: Final layer for classification.
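As a concrete illustration, here is a minimal sketch of this five-stage pipeline in Python using PyTorch (the slides name no framework, and the layer sizes and 28x28 grayscale input below are hypothetical):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer: 8 filters
    nn.ReLU(),                                  # non-linearity
    nn.MaxPool2d(2),                            # pooling: 28x28 -> 14x14
    nn.Flatten(),                               # prepare for the dense layer
    nn.Linear(8 * 14 * 14, 10),                 # fully connected classifier
)

x = torch.randn(1, 1, 28, 28)  # one fake grayscale image as pixel data
print(model(x).shape)          # torch.Size([1, 10]) -- class scores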
Convolutional Layer
• Filters/Kernels: Detect specific features in input images.
• Stride: Controls the movement of filters across the input.
• Padding: Adds pixels around the input to maintain dimensions.
• Output: Produces feature maps indicating detected features.
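A minimal NumPy sketch of how a filter slides across an input, with stride as a parameter (the 5x5 image and vertical-edge kernel are made-up examples; as in most CNN libraries, this computes cross-correlation):

import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # filter response here
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
edge = np.array([[1., 0., -1.]] * 3)              # vertical-edge filter
print(conv2d(image, edge).shape)            # (3, 3) with stride 1
print(conv2d(image, edge, stride=2).shape)  # (2, 2): larger stride, smaller map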
Padding in CNN
• Zero Padding: Adds zeros around the input image to preserve dimensions.
• Valid Padding: No padding; reduces the size of output feature maps.
• Role: Helps preserve edge information during convolution.
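A small NumPy illustration of the two padding modes and their effect on output size (the 5x5 image and 3x3 filter are hypothetical):

import numpy as np

image = np.ones((5, 5))
kernel_size = 3

# Zero padding: one ring of zeros lets a 3x3 filter cover edge pixels
# and keeps the output the same size as the input ("same" padding).
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_out = (padded.shape[0] - kernel_size) + 1   # 5: dimensions preserved

# Valid padding: no padding, so the output shrinks.
valid_out = (image.shape[0] - kernel_size) + 1   # 3: smaller feature map

print(padded.shape, same_out, valid_out)  # (7, 7) 5 3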
Pooling Layer
• Purpose: Reduces dimensionality and computation in the network.
• Max Pooling: Selects the maximum value from each pooling region.
• Average Pooling: Takes the average value from each pooling region.
• Impact: Retains important features while reducing overfitting.
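A NumPy sketch of both pooling modes over non-overlapping 2x2 regions (the feature-map values are made up):

import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map (stride == size)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    blocks = fmap[:h*size, :w*size].reshape(h, size, w, size)
    # Reduce each size x size region to a single value.
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))  # [[6. 8.] [3. 4.]]
print(pool2d(fmap, mode="avg"))  # [[3.75 5.25] [2.   2.  ]]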
Basic Mathematics of CNN (B&W Image)
• Convolution: Applies a filter matrix across the image to detect features.
• Example: Sliding a 3x3 filter over a grayscale image, producing a feature map (worked through in the sketch below).
• ReLU: Applies non-linearity after convolution.
• Pooling: Reduces the size of the resulting feature map.
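A worked NumPy example of these steps, assuming a toy 6x6 grayscale image with a vertical edge and a hand-picked 3x3 edge filter:

import numpy as np

# Toy grayscale image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1., 0., 1.]] * 3)   # 3x3 vertical-edge filter

# Convolution: slide the 3x3 filter over every 3x3 patch (stride 1, valid).
fmap = np.array([[np.sum(image[i:i+3, j:j+3] * kernel)
                  for j in range(4)] for i in range(4)])
print(fmap[0])        # [0. 3. 3. 0.] -- strong response at the edge

relu = np.maximum(fmap, 0.0)   # ReLU: zero out negative responses

# 2x2 max pooling halves the 4x4 map to 2x2.
pooled = relu.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)         # [[3. 3.] [3. 3.]]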
Basic Mathematics of CNN (Colored Image)
• Convolution: Applies the filter across each RGB channel.
• Result: Produces a combined feature map from all channels.
• Example: Sliding a filter across an RGB image and summing the per-channel feature maps (see the sketch below).
• Pooling: Reduces the size of the resulting feature map while preserving important information.
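A sketch of the per-channel convolve-then-sum idea, using a random toy RGB image (the channels-first layout is an assumption):

import numpy as np

rgb = np.random.rand(3, 6, 6)            # toy RGB image, channels-first
kernel = np.array([[-1., 0., 1.]] * 3)   # same 3x3 filter for each channel

def conv_valid(ch, k):
    """Valid 3x3 convolution of one channel, stride 1."""
    return np.array([[np.sum(ch[i:i+3, j:j+3] * k)
                      for j in range(4)] for i in range(4)])

# Convolve each channel separately, then sum: one combined feature map.
combined = sum(conv_valid(rgb[c], kernel) for c in range(3))
print(combined.shape)  # (4, 4): one map, not three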
Fully Connected Layer
• Purpose: Flattens the feature maps into a single vector that feeds the fully connected layer.
• Function: Combines features for final classification.
• Uses: Softmax or sigmoid activation functions for the output.
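A minimal PyTorch sketch of flattening followed by a fully connected layer and softmax (the input sizes and 10-class output are hypothetical):

import torch
import torch.nn as nn

# Hypothetical sizes: 8 feature maps of 14x14 from the last pooling layer.
head = nn.Sequential(
    nn.Flatten(),                # (N, 8, 14, 14) -> (N, 1568)
    nn.Linear(8 * 14 * 14, 10),  # combine features into class scores
)
logits = head(torch.randn(2, 8, 14, 14))
probs = torch.softmax(logits, dim=1)  # softmax for multi-class output
print(probs.sum(dim=1))               # each row sums to 1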
LeNet-5 Architecture
• Designed for handwritten digit recognition (MNIST dataset).
• Structure: 2 convolutional layers, 2 subsampling layers, 2 fully connected layers.
• Key Feature: Simple and efficient; an early CNN model.
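A PyTorch sketch of this layout; max pooling and tanh stand in for the original trainable subsampling and activations:

import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: 32x32 -> 28x28
    nn.MaxPool2d(2),                             # S2: 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: 14x14 -> 10x10
    nn.MaxPool2d(2),                             # S4: 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # fully connected layers
    nn.Linear(120, 10),                          # 10 digit classes
)
print(lenet5(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])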
AlexNet Architecture
• Winner of the ImageNet competition in 2012.
• Structure: 5 convolutional layers, 3 fully connected layers.
• Features: Uses ReLU, dropout, and data augmentation.
• Impact: Revolutionized deep learning and computer vision.
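This structure can be inspected directly in torchvision (assumed available):

from torchvision import models

# 5 conv layers live in .features; 3 fully connected layers
# (with dropout) live in .classifier.
alexnet = models.alexnet(weights=None)  # architecture only, no download
print(alexnet)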
VGG-16 Architecture
• Uses 16 layers (13 convolutional, 3 fully connected).
• Features: Smaller filters (3x3) stacked in a deeper network.
• Strength: Achieves high accuracy with a simple, uniform structure.
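The 13-conv, all-3x3 claim can likewise be verified against torchvision's VGG-16:

import torch.nn as nn
from torchvision import models

vgg16 = models.vgg16(weights=None)  # architecture only, no download
convs = [m for m in vgg16.features if isinstance(m, nn.Conv2d)]
print(len(convs), {m.kernel_size for m in convs})  # 13 {(3, 3)}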
ResNet Architecture
• Introduces Residual Learning to combat vanishing gradients.
• Structure: Skip connections (shortcuts) between layers.
• Impact: Allows very deep networks (e.g., ResNet-50, ResNet-101).
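A minimal PyTorch sketch of a residual block; real ResNet blocks also use batch normalization, omitted here for brevity:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x (skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # shortcut: gradients can bypass F(x)

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 8, 8])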
Inception (GoogLeNet) Architecture
• Introduces Inception modules: parallel convolutional filters.
• Structure: Multiple filter sizes (1x1, 3x3, 5x5) in parallel.
• Impact: Efficient and scalable for large-scale image recognition.
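A simplified PyTorch sketch of the parallel-branch idea; the real GoogLeNet module also uses 1x1 channel reductions and a pooling branch:

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1, 3x3, 5x5 branches, concatenated along channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1)             # 1x1 branch
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # 3x3 branch
        self.b5 = nn.Conv2d(in_ch, out_ch, 5, padding=2)  # 5x5 branch

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

m = InceptionModule(16, 8)
print(m(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 24, 8, 8])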
Transfer Learning
• Concept: Reuses a model pre-trained on one task for a new but related task.
• Benefits: Speeds up training, requires less data, and improves performance.
• Example: Using a pre-trained model like ResNet for a new image classification task.
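A common PyTorch recipe for this, assuming torchvision is available; the 5-class head is hypothetical:

import torch.nn as nn
from torchvision import models

# Load a ResNet pre-trained on ImageNet, freeze its features, and swap
# the final layer for a new 5-class task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False          # keep pre-trained features fixed

model.fc = nn.Linear(model.fc.in_features, 5)  # new trainable head
# Only model.fc's parameters are updated during fine-tuning.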
Object Localization
• Purpose: Identifies the location of objects within an image.
• Methods: Bounding box regression, Region Proposal Networks (RPNs).
• Applications: Object detection, image segmentation.
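A minimal PyTorch sketch of the bounding-box-regression idea: one head classifies the object while a parallel head regresses 4 box coordinates (all sizes hypothetical):

import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
features = backbone(torch.randn(1, 3, 32, 32))  # (1, 4096)
head_cls = nn.Linear(features.shape[1], 10)     # what the object is
head_box = nn.Linear(features.shape[1], 4)      # where it is: (x, y, w, h)
print(head_cls(features).shape, head_box(features).shape)  # (1, 10) (1, 4)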
Landmark Detection
• Definition: Detects specific key points or landmarks within an image.
• Applications: Facial recognition, medical imaging (e.g., key anatomical points).
• Methods: CNNs are used to detect and regress the positions of landmarks.
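A PyTorch sketch of landmark regression, assuming 5 hypothetical key points, each predicted as an (x, y) pair:

import torch
import torch.nn as nn

num_landmarks = 5
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, num_landmarks * 2),  # 2 coordinates per landmark
)
coords = net(torch.randn(1, 1, 32, 32)).view(-1, num_landmarks, 2)
print(coords.shape)  # torch.Size([1, 5, 2]): (x, y) for each landmark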
Conclusion
• CNNs have revolutionized computer vision tasks.
• Architectures like LeNet, AlexNet, VGG, ResNet, and Inception paved the way for modern image processing.
• Transfer learning, object localization, and landmark detection expand the versatility of CNNs.
