Deep Learning: Convolutional Neural Network Architectures
This deck covers CNN architectures in more detail: transfer learning, LeNet, AlexNet, ZFNet, VGG, GoogLeNet and more, along with their advantages and disadvantages.
Transfer Learning
• Transfer learning is a machine learning technique where a model
trained on one task is repurposed as the foundation for a second task.
• This approach is beneficial when the second task is related to the first
or when data for the second task is limited.
Working of Transfer Learning
• Transfer learning involves a structured process to use existing knowledge from a pre-trained
model for new tasks:
• Pre-trained Model: Start with a model already trained on a large dataset for a specific task.
This pre-trained model has learned general features and patterns that are relevant across
related tasks.
• Base Model: This pre-trained model, known as the base model, includes layers that have
processed data to learn hierarchical representations, capturing low-level to complex features.
• Transfer Layers: Identify layers within the base model that hold generic information
applicable to both the original and new tasks. These layers, typically the earlier layers of
the network, capture broad, reusable features.
• Fine-tuning: Fine-tune these selected layers with data from the new task. This process
retains the pre-trained knowledge while adjusting parameters to meet the specific
requirements of the new task, improving accuracy and adaptability (sketched in code right
after this list).
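A minimal sketch of this workflow, assuming PyTorch and a torchvision ResNet-18 as the pre-trained base model; the 10-class target task, the choice to freeze all base layers, and the `weights` keyword of recent torchvision versions are illustrative assumptions, not part of the slides.

```python
# Minimal transfer-learning sketch (PyTorch / torchvision assumed).
import torch
import torch.nn as nn
from torchvision import models

num_new_classes = 10  # hypothetical new task

# 1. Pre-trained model: start from a network trained on ImageNet.
base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2./3. Base model and transfer layers: freeze the generic feature extractors.
for param in base.parameters():
    param.requires_grad = False

# 4. Fine-tuning: swap in a new task-specific head and train only its parameters
#    (later layers of the base can optionally be unfrozen as well).
base.fc = nn.Linear(base.fc.in_features, num_new_classes)

optimizer = torch.optim.Adam(base.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_new_classes, (8,))
loss = criterion(base(x), y)
loss.backward()
optimizer.step()
```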
LeNet Architecture
• The LeNet architecture consists of several layers that progressively extract and condense
information from input images. Each layer is described below, followed by a short code
sketch of the stack:
• Input Layer: Accepts 32x32 pixel images, often zero-padded if original images are smaller.
• First Convolutional Layer (C1): Consists of six 5x5 filters, producing six feature maps of 28x28
each.
• First Pooling Layer (S2): Applies 2x2 average pooling, reducing feature maps' size to 14x14.
• Second Convolutional Layer (C3): Uses sixteen 5x5 filters, but with sparse connections,
outputting sixteen 10x10 feature maps.
• Second Pooling Layer (S4): Further reduces feature maps to 5x5 using 2x2 average pooling.
• Fully Connected Layers:
• First Fully Connected Layer (C5): Fully connected with 120 nodes.
• Second Fully Connected Layer (F6): Comprises 84 nodes.
• Output Layer: Softmax or Gaussian activation that outputs probabilities across 10 classes
(digits 0-9).
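A compact sketch of this layer stack, assuming PyTorch (the slides name no framework). Tanh activations and average pooling follow the original design; the sparse C3 connection scheme is approximated by a dense 6-to-16 convolution for simplicity.

```python
# LeNet-5 layer stack as described above (PyTorch assumed).
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 32x32 input -> 6 x 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S2: -> 6 x 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: -> 16 x 10x10 (dense approximation)
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S4: -> 16 x 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),        # C5: 120 nodes
            nn.Tanh(),
            nn.Linear(120, 84),                # F6: 84 nodes
            nn.Tanh(),
            nn.Linear(84, num_classes),        # output layer (softmax applied by the loss)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNet5()(torch.randn(1, 1, 32, 32))   # -> shape (1, 10)
```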
Applications of LeNet
• Handwritten Character Recognition: Beyond recognizing digits, LeNet has been adapted to
recognize a broad range of handwritten characters, including alphabets from various languages.
This adaptation has been crucial for applications such as automated form processing and
handwriting-based authentication systems.
• Object Recognition in Images: The principles of LeNet have been extended to more complex
object recognition tasks. Modified versions of LeNet are used in systems that need to recognize
objects in photos and videos, such as identifying products in a retail setting or vehicles in traffic
management systems.
• Document Classification: LeNet can be adapted for document classification by recognizing and
learning from the textual and layout features of different document types. This application is
particularly useful in digital document management systems where automatic categorization of
documents based on their content and layout can significantly enhance searchability and
retrieval.
• Medical Image Analysis: Adaptations of LeNet have been applied in the field of medical image
analysis, such as identifying abnormalities in radiographic images, segmenting biological features
in microscopic images, and diagnosing diseases from patterns in medical imagery. These
applications demonstrate the potential of convolutional neural networks in supporting diagnostic
processes and enhancing the accuracy of medical evaluations.
AlexNet
• AlexNet is a deep learning model that made a big impact in image
recognition. It became famous for its ability to classify images accurately. It
won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012
with a top-5 error rate of 15.3%, beating the runner-up, which had a top-5
error rate of 26.2%.
• The most important features of AlexNet are:
• Overfitting Prevention: Dropout (0.5) was applied to the first two fully
connected layers, and data augmentation dynamically expanded the dataset;
both help reduce overfitting.
• Faster Training: ReLU activation was used instead of tanh or sigmoid, leading
to roughly a 6× speedup in training by avoiding activation saturation.
AlexNet Architecture
• Its architecture includes (a code sketch follows this list):
• 5 convolutional layers with Max-Pooling applied after the 1st, 2nd
and 5th layers to enhance feature extraction.
• Overlapping Max-Pooling uses a 3×3 filter with stride 2 which
improved performance by reducing top-1 error by 0.4% and top-5
error by 0.3% compared to non-overlapping pooling.
• Followed by 2 fully connected layers each using dropout to prevent
overfitting.
• Ends with a softmax layer for final classification.
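The slides describe the original 2012 network; as a quick hands-on reference, the sketch below loads torchvision's AlexNet implementation, which follows the same layout but differs slightly in filter counts and expects 224×224 inputs rather than 227×227. The torchvision dependency and the `weights` keyword of recent versions are assumptions.

```python
# Inspecting a reference AlexNet (torchvision assumed).
import torch
from torchvision import models

alexnet = models.alexnet(weights=None)   # pass pretrained weights instead if desired
print(alexnet.features)                  # 5 conv layers (ReLU) with 3 overlapping 3x3/stride-2 max pools
print(alexnet.classifier)                # dropout + 2 hidden FC layers + output layer

logits = alexnet(torch.randn(1, 3, 224, 224))
print(logits.shape)                      # torch.Size([1, 1000])
```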
Comparison with LeNet
• LeNet: 5 layers, grayscale images (28x28)
• AlexNet: 8 layers, color images (227x227)
• LeNet used tanh/sigmoid; AlexNet used ReLU.
• AlexNet trained on GPUs, LeNet on CPUs.
• AlexNet introduced dropout and data augmentation for better
generalization.
Key Contributions of AlexNet:
• Demonstrated the scalability of deep CNNs.
• Introduced ReLU activation in large-scale networks.
• Used GPU computation for faster training.
• Employed dropout to reduce overfitting.
• Inspired deeper architectures (VGG, ResNet, Inception).
ZFNet
• ZFNet, which was designed by Zeiler and Fergus (2013), won
the ILSVRC competition in 2013 by achieving an 11.2% top-5
error rate.
• The architecture was a slight modification of AlexNet:
• Instead of using 11×11 filters in the first convolutional layer,
ZFNet used a 7×7 filter with a stride of 2.
• A smaller filter size helps the network pick up image features at a finer level
of resolution.
• ZFNet increased the number of activation maps in the 3rd, 4th
and 5th convolutional layers from (384, 384, 256) to (512, 1024, 512), which
increased the number of features the network is capable of detecting
(a sketch of the first-layer change follows below).
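A small sketch of the first-layer change, assuming PyTorch and an AlexNet-style stem with 96 filters; only the first convolution of each network is shown.

```python
# ZFNet's first-layer modification as described above (PyTorch assumed):
# the 11x11 stride-4 AlexNet stem is replaced by a 7x7 stride-2 convolution,
# which samples the input at a finer spatial resolution.
import torch
import torch.nn as nn

alexnet_stem = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2)
zfnet_stem   = nn.Conv2d(3, 96, kernel_size=7,  stride=2, padding=1)

x = torch.randn(1, 3, 224, 224)
print(alexnet_stem(x).shape)   # coarser 55x55 feature maps
print(zfnet_stem(x).shape)     # finer 110x110 feature maps
```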
VGG Net
• The Visual Geometry Group (VGG) models, particularly VGG-16 and
VGG-19, have significantly influenced the field of computer vision
since their inception.
• These models, introduced by the Visual Geometry Group from the
University of Oxford, stood out in the 2014 ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) for their deep convolutional
neural networks (CNNs) with a uniform architecture.
• VGG-19, the deeper variant of the VGG models, has garnered
considerable attention due to its simplicity and effectiveness.
VGG-19 Architecture
• VGG-19 is a deep convolutional neural network with 19 weight layers, comprising 16
convolutional layers and 3 fully connected layers. The architecture follows a straightforward
and repetitive pattern, making it easy to understand and implement (a builder-style code
sketch follows the component list below).
• The key components of the VGG-19 architecture are:
• Convolutional Layers: 3x3 filters with a stride of 1 and padding of 1 to preserve spatial
resolution.
• Activation Function: ReLU (Rectified Linear Unit) applied after each convolutional layer to
introduce non-linearity.
• Pooling Layers: Max pooling with a 2x2 filter and a stride of 2 to reduce the spatial dimensions.
• Fully Connected Layers: Three fully connected layers at the end of the network for
classification.
• Softmax Layer: Final layer for outputting class probabilities.
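A builder-style sketch of this repetitive pattern, assuming PyTorch; the per-stage filter counts (2+2+4+4+4 convolutions) follow the standard VGG-19 configuration, and the code is an illustration rather than the official implementation.

```python
# VGG-19 built from its repetitive 3x3-conv configuration (PyTorch assumed).
import torch
import torch.nn as nn

VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def make_vgg19(num_classes: int = 1000) -> nn.Module:
    layers, in_ch = [], 3
    for v in VGG19_CFG:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))   # 2x2 max pool, stride 2
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),  # 3x3 conv, stride 1, pad 1
                       nn.ReLU(inplace=True)]
            in_ch = v
    classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
        nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
        nn.Linear(4096, num_classes),          # softmax applied by the loss function
    )
    return nn.Sequential(*layers, classifier)

logits = make_vgg19()(torch.randn(1, 3, 224, 224))   # -> shape (1, 1000)
```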
GoogLeNet
• GoogLeNet (Inception V1) is a deep convolutional neural network
architecture designed for efficient image classification.
• It introduces the Inception module, which performs multiple
convolution operations (1x1, 3x3, 5x5) in parallel, along with max
pooling, and concatenates their outputs.
• The architecture is deep, yet optimized for speed and performance,
which makes it suitable for large-scale visual recognition tasks.
• It brought forward innovative architectural choices such as 1×1
convolutions, global average pooling and the Inception module, all
aimed at improving depth and computational efficiency.
GoogLeNet Architecture
• GoogLeNet is a 22-layer deep network (excluding pooling layers) that
emphasizes computational efficiency, making it feasible to run even
on hardware with limited resources. Its key features and building blocks
are described below.
Key Features of GoogLeNet
• 1. 1×1 Convolutions
• One of the core techniques employed in GoogLeNet is the use of 1×1
convolutions, primarily for dimensionality reduction. These layers help
decrease the number of trainable parameters while enabling deeper
and more efficient architectures.
• Example comparison, for a 5×5 convolution producing 48 feature maps of size 14×14
from 480 input channels:
• Without a 1×1 convolution: (14×14×48)×(5×5×480) ≈ 112.9M operations
• With a 1×1 convolution reducing the input to 16 channels first:
(14×14×16)×(1×1×480) + (14×14×48)×(5×5×16) ≈ 5.3M operations
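The same comparison can be reproduced directly; the sketch below (PyTorch assumed) builds both variants and recomputes the multiply counts from the dimensions quoted above.

```python
# 1x1 bottleneck vs. direct 5x5 convolution, on 14x14 maps with 480 input
# channels and 48 output channels (PyTorch assumed).
import torch
import torch.nn as nn

direct = nn.Conv2d(480, 48, kernel_size=5, padding=2)
bottleneck = nn.Sequential(
    nn.Conv2d(480, 16, kernel_size=1),             # 1x1 dimensionality reduction
    nn.Conv2d(16, 48, kernel_size=5, padding=2),
)

def multiplies(out_ch, out_hw, k, in_ch):
    # output positions x output channels x kernel area x input channels
    return out_hw * out_hw * out_ch * k * k * in_ch

print(multiplies(48, 14, 5, 480))                              # 112,896,000  (~112.9M)
print(multiplies(16, 14, 1, 480) + multiplies(48, 14, 5, 16))  #   5,268,480  (~5.3M)

x = torch.randn(1, 480, 14, 14)
assert direct(x).shape == bottleneck(x).shape                  # identical output size
```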
• 2. Global Average Pooling
• In traditional architectures like AlexNet, the fully connected layers at the
end introduce a large number of parameters. GoogLeNet replaces
these with global average pooling, which computes the average of
each feature map (e.g. converting 7×7 maps to 1×1). This significantly
reduces the model's parameter count and helps mitigate overfitting (a one-line
sketch follows the benefits list).
• Benefits:
• Zero additional trainable parameters
• Reduces overfitting
• Improves top-1 accuracy by approximately 0.6%
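A one-line sketch of global average pooling, assuming PyTorch; the 1024×7×7 tensor here stands in for GoogLeNet's final convolutional output.

```python
# Global average pooling as described above (PyTorch assumed): each 7x7
# feature map is averaged down to a single value, adding no trainable
# parameters before the final classifier.
import torch
import torch.nn as nn

gap = nn.AdaptiveAvgPool2d(1)            # any HxW -> 1x1 per channel
feature_maps = torch.randn(1, 1024, 7, 7)
pooled = gap(feature_maps).flatten(1)    # shape (1, 1024)

classifier = nn.Linear(1024, 1000)       # only this final linear layer adds parameters
logits = classifier(pooled)
```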
Inception Module
• The Inception module is the architectural core of GoogLeNet. It processes the
input using multiple types of operations in parallel, including 1×1, 3×3 and 5×5
convolutions and 3×3 max pooling. The outputs from all paths are concatenated
depth-wise (see the module sketch below).
• Purpose: Enables the network to capture features at multiple scales effectively.
• Advantage: Improves representational power without dramatically increasing
computation.
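A minimal Inception-style module following this description, assuming PyTorch; the branch widths passed in the example are illustrative, and 1×1 reductions are included on the 3×3 and 5×5 branches as in the original design.

```python
# Inception-style module: parallel 1x1, 3x3 and 5x5 convolutions plus 3x3
# max pooling, concatenated along the channel (depth) dimension.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1),              # 1x1 reduction
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1),              # 1x1 reduction
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Depth-wise concatenation of all four parallel paths.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

# Illustrative branch widths; output has 64 + 128 + 32 + 32 = 256 channels.
module = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = module(torch.randn(1, 192, 28, 28))   # -> shape (1, 256, 28, 28)
```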
• 4. Auxiliary Classifiers
• To address the vanishing gradient problem during training, GoogLeNet
introduces auxiliary classifiers (intermediate branches that act as
smaller classifiers). These are active only during training and help
regularize the network (a sketch follows the structure list below).
• Structure of Each Auxiliary Classifier:
• Average pooling layer (5×5, stride 3)
• 1×1 convolution (128 filters, ReLU)
• Fully connected layer (1024 units, ReLU)
• Dropout layer (dropout rate = 0.7)
• Fully connected softmax layer (1000 classes)
• The auxiliary losses are added to the main loss with a weight of 0.3 to
stabilize training.
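A sketch of one auxiliary classifier with the structure listed above, plus the 0.3-weighted loss combination (PyTorch assumed); the 512-channel, 14×14 input tensor is an assumption about the intermediate point where the head is attached.

```python
# Auxiliary classifier head and weighted loss combination (PyTorch assumed).
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    def __init__(self, in_ch: int, num_classes: int = 1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),      # 5x5 avg pool, stride 3: 14x14 -> 4x4
            nn.Conv2d(in_ch, 128, kernel_size=1),       # 1x1 conv, 128 filters
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),               # fully connected, 1024 units
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.7),                          # dropout rate 0.7
            nn.Linear(1024, num_classes),               # softmax applied by the loss
        )

    def forward(self, x):
        return self.head(x)

criterion = nn.CrossEntropyLoss()
features = torch.randn(8, 512, 14, 14)                  # assumed intermediate activations
targets = torch.randint(0, 1000, (8,))
main_logits = torch.randn(8, 1000, requires_grad=True)  # stand-in for the main output

aux_loss = criterion(AuxClassifier(512)(features), targets)
total_loss = criterion(main_logits, targets) + 0.3 * aux_loss   # auxiliary weight 0.3
```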