This document describes a waste classification system that uses a convolutional neural network to classify waste items into different categories. The system was developed by a group of students under the guidance of Dr. Divya Kumar. The objective is to effectively segregate waste using image processing and artificial neural networks. The system classifies waste images into 5 categories (metal, organic, container, paper, plastic) with over 98% accuracy. It uses a CNN model with convolutional, pooling and fully connected layers to perform the classification. The model was trained on a manually collected dataset of over 3,300 images.
3. Group Members
▶ Abhinav Dixit (20151012)
▶ Ishaan Rajput (20154086)
▶ Abhishek Sharma (20154077)
▶ Harshita Rastogi (20154041)
▶ John Prasad (20154010)
4. Objective
▶ To develop a system to effectively segregate the collected waste on the basis
of different categories using the concepts of artificial neural networks and
image processing.
▶ To create a Convolutional Neural Network (CNN) model, a powerful
deep-layered network that performs the classification, and to tune its
various parameters to achieve the highest accuracy.
▶ To help eliminate the need for a middleman in the treatment of waste.
5. Motivation
▶ By designing an autonomous system we can reduce the workforce required for
waste management and also perform the task efficiently.
▶ We can classify the waste into one of three categories – recyclable,
compost, and landfill (non-recyclable waste).
▶ Hazardous waste materials which cause harm to humans can easily be
disposed of.
6. Proposed Work
▶ To carry out the image classification of waste materials using Convolutional Neural
Networks (CNN). Given an image of a random waste item, the system seeks to
categorize it into one of the 5 categories listed below.
▶ Categories for classification:
▶ 1. Metal
▶ 2. Organic waste
▶ 3. Container
▶ 4. Paper
▶ 5. Plastic
7. Artificial Neural Networks
▶ An ANN is a computational model based on the structure and functions of
biological neural networks.
▶ Information that flows through the network affects the structure of the ANN
because a neural network changes - or learns, in a sense - based on that input
and output.
▶ There are weighted connections (corresponding to synapses) between
simulated neurons; each neuron sums the signals (numbers) it receives.
▶ A signal is sent (fired) if a certain threshold is reached.
8. Artificial Neural Networks
▶ An ANN is typically defined by three types of parameters:
▶ The interconnection pattern between different layers of neurons
▶ The learning process for updating the weights of the interconnections.
▶ The activation function that converts a neuron's weighted input to its output
activation.
9. Artificial Neural Networks
▶ The model learns by repeating the following steps:
▶ Feed Forward Algorithm – The input layer of the neural network is fed with samples.
These sample values are multiplied by the corresponding weights and summed.
The bias is added to the resulting sum, and the sum is passed to the
activation function.
▶ After the feedforward pass, the output of the neural network is compared with
the target output. The difference between the expected and actual output of a
neuron gives that neuron's error.
▶ Back Propagation Algorithm – The backpropagation algorithm is applied after the
feedforward pass in order to propagate the errors in the direction opposite to
the feedforward flow, adjusting the weights to reduce the error.
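The feedforward/backpropagation cycle above can be sketched from scratch in a few lines of NumPy. The XOR toy data, layer sizes, and learning rate here are illustrative assumptions, not details from the project itself.

```python
import numpy as np

# Minimal from-scratch sketch of one feedforward / backpropagation cycle.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # toy inputs (XOR)
y = np.array([[0.], [1.], [1.], [0.]])                  # target outputs

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

losses = []
for _ in range(2000):
    # Feedforward: multiply samples by weights, add bias, apply activation.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = out - y                        # actual output minus target output
    losses.append(float((err ** 2).mean()))
    # Backpropagation: push the error backwards and adjust the weights.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

Each iteration performs one feedforward pass, measures the error against the target, and uses backpropagation to nudge every weight in the direction that reduces that error.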
10. Convolutional Neural Network (CNN)
▶ Why CNN?
▶ Image recognition is not an easy task to achieve. In theory, we can use
conventional neural networks for analyzing images, but in practice this is
highly expensive from a computational perspective.
▶ For instance, even a modestly sized image, e.g. 200x200x3, would require
every fully connected neuron to have 200 * 200 * 3 = 120,000 weights.
▶ Convolutional Neural Networks take advantage of the fact that the input consists
of images and they constrain the architecture in a more sensible way. The layers of
a ConvNet have neurons arranged in 3 dimensions: width, height, depth.
▶ The neurons in a layer will only be connected to a small region of the layer before
it, instead of all of the neurons in a fully-connected manner.
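The weight-count argument above is simple arithmetic; as a sketch, here is the fully connected cost next to the cost of a single shared convolutional filter (the 11 x 11 kernel size matches the first layer described later in this model):

```python
# Why CNNs: a fully connected neuron sees every pixel, while a conv filter
# has a small, shared set of weights slid across the image.
h, w, c = 200, 200, 3
fc_weights_per_neuron = h * w * c    # 200 * 200 * 3 = 120,000 weights
conv_filter_weights = 11 * 11 * c    # one 11x11x3 filter: 363 shared weights
```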
11. CNN - Structure
▶ A CNN comprises the following layers –
▶ Convolution Layer (With ReLU)
▶ Pooling Layer
▶ Flattening
▶ Followed by fully connected ANN
12. Convolution Layer
▶ The CONV layer’s parameters consist of a set of learnable filters.
▶ During the forward pass, we convolve each filter across the width and height
of the input volume and compute dot products between the entries of the
filter and the input at any position.
▶ As we slide the filter over the width and height of the input volume we will
produce a 2-dimensional activation map that gives the responses of that filter
at every spatial position.
▶ Intuitively, the network will learn filters that activate when they see some
type of visual feature such as an edge of some orientation, color patterns,
etc. on higher layers of the network.
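The forward pass described above can be sketched directly: slide a filter over the input and take a dot product at every position to build the 2-D activation map. The vertical-edge filter and toy image below are illustrative assumptions.

```python
import numpy as np

# One forward-pass convolution: dot product of the filter with every patch.
def activation_map(image, filt, stride=1):
    k = filt.shape[0]
    out = (image.shape[0] - k) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            fmap[i, j] = np.sum(patch * filt)  # filter response at (i, j)
    return fmap

# A vertical-edge filter responds where intensity changes left to right.
edge = np.array([[1., 0., -1.]] * 3)
img = np.zeros((5, 5)); img[:, 3:] = 1.0   # dark left half, bright right half
fmap = activation_map(img, edge)           # every row reads [0., -3., -3.]
```

The map is flat where the image is uniform and activates strongly at the edge, which is exactly the "filters that activate on a visual feature" intuition.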
13. ReLU Layer
▶ The Convolution Layer is followed by the ReLU (Rectified Linear Unit) activation
function, defined as f(x) = max(0, x).
▶ It is also known as the ramp function.
▶ It introduces non-linearity into the output of the CONV Layer.
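Applied element-wise, the ramp function simply clamps negative activations to zero:

```python
import numpy as np

# ReLU: f(x) = max(0, x), applied element-wise to a layer's output.
relu = lambda x: np.maximum(0, x)
out = relu(np.array([-2.0, -0.5, 0.0, 3.0]))  # negatives clamped to zero
```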
14. Pooling
▶ The Pooling Layer operates independently on every depth slice of the input
and resizes it spatially, using the MAX operation.
▶ Its function is to progressively reduce the spatial size of the representation to
reduce the amount of parameters and computation in the network, and hence
to also control overfitting.
▶ The most common form is a pooling layer with filters of size 2x2 applied with
a stride of 2, which down-samples every depth slice in the input by 2 along
both width and height, discarding 75% of the activations. Each MAX operation
in this case takes a max over 4 numbers.
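The 2x2, stride-2 case above can be written as a single reshape-and-max in NumPy; the 4x4 toy input is illustrative.

```python
import numpy as np

# 2x2 max pooling with stride 2: each output is the max over a 2x2 block,
# halving width and height and keeping 4 of every 16 activations (25%).
def max_pool_2x2(x):
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2*h, :2*w].reshape(h, 2, w, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 depth slice
pooled = max_pool_2x2(x)                      # 4x4 -> 2x2
```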
15. Flattening
▶ We need to convert the output of the convolutional part of the CNN into a 1D
feature vector, to be used by the ANN part of it. This is achieved by
Flattening.
▶ It gets the output of the convolutional layers, flattens all its structure to
create a single long feature vector to be used by the dense layer for the final
classification.
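In code, flattening is just a reshape of the convolutional output volume into one long vector; the 2x2x3 toy volume below stands in for a real feature map.

```python
import numpy as np

# Flattening: the 3-D conv output (width x height x depth) becomes a single
# 1-D feature vector consumed by the dense layers.
feature_maps = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # toy 2x2x3 conv output
vec = feature_maps.flatten()                          # shape (12,)
```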
16. Fully Connected ANN
▶ Neurons in a fully connected layer have full connections to all activations in
the previous layer, as in conventional neural networks.
▶ This is followed by further dense hidden layers and finally by computing
the values of the Output Layer.
17. Dataset used
▶ The dataset was manually collected for every category from various search
engines like Bing, Google and Yahoo through a software called Extreme
Picture Finder.
▶ The dataset contains images that include the waste items in different
illumination, background colors and various angles.
▶ The dataset consists of a total of 3,302 images belonging to one of the 5
classes.
▶ Care was taken to include images of objects as they would appear when
disposed of, such as crushed bottles and crumpled paper, to make the training
more robust to real-life images.
18. Software tools used
▶ Python 3
▶ TensorFlow
▶ Keras
▶ CUDA (NVIDIA GPU Computing Toolkit)
▶ Matplotlib
▶ SciPy
19. Description Of Model
▶ After hyperparameter tuning, we found the following settings to produce
the best outcome.
▶ Pictures were rescaled to a size of 256 x 256 pixels.
▶ The train-test split ratio was taken to be around 12:1.
▶ Batch size was chosen to be 16.
▶ We used categorical cross-entropy as the loss function and the Adam
optimizer to minimize it over 30 epochs (repetitions).
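For a 5-class one-hot setup like this one, categorical cross-entropy reduces to the negative log of the probability assigned to the true class; the example probabilities below are illustrative, not model outputs.

```python
import numpy as np

# Categorical cross-entropy: -sum(t * log(p)) over the classes, averaged
# over the batch (eps guards against log(0)).
def categorical_crossentropy(targets, probs, eps=1e-12):
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))

t = np.array([[0, 0, 1, 0, 0]])                 # one-hot: true class is class 3
p = np.array([[0.05, 0.05, 0.8, 0.05, 0.05]])   # predicted probabilities
loss = categorical_crossentropy(t, p)           # = -log(0.8), about 0.223
```

Because only the true class contributes, confident correct predictions drive the loss toward zero, which is what the Adam optimizer pushes the weights toward across the 30 epochs.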
20. Description Of Model - Layers
▶ Layer 1 - CNN (followed by Batch Normalization and Max Pooling)
▶ Filters - 96, Kernel Size - 11 x 11, Strides - 4
▶ Activation Function - ReLU
▶ Layer 2 - CNN (followed by Batch Normalization and Max Pooling)
▶ Filters – 256, Kernel Size - 5 x 5
▶ Activation Function - ReLU
▶ Layer 3-4 - CNN
▶ Filters – 384, Kernel Size - 3 x 3
▶ Activation Function - ReLU
▶ Layer 5 - CNN (followed by Max Pooling)
▶ Filters – 256, Kernel Size - 3 x 3
▶ Activation Function - ReLU
▶ Layer 6 - Flattening
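As a sanity check on the stack above, the feature-map sizes can be traced from the 256 x 256 rescaled input through each layer. "Valid" (no-padding) convolutions and 2x2, stride-2 max pooling are assumptions here; the slides do not state them.

```python
# Spatial size after a valid convolution or pooling step.
def out_size(n, kernel, stride=1):
    return (n - kernel) // stride + 1

n = 256
n = out_size(n, 11, 4)   # Layer 1 conv, 96 filters, 11x11, stride 4 -> 62
n = out_size(n, 2, 2)    # max pool 2x2, stride 2                    -> 31
n = out_size(n, 5)       # Layer 2 conv, 256 filters, 5x5            -> 27
n = out_size(n, 2, 2)    # max pool                                  -> 13
n = out_size(n, 3)       # Layer 3 conv, 384 filters, 3x3            -> 11
n = out_size(n, 3)       # Layer 4 conv, 384 filters, 3x3            -> 9
n = out_size(n, 3)       # Layer 5 conv, 256 filters, 3x3            -> 7
n = out_size(n, 2, 2)    # max pool                                  -> 3
flattened = n * n * 256  # Layer 6 flatten                           -> 2304
```

Under these assumptions the flattened vector feeding the dense layers has 3 x 3 x 256 = 2,304 features.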
26. Future Work
▶ Localization — In addition to Classification of Waste, we can use techniques such as
Bounding Box Localization, Instance Segmentation etc. to locate the classified
object in the image.
▶ Generalization — Rather than restricting our search to 5 classes, we can use
pre-trained models (like VGG-16) to generalize this approach over all kinds of
waste. The model can be fine-tuned to fit our requirements and provide wider and
better results.
▶ Automation — Build a more sophisticated mechanical system to automatically
separate and segregate mixed waste.
▶ Improving the accuracy rates of the system by adding more training and test
samples to the existing dataset.