The MNIST dataset is a classic benchmark dataset in the field of machine learning and computer vision. It consists of 28x28 grayscale images of handwritten digits (0-9) along with their corresponding labels. The goal of this project is to build and train a deep learning model that can accurately classify these handwritten digits.
2. Introduction
➢ Convolutional neural networks (CNNs) have been tremendously successful in computer vision, e.g. image recognition and object detection.
➢ However, convolution itself is a linear operation. An activation function can add non-linearity, but only pointwise. Hence, the paper uses kervolution, which applies the kernel trick to introduce patch-wise non-linearity.
3. Recent Approaches to the Problem
A minimal character-based CNN architecture model:
https://arxiv.org/ftp/arxiv/papers/1901/1901.06032.pdf
https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-neural-network-architecture/
4. Our Implementation to the Problem
● We implemented the model with kervolutional layers in PyTorch.
● With a linear kernel type, a kervolutional layer is a usual convolution; in our implementation we varied the kernel type across polynomial and Gaussian kernels to introduce non-linearity, which in turn gave better performance.
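The layer described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `KervConv2d` and the hyperparameter names (`dp`, `cp`, `gamma`) are our own illustrative choices; the kernel formulas follow the Wang et al. (CVPR 2019) paper.

```python
# Minimal sketch of a kervolutional layer (illustrative, not the project code).
# The patch/filter inner product of a convolution is post-composed with a
# kernel function; "linear" reduces to an ordinary convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KervConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, kernel_type="linear",
                 dp=3, cp=1.0, gamma=1.0, stride=1, padding=0):
        super().__init__()
        # Same weight tensor a convolution would have: one filter per channel.
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch * kernel_size * kernel_size) * 0.1)
        self.kernel_type = kernel_type
        self.dp, self.cp, self.gamma = dp, cp, gamma
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding

    def forward(self, x):
        n, _, h, w = x.shape
        # Extract sliding patches: (N, C*k*k, L), L = number of output positions.
        patches = F.unfold(x, self.kernel_size, stride=self.stride,
                           padding=self.padding)
        # Linear kernel = plain convolution: <x_(i), w> at every position.
        out = self.weight @ patches                     # (N, out_ch, L)
        if self.kernel_type == "polynomial":
            out = (out + self.cp) ** self.dp            # (x^T w + cp)^dp
        elif self.kernel_type == "gaussian":
            # exp(-gamma * ||x - w||^2), expanded via the inner product;
            # clamp guards against tiny negative values from rounding.
            x_sq = (patches ** 2).sum(dim=1, keepdim=True)       # (N, 1, L)
            w_sq = (self.weight ** 2).sum(dim=1).view(1, -1, 1)  # (1, out_ch, 1)
            dist_sq = (x_sq + w_sq - 2 * out).clamp_min(0.0)
            out = torch.exp(-self.gamma * dist_sq)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, h_out, w_out)
```

Note that `dp`, `cp` and `gamma` are fixed hyperparameters, not learned weights, so switching the kernel type changes no parameter counts.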
6. Baseline Model: Kervolution
● The i-th element of the convolution output f(x) is a simple inner product between the patch vector x_(i) and the filter vector w:
Convolution: f_i(x) = ⟨x_(i), w⟩
● Kervolution instead evaluates a kernel function via the kernel trick, which implicitly maps the vectors into a non-linear space and then takes the inner product there:
Kervolution: g_i(x) = κ(x_(i), w) = ⟨φ(x_(i)), φ(w)⟩
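The kernel trick above can be checked numerically on a toy patch. The sketch below (our own illustration) uses a degree-2 polynomial kernel with c = 0, k(x, w) = (x·w)², whose explicit feature map for 2-vectors is φ(v) = (v₁², v₂², √2·v₁v₂): the kernel value equals the inner product of the mapped vectors, without ever computing φ.

```python
# Toy check of the kernel trick: kernel evaluation on raw vectors equals
# an inner product in the lifted (non-linear) feature space.
import math

x = [1.0, 2.0]   # a flattened image patch
w = [3.0, -1.0]  # a flattened filter

dot = x[0] * w[0] + x[1] * w[1]   # ordinary convolution response: x . w = 1
kerv = dot ** 2                    # kernel trick: (x . w)^2, phi never computed

def phi(v):
    # Explicit non-linear map for the degree-2 polynomial kernel (c = 0).
    return [v[0] ** 2, v[1] ** 2, math.sqrt(2) * v[0] * v[1]]

explicit = sum(a * b for a, b in zip(phi(x), phi(w)))
assert abs(kerv - explicit) < 1e-9   # both routes agree: 1.0
```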
7. Model Capacity and Features
● The kernel function takes kervolution to a non-linear space, so model capacity is increased without introducing extra parameters.
● Kervolution measures similarity by match kernels, which are equivalent to extracting specific features.
● One advantage of kervolution is that the non-linear properties can be customized without explicitly computing the feature map.
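One way to see the "no extra parameters" point is that a polynomial kervolution can reuse the exact weight tensor of a convolution and only post-compose a fixed non-linearity on its output. The sketch below is our own illustration with `dp = 3`, `cp = 1` as fixed hyperparameters.

```python
# Illustration: kervolution adds capacity without adding parameters.
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5, bias=False)
x = torch.randn(1, 1, 28, 28)

linear_response = conv(x)                      # <x_(i), w> at every position
kerv_response = (linear_response + 1.0) ** 3   # polynomial kernel, dp=3, cp=1

# dp and cp are fixed hyperparameters, so the learnable parameters are
# exactly the 6 * 1 * 5 * 5 = 150 convolution weights in both cases.
n_params = sum(p.numel() for p in conv.parameters())
assert n_params == 150
```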
8. Polynomial Kervolution
● To show the behavior of polynomial kervolution, the learned filters of LeNet-5 trained on MNIST are visualized; the visualization contains all six channels of the first kervolutional layer, using a polynomial kernel (dp = 3, cp = 1).
9. Continued..
● For comparison, the learned filters of a CNN are also presented. Interestingly, some of the learned filters of the KNN (kervolutional neural network) and the CNN are quite similar. This verifies our understanding of the polynomial kernel, which is a combination of linear and higher-order terms.
● This also indicates that polynomial kervolution introduces higher-order feature interaction in a more flexible and direct way than existing methods.
10. Gaussian Kervolution
The Gaussian RBF kernel extends kervolution to infinite dimensions:
κ(x_(i), w) = exp(−γg ‖x_(i) − w‖²),
where γg (γg ∈ R+) is a hyperparameter that controls the smoothness of the decision boundary.
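The Gaussian variant can be computed without any explicit infinite-dimensional map by expanding ‖x − w‖² = ‖x‖² + ‖w‖² − 2·x·w. The helper below is our own sketch; the function name and argument layout are illustrative.

```python
# Sketch of a Gaussian (RBF) kervolution: k(x, w) = exp(-gamma_g * ||x - w||^2).
import torch
import torch.nn.functional as F

def gaussian_kervolution(x, weight, gamma_g=1.0):
    """x: (N, C, H, W); weight: (out_ch, C, k, k); stride 1, no padding."""
    out_ch, c, k, _ = weight.shape
    patches = F.unfold(x, k)                     # (N, C*k*k, L)
    w = weight.view(out_ch, -1)                  # (out_ch, C*k*k)
    # ||x - w||^2 via the expansion; clamp guards against rounding below 0.
    x_sq = (patches ** 2).sum(1, keepdim=True)   # (N, 1, L)
    w_sq = (w ** 2).sum(1).view(1, out_ch, 1)    # (1, out_ch, 1)
    cross = w @ patches                          # (N, out_ch, L)
    dist_sq = (x_sq + w_sq - 2 * cross).clamp_min(0.0)
    out = torch.exp(-gamma_g * dist_sq)          # responses lie in (0, 1]
    h_out = x.shape[2] - k + 1
    return out.view(x.shape[0], out_ch, h_out, -1)
```

A larger γg makes the response fall off faster with patch/filter distance, i.e. a less smooth decision boundary.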
15. Conclusions & Future Work
● Kervolution generalizes convolution to a non-linear space.
● It extends convolutional neural networks to kervolutional neural networks.
● It not only retains the advantages of convolution (weight sharing and translational equivariance) but also enhances model capacity and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters.
16. Future Work: Continued...
● With a carefully chosen kernel, the performance of a CNN can be significantly improved on the MNIST, CIFAR, and ImageNet datasets by replacing convolutional layers with kervolutional layers.
● Due to the large number of possible kervolutions, we cannot perform a brute-force search over all the possibilities.
● We expect that introducing kervolutional layers into more architectures, together with extensive hyperparameter searches, can further improve performance.
17. Individual Contribution & Code
Sahasra Ranjan (190050102): Worked on the kervolutional neural network and implemented the training procedure on GPU using PyTorch.
Paarth Jain (190050076): Worked on the training procedure and generated results for various hyperparameters and network settings.
Atul Verma (19B090004): Prepared the presentation and project report.
Tirthankar Adhikari (190070003): Debugged the implemented code and helped prepare the presentation.
Shrey Gupta (190100112)
18. Github Repository Link for Final Code, Readme Files and Results:
GitHub repo: https://github.com/Lhisoka/GNR-638-Project
Project PPT: https://docs.google.com/presentation/d/1-VgwYgyPi4UW1CoTHDgVi7EISm5AbeZPVu62bCwqDsg/edit?usp=sharing
Note: All of our code is based on the following paper:
https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Kervolutional_Neural_Networks_CVPR_2019_paper.pdf
19. Given the recent rapid development in this field, there is a lot more remaining to be explored.