The MNIST dataset is a classic benchmark dataset in the field of machine learning and computer vision. It consists of 28x28 grayscale images of handwritten digits (0-9) along with their corresponding labels. The goal of this project is to build and train a deep learning model that can accurately classify these handwritten digits.
2. Introduction
➢ Convolutional neural networks (CNNs) have been tremendously successful in computer vision, e.g. image recognition and object detection.
➢ However, convolution itself is a linear operation. An activation function can add non-linearity, but only pointwise. Hence, the paper uses kervolution, which applies the kernel trick to introduce patch-wise non-linearity.
3. Recent Approaches to the Problem
A minimal character-based CNN architecture model:
https://arxiv.org/ftp/arxiv/papers/1901/1901.06032.pdf
https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-neural-network-architecture/
4. Our Implementation to the Problem
● We implemented the model with kervolutional layers in PyTorch.
● With a linear kernel type, a kervolutional layer is a usual convolution; in our implementation we varied the kernel type across polynomial and Gaussian kernels to introduce non-linearity, which in turn gave better performance.
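The layer described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `KervConv2d` and the hyperparameter names (`dp`, `cp`, `gamma`) are our own illustrative choices; the kernel formulas follow the Wang et al. (CVPR 2019) paper.

```python
# Minimal sketch of a kervolutional layer (illustrative, not the project code).
# The patch/filter inner product of a convolution is post-composed with a
# kernel function; "linear" reduces to an ordinary convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KervConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, kernel_type="linear",
                 dp=3, cp=1.0, gamma=1.0, stride=1, padding=0):
        super().__init__()
        # Same weight tensor a convolution would have: one filter per channel.
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch * kernel_size * kernel_size) * 0.1)
        self.kernel_type = kernel_type
        self.dp, self.cp, self.gamma = dp, cp, gamma
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding

    def forward(self, x):
        n, _, h, w = x.shape
        # Extract sliding patches: (N, C*k*k, L), L = number of output positions.
        patches = F.unfold(x, self.kernel_size, stride=self.stride,
                           padding=self.padding)
        # Linear kernel = plain convolution: <x_(i), w> at every position.
        out = self.weight @ patches                     # (N, out_ch, L)
        if self.kernel_type == "polynomial":
            out = (out + self.cp) ** self.dp            # (x^T w + cp)^dp
        elif self.kernel_type == "gaussian":
            # exp(-gamma * ||x - w||^2), expanded via the inner product;
            # clamp guards against tiny negative values from rounding.
            x_sq = (patches ** 2).sum(dim=1, keepdim=True)       # (N, 1, L)
            w_sq = (self.weight ** 2).sum(dim=1).view(1, -1, 1)  # (1, out_ch, 1)
            dist_sq = (x_sq + w_sq - 2 * out).clamp_min(0.0)
            out = torch.exp(-self.gamma * dist_sq)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, h_out, w_out)
```

Note that `dp`, `cp` and `gamma` are fixed hyperparameters, not learned weights, so switching the kernel type changes no parameter counts.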
6. Baseline Model: Kervolution
● The i-th element of the convolution output f(x) is a simple inner product between the patch vector x_(i) and the filter vector w:
Convolution: f_i(x) = ⟨x_(i), w⟩
● Kervolution instead evaluates a kernel function via the kernel trick, which implicitly maps the vectors into a non-linear space and then takes the inner product there:
Kervolution: g_i(x) = κ(x_(i), w) = ⟨φ(x_(i)), φ(w)⟩
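The kernel trick above can be checked numerically on a toy patch. The sketch below (our own illustration) uses a degree-2 polynomial kernel with c = 0, k(x, w) = (x·w)², whose explicit feature map for 2-vectors is φ(v) = (v₁², v₂², √2·v₁v₂): the kernel value equals the inner product of the mapped vectors, without ever computing φ.

```python
# Toy check of the kernel trick: kernel evaluation on raw vectors equals
# an inner product in the lifted (non-linear) feature space.
import math

x = [1.0, 2.0]   # a flattened image patch
w = [3.0, -1.0]  # a flattened filter

dot = x[0] * w[0] + x[1] * w[1]   # ordinary convolution response: x . w = 1
kerv = dot ** 2                    # kernel trick: (x . w)^2, phi never computed

def phi(v):
    # Explicit non-linear map for the degree-2 polynomial kernel (c = 0).
    return [v[0] ** 2, v[1] ** 2, math.sqrt(2) * v[0] * v[1]]

explicit = sum(a * b for a, b in zip(phi(x), phi(w)))
assert abs(kerv - explicit) < 1e-9   # both routes agree: 1.0
```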
7. Model Capacity and Features
● The kernel function takes kervolution to a non-linear space, so model capacity is increased without introducing extra parameters.
● Kervolution measures similarity by match kernels, which are equivalent to extracting specific features.
● One advantage of kervolution is that the non-linear properties can be customized without explicitly computing the feature map.
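One way to see the "no extra parameters" point is that a polynomial kervolution can reuse the exact weight tensor of a convolution and only post-compose a fixed non-linearity on its output. The sketch below is our own illustration with `dp = 3`, `cp = 1` as fixed hyperparameters.

```python
# Illustration: kervolution adds capacity without adding parameters.
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5, bias=False)
x = torch.randn(1, 1, 28, 28)

linear_response = conv(x)                      # <x_(i), w> at every position
kerv_response = (linear_response + 1.0) ** 3   # polynomial kernel, dp=3, cp=1

# dp and cp are fixed hyperparameters, so the learnable parameters are
# exactly the 6 * 1 * 5 * 5 = 150 convolution weights in both cases.
n_params = sum(p.numel() for p in conv.parameters())
assert n_params == 150
```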
8. Polynomial Kervolution
● To show the behavior of polynomial kervolution, the learned filters of LeNet-5 trained on MNIST are visualized; the visualization contains all six channels of the first kervolutional layer, using a polynomial kernel (dp = 3, cp = 1).
9. Continued..
● For comparison, the learned filters of a CNN are also presented. Interestingly, some of the learned filters of the KNN (kervolutional neural network) and the CNN are quite similar. This verifies our understanding of the polynomial kernel, which is a combination of linear and higher-order terms.
● This also indicates that polynomial kervolution introduces higher-order feature interaction in a more flexible and direct way than existing methods.
10. Gaussian Kervolution
The Gaussian RBF kernel extends kervolution to infinite dimensions:
κ(x_(i), w) = exp(−γg ‖x_(i) − w‖²),
where γg (γg ∈ R+) is a hyperparameter that controls the smoothness of the decision boundary.
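The Gaussian variant can be computed without any explicit infinite-dimensional map by expanding ‖x − w‖² = ‖x‖² + ‖w‖² − 2·x·w. The helper below is our own sketch; the function name and argument layout are illustrative.

```python
# Sketch of a Gaussian (RBF) kervolution: k(x, w) = exp(-gamma_g * ||x - w||^2).
import torch
import torch.nn.functional as F

def gaussian_kervolution(x, weight, gamma_g=1.0):
    """x: (N, C, H, W); weight: (out_ch, C, k, k); stride 1, no padding."""
    out_ch, c, k, _ = weight.shape
    patches = F.unfold(x, k)                     # (N, C*k*k, L)
    w = weight.view(out_ch, -1)                  # (out_ch, C*k*k)
    # ||x - w||^2 via the expansion; clamp guards against rounding below 0.
    x_sq = (patches ** 2).sum(1, keepdim=True)   # (N, 1, L)
    w_sq = (w ** 2).sum(1).view(1, out_ch, 1)    # (1, out_ch, 1)
    cross = w @ patches                          # (N, out_ch, L)
    dist_sq = (x_sq + w_sq - 2 * cross).clamp_min(0.0)
    out = torch.exp(-gamma_g * dist_sq)          # responses lie in (0, 1]
    h_out = x.shape[2] - k + 1
    return out.view(x.shape[0], out_ch, h_out, -1)
```

A larger γg makes the response fall off faster with patch/filter distance, i.e. a less smooth decision boundary.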
15. Conclusions & Future Work
● Kervolution generalizes convolution to a non-linear space.
● It extends convolutional neural networks to kervolutional neural networks.
● It not only retains the advantages of convolution (weight sharing and translational equivariance) but also enhances model capacity and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters.
16. Future Work: Continued...
● With a carefully chosen kernel, the performance of a CNN can be significantly improved on the MNIST, CIFAR, and ImageNet datasets by replacing convolutional layers with kervolutional layers.
● Due to the large number of possible kervolutions, we cannot perform a brute-force search over all the possibilities.
● We expect that introducing kervolutional layers into more architectures, together with extensive hyperparameter searches, can further improve performance.
17. Individual Contribution & Code
Sahasra Ranjan (190050102): Worked on the kervolutional neural network and implemented the training procedure on GPU using PyTorch.
Paarth Jain (190050076): Worked on the training procedure and generated results for various hyperparameters and network settings.
Atul Verma (19B090004): Prepared the presentation and project report.
Tirthankar Adhikari (190070003): Debugged the implemented code and helped prepare the presentation.
Shrey Gupta (190100112)
18. Github Repository Link for Final Code, Readme Files and Results:
GitHub repo: https://github.com/Lhisoka/GNR-638-Project
Project PPT: https://docs.google.com/presentation/d/1-VgwYgyPi4UW1CoTHDgVi7EISm5AbeZPVu62bCwqDsg/edit?usp=sharing
Note: All of our code is based on the following paper:
https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Kervolutional_Neural_Networks_CVPR_2019_paper.pdf
19. Given the recent rapid development in this field, there is a lot more remaining to be explored.