Implementing the Perceptron Algorithm for Finding the Weights of a Linear Discriminant Function
Dipesh Shome
Department of Computer Science and Engineering, AUST
Ahsanullah University of Science and Technology
Dhaka, Bangladesh
160204045@aust.edu
Abstract—In machine learning, the perceptron is one of the simplest types of neural models. It is used as a linear classifier to facilitate the supervised learning of binary classifiers. The main objective of this experiment is to implement the perceptron algorithm for finding the weights of a linear discriminant function by performing several tasks: mapping the sample points into a higher dimension using a phi function, normalizing one class by multiplying it by negative one, performing weight updates using both the single and batch update rules, forming the boundary equation, and finally plotting the resulting figures.
Index Terms—perceptron algorithm, linear classifier, gradient
descent, normalization, weight update, learning rate
I. INTRODUCTION
The main idea of the perceptron comes from the operating principle of the basic processing unit of the brain, the neuron. Like a neuron, the perceptron takes many inputs, often called features, feeds them into a linear unit, and produces one binary output. Therefore, perceptrons can be applied to binary classification problems, where a sample is to be identified as belonging to one of two predefined classes. The perceptron algorithm was invented by Frank Rosenblatt in 1957. Its main drawback is that it does not work well on nonlinear data; however, after performing several preprocessing tasks, the perceptron can handle nonlinear data as well. This is discussed in detail in this experiment.
II. EXPERIMENTAL DESIGN / METHODOLOGY
A. Description of the different tasks:
A two-class set of prototypes has to be taken from the “train.txt” file.
Task 1: Take input from the “train.txt” file. Plot all sample points from both classes; samples from the same class should have the same color and marker. Observe whether these two classes can be separated with a linear boundary.
Task 2: Consider the case of a second-order polynomial discriminant function. Generate the high-dimensional sample points y, as discussed in class, using the following formula:

y = [x_1^2, x_2^2, x_1 x_2, x_1, x_2, 1]^T
Also, normalize any one of the two classes.
Task 3: Use the perceptron algorithm (both one at a time and many at a time) for finding the weight coefficients of the discriminant function (i.e., the values of w) for the boundary of your linear classifier in Task 2. Here α is the learning rate and 0 < α ≤ 1.
Task 4: Three initial weights have to be used (all ones, all zeros, and randomly initialized with a fixed seed). For all three cases, vary the learning rate between 0.1 and 1 with a step size of 0.1. Create a table containing the learning rate and the number of iterations of the one-at-a-time and batch perceptron for all three initial weights. You also have to create a bar chart visualizing your table data. Also, in your report, address the following questions:
a. In Task 2, why do we need to take the sample points to a higher dimension?
b. For each of the three initial-weight cases and for each learning rate, how many updates does the algorithm take before converging?
B. Implementation:
1) Plotting all sample data of the training set: Here we have a training dataset consisting of six samples belonging to two different classes. The first task is to plot all the data points of both classes. For the plotting we import two Python libraries, NumPy and Matplotlib. The scatter-plot function and markers were used to plot the samples of each class with a single color: training class 1 is plotted with the dot (.) marker in red, and training class 2 with the star (*) marker in blue. Finally, we add a legend; the plotted figure is given in Fig. 1. As the dataset is not linearly separable, it is not possible to separate the data points with a linear boundary in the original space; a hyperplane in a higher-dimensional space is needed to separate them.

Fig. 1. Sample point plotting
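A minimal sketch of this plotting step is given below. It assumes train.txt stores one sample per line as “x1 x2 label” (the report does not show the file layout), and the helper name load_samples is hypothetical.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical loader: assumes each line of train.txt holds "x1 x2 label".
def load_samples(path="train.txt"):
    data = np.loadtxt(path)
    return data[:, :2], data[:, 2].astype(int)

x, labels = load_samples()
# Same class, same color and marker: class 1 as red dots, class 2 as blue stars.
plt.scatter(*x[labels == 1].T, c="red", marker=".", label="train class 1")
plt.scatter(*x[labels == 2].T, c="blue", marker="*", label="train class 2")
plt.legend()
plt.show()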
2) Generating the high-dimensional sample points using the phi function, and normalization: As stated earlier, the perceptron algorithm performs better on linear data, but most real-world data is nonlinear. So we need to map the data points into a higher dimension before running the perceptron algorithm. The given formula for the mapping is

y = [x_1^2, x_2^2, x_1 x_2, x_1, x_2, 1]^T

Our given training dataset is 2D; using this phi function, i.e. the second-order polynomial discriminant function, the data points are mapped into 6D.
Another sub-task is the normalization of one of the two classes. In the normalization process:
a) instead of two criteria, only one criterion is considered,
b) the samples of class 1 are taken as they are, and
c) the samples of class 2 are negated.
Now it can easily be said that if a^T y_i > 0 the sample is correctly classified, and if a^T y_i ≤ 0 it is misclassified, where a is the (modified) weight vector and y_i is the augmented feature vector. Moreover, a^T y is the homogeneous form of

g(x) = w^T x + w_0
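Continuing from the previous sketch, the phi mapping and the normalization step can be written as follows; this is a sketch under the same file-format assumption, not the submitted code.

import numpy as np

def phi(x):
    # Map a 2-D sample to 6-D: y = [x1^2, x2^2, x1*x2, x1, x2, 1].
    x1, x2 = x
    return np.array([x1**2, x2**2, x1 * x2, x1, x2, 1.0])

def normalize(y, label):
    # Keep class-1 samples as they are and negate class-2 samples, so that
    # a correctly classified sample always satisfies a^T y > 0.
    return y if label == 1 else -y

# Build the normalized 6-D training matrix Y from the loaded samples.
Y = np.array([normalize(phi(xi), li) for xi, li in zip(x, labels)])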
3) Perceptron algorithm, both one at a time and many at a time: Task 3 can be solved in two different ways: the batch process (many at a time) and the single-sample process (one at a time). We tried to solve it in both ways. In this stage we apply gradient descent iteratively with different step sizes (learning rates) until it reaches a minimum. The update formula for the batch process, or many at a time, where Y is the set of misclassified samples, is

w(t+1) = w(t) + η Σ_{y ∈ Y} y

The update formula for the single process, or one at a time, is

w(t+1) = w(t) + η y

Here, w(t+1) is the new weight and w(t) is the old weight. Moreover, we use three different initial weight vectors in this experiment: all ZERO, w = [0 0 0 0 0 0]; all ONE, w = [1 1 1 1 1 1]; and randomly initialized weights. The learning rate is varied with step size 0.1 over 0 < α ≤ 1.
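A minimal sketch of both update rules is given below, with Y being the normalized 6-D sample matrix built above; the convergence cap max_iter and the pass-counting convention are assumptions, since the report does not state how iterations were counted.

import numpy as np

def perceptron_single(Y, w, alpha, max_iter=1000):
    # One at a time: update w immediately after every misclassified sample.
    for it in range(max_iter):
        updated = False
        for y in Y:
            if w @ y <= 0:         # misclassified under the normalized set
                w = w + alpha * y  # w(t+1) = w(t) + eta * y
                updated = True
        if not updated:            # a full clean pass: converged
            return w, it
    return w, max_iter

def perceptron_batch(Y, w, alpha, max_iter=1000):
    # Many at a time: one update per pass, summing all misclassified samples.
    for it in range(max_iter):
        mis = Y[Y @ w <= 0]
        if len(mis) == 0:          # nothing misclassified: converged
            return w, it
        w = w + alpha * mis.sum(axis=0)  # w(t+1) = w(t) + eta * sum of y
    return w, max_iter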
4) Table creation and visualization: For the ALL ONE initial weights, with step size 0.1 from 0.1 to 1 for both perceptron variants, the results are given in Table I and the bar chart in Fig. 2.
TABLE I
INITIAL WEIGHT ALL ONE
α (learning rate) | One at a time (iterations) | Many at a time (iterations)
0.1 | 6 | 102
0.2 | 92 | 104
0.3 | 104 | 91
0.4 | 106 | 116
0.5 | 93 | 105
0.6 | 93 | 114
0.7 | 108 | 91
0.8 | 115 | 91
0.9 | 94 | 105
1.0 | 94 | 93
Fig. 2. Bar chart for ALL ONE initial weights
For the ALL ZERO initial weights, with step size 0.1 from 0.1 to 1 for both perceptron variants, the results are given in Table II and the bar chart in Fig. 3:
TABLE II
INITIAL WEIGHT ALL ZERO
α (learning rate) | One at a time (iterations) | Many at a time (iterations)
0.1 | 94 | 105
0.2 | 94 | 105
0.3 | 94 | 105
0.4 | 94 | 105
0.5 | 94 | 92
0.6 | 94 | 92
0.7 | 94 | 92
0.8 | 94 | 105
0.9 | 94 | 105
1.0 | 94 | 92
Fig. 3. Bar chart for ALL ZERO initial weights
For the RANDOM initial weights, with step size 0.1 from 0.1 to 1 for both perceptron variants, the results are given in Table III and the bar chart in Fig. 4.
TABLE III
INITIAL WEIGHT RANDOM
α (learning rate) | One at a time (iterations) | Many at a time (iterations)
0.1 | 97 | 84
0.2 | 95 | 91
0.3 | 93 | 117
0.4 | 101 | 133
0.5 | 106 | 90
0.6 | 113 | 105
0.7 | 94 | 88
0.8 | 113 | 138
0.9 | 108 | 138
1.0 | 101 | 150
Fig. 4. Bar chart for RANDOM initial weights
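A sketch of how each grouped bar chart can be produced with Matplotlib is shown below, reusing Y and the two perceptron functions from the earlier sketches; the ALL ONE case is used as the example.

import numpy as np
import matplotlib.pyplot as plt

alphas = np.arange(0.1, 1.01, 0.1)
w0 = np.ones(6)  # the ALL ONE initial weight case

# Iteration counts for both variants at each learning rate.
single = [perceptron_single(Y, w0, a)[1] for a in alphas]
batch = [perceptron_batch(Y, w0, a)[1] for a in alphas]

pos = np.arange(len(alphas))
plt.bar(pos - 0.2, single, width=0.4, label="one at a time")
plt.bar(pos + 0.2, batch, width=0.4, label="many at a time")
plt.xticks(pos, [f"{a:.1f}" for a in alphas])
plt.xlabel("learning rate (alpha)")
plt.ylabel("iterations to converge")
plt.legend()
plt.show()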
III. RESULT ANALYSIS
In the implementation of the perceptron algorithm, we experimented with different parameters: initial weights and learning rates. The efficiency of the algorithm is measured by the number of iterations each configuration needs before converging; a counter variable in the code records this. From Table I, Table II, and Table III, we can see that many at a time generally takes more iterations to converge than one at a time. This is because in one at a time the weight vector is updated immediately after every misclassified sample, whereas in many at a time all misclassified samples contribute to a single update per pass.
IV. CONCLUSION
In this experiment, I tried to implement the perceptron algorithm in the simplest way, which required several steps. First, the perceptron is a linear classifier, so it performs better on linearly separable data; on nonlinear data it cannot produce a separating boundary directly. The data points were therefore mapped into a higher dimension before applying the perceptron algorithm. Second, normalization was applied to one of the two classes. Then I used three different initial weight vectors with learning rates from 0.1 to 1 in steps of 0.1, for both one at a time and many at a time. Finally, observing the results in the tables and bar charts above, I came to the conclusion that many at a time takes more iterations than one at a time.
V. ALGORITHM IMPLEMENTATION / CODE
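The full source was submitted separately; a hedged end-to-end driver tying the earlier sketches together is given below. The seed value 0 is an assumption, as the report only says the seed was fixed.

import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the report only says "seed fixed"
inits = {
    "all one": np.ones(6),
    "all zero": np.zeros(6),
    "random": rng.standard_normal(6),
}

# Reproduce Tables I-III: every initial weight against every learning rate.
for name, w0 in inits.items():
    for alpha in np.arange(0.1, 1.01, 0.1):
        _, n_single = perceptron_single(Y, w0.copy(), alpha)
        _, n_batch = perceptron_batch(Y, w0.copy(), alpha)
        print(f"{name:8s}  alpha={alpha:.1f}  "
              f"one at a time: {n_single:4d}  many at a time: {n_batch:4d}")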