Machine learning

MACHINE LEARNING
Using Artificial Neural Network and ENCOG framework
Vikas Sinha

Outline
• What is Machine Learning?
• Why Machine Learning?
• Where is Machine Learning Used?
• Types of ML tasks
• Types of Learning
• Types of Data
• Normalization
• Types of Algorithms
• Artificial Neural Network
• Neuron Computation
• Training a Model
• Training Algorithm
• Model Training Flowchart
• Neural Network Structure
• Neural Network Computation
• Model Validation
• Disruptive Technology?
• ENCOG Framework
• Problem Statement
• Analysis of Outputs
• Outcomes

What is Machine Learning?
• Machine Learning (ML) is a subset of Artificial Intelligence
that gives computers the ability to learn without being
explicitly programmed.
• Machine learning focuses on developing computer programs
that can adapt their behavior when they are exposed to
previously unseen data.
• These programs learn from their “experience” on some “task”,
and their performance measure on that task improves as a
result.
Machine learning = “Learning from data or experience”

Why Machine Learning?
• With the advent of Data Mining we can find more patterns in
our data. Analyzing and validating these patterns can make a
business more successful.
• It is humanly impossible to capture all, or even most, of the
patterns in this data using conventional program instructions
(static if-then statements).
• It is difficult to write mathematical relationships or equations
that describe them.
• It is very different from the conventional way of writing programs.
• Not only are the inputs important; factors also play a key role.
• What if machines are programmed to learn from data and reuse
their experience to increase their performance? This is the
heart and soul of Machine Learning.

Where is Machine Learning Used?
• Forecasting
• Performance Prediction
• Recommendations
• Sentiments Prediction
• Spam detection
• Document Classification
• Face detection
• Language processing/understanding
• OCR
• Predicting sensor failure
• News clustering (e.g. Google News)
• Medical diagnosis
• Many more…

Types of ML Tasks
• There are three main types of tasks for which ML
programs are used:
• Classification
• Regression
• Clustering

Classification Task
• Those tasks for which the primary
goal of the ML program is to classify
the data into classes (also called
categories or labels).
• Ex. The task of selecting SPAM
emails from a set of emails is an
example of a Classification task. Here
the class will be “SPAM” or “NOT
SPAM”.

Regression Task
• Those tasks for which the primary
goal of the ML program is to predict
a continuous output value for unseen
input data are known as Regression Tasks.
• Ex. The task of predicting human
sentiment scores or the outcome of a
game of chance.
• The program needs to learn from
the past data and predict the output
for new, unseen data from its
learning.

Clustering Task
• Those tasks for which the primary
goal of the ML program is to form
clusters of data from a large set of data,
depending upon common behavior,
patterns or attribute values, are
known as Clustering Tasks.
• Ex. The task of separating customer
details and product details in a
data set.
• The program needs to analyze the
common characteristics, patterns or
behavior and then put each data item
into its appropriate cluster.

Types of Learning
• There are two major types of learning:
1. Supervised Learning
2. Unsupervised Learning
In supervised learning, the ideal outputs are provided along
with the inputs and are used to train the machine to produce
the desired outputs. In unsupervised learning no outputs
are provided; instead the data is clustered into
different classes.
Supervised (the Result column is provided):
SPF Check   Valid Sender   Valid Domain   Result
No          Yes            No             SPAM
Unsupervised (no Result column):
SPF Check   Valid Sender   Valid Domain
No          Yes            Yes

Types of Data
• Numerical data - continuous numeric values, such as
1, 2, 3.7, 4.8 (any integer or double).
• Nominal data - values that represent a class (or category).
For example, the category “Color” could have the values
“Red, Blue, Green”. Our column “Color” will then hold these
3 values repeated over the whole data set.
The neural network algorithms discussed here accept only double
values as input and produce only double values as output.

Numerical and Nominal Data
S.No   Name       Age (in years)   Salary (in Rupees)   Gender
1      Nitin      24               11234.98             M
2      Abhishek   34               51234.00             M
3      Vikas      26               1222.99              M
4      Jyoti      25               7777.55              F
• In the above data set, the columns “Age” and “Salary” are
examples of Numerical data, whereas the column “Gender” is an
example of Nominal data.
• Note: the “S.No” and “Name” fields do not affect the outcome
and can be ignored.

Normalization
• The data can vary a lot in range. In the previous data
set, Age ranges between 20 and 35 but Salary ranges
between 1000 and 50000, and the column Gender is not
numerical at all. There is therefore a need to bring all
of them onto one scale.
• This process of scaling (resizing) the data is known as
Normalization.
• In the next two slides we will see how to normalize
Numerical and Nominal data fields.

Numerical field data Normalization
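• Suppose the actual values range between 40 and 50, and
we need to normalize the value 42.5 on a scale of -1 to 1.
• The calculation will be:
normalized = (value - min) * (high - low) / (max - min) + low
           = (42.5 - 40) * (1 - (-1)) / (50 - 40) + (-1)
           = -0.5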
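
The min-max scaling described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ENCOG code: the function and parameter names (`normalize`, `denormalize`, `data_low`, `data_high`, `norm_low`, `norm_high`) are assumptions made for the example.

```python
def normalize(x, data_low, data_high, norm_low=-1.0, norm_high=1.0):
    """Min-max scale x from [data_low, data_high] to [norm_low, norm_high]."""
    return (x - data_low) * (norm_high - norm_low) / (data_high - data_low) + norm_low


def denormalize(n, data_low, data_high, norm_low=-1.0, norm_high=1.0):
    """Inverse of normalize: map a scaled value back to the original range."""
    return (n - norm_low) * (data_high - data_low) / (norm_high - norm_low) + data_low


# Slide example: actual range 40-50, value 42.5, target scale -1 to 1.
print(normalize(42.5, 40.0, 50.0))    # -0.5
print(denormalize(-0.5, 40.0, 50.0))  # 42.5
```

The inverse function is what a later "de-normalize" step would use to turn a network output back into the original units.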

Nominal field data Normalization
Color    One-of-N encoding
Red      1.0, 0.0, 0.0
Blue     0.0, 1.0, 0.0
Green    0.0, 0.0, 1.0
• In the above data set, the column Color holds one of the values
“Red”, “Blue” or “Green”.
• Since there are three distinct values, let us encode Red as 1.0,0.0,0.0
and the other two as the remaining combinations shown above.
• But this can lead to wrong predictions, as there are two combinations
(Red and Blue) where the encoding ends with 0.0.
• So an ideal of Red could also be interpreted as Blue, as shown
below:
Ideal: Red {1.0, 0.0, 0.0}   Predicted: Blue {0.0, 1.0, 0.0}
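
As a rough sketch of the encoding in the table, the snippet below turns a nominal value into its vector and maps a network output back to a category by taking the largest component. The helper names `one_of_n` and `decode` are hypothetical and are not part of ENCOG.

```python
COLORS = ["Red", "Blue", "Green"]


def one_of_n(value, categories):
    """Return a vector with 1.0 at the position of value and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def decode(vector, categories):
    """Pick the category whose output component is largest."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return categories[best]


print(one_of_n("Red", COLORS))            # [1.0, 0.0, 0.0]
print(decode([0.2, 0.7, 0.1], COLORS))    # Blue
```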

• The previous encoding for Nominal field data is known as
One-of-N encoding, where N is the number of distinct
values.
• To overcome the prediction problem of this encoding,
the Equilateral Encoding mechanism was introduced.
• It distributes the error of a wrong prediction equally
across the outputs.
• It cannot work with fewer than 3 classes.
E.g. Color: {Red, Green, Blue} encoded on {0,1}:
Color    Encoded Color
Red      {0.06, 0.25}
Blue     {0.93, 0.25}
Green    {0.5, 1}

Types of Algorithms
Supervised learning deals mostly with classification and
regression. Its algorithm types include:
• Neural networks, Support vector machines, Genetic
programming, Bayesian networks, Decision trees, Case-based
reasoning, Information fuzzy networks
Unsupervised learning deals mostly with clustering. Its
algorithm types include:
• K-means, Apriori, Mixture models, Hierarchical clustering
We will limit the scope of this presentation to the
Artificial Neural Network.

What's required to create good machine
learning systems?
• Data preparation capabilities.
• Deep analysis of the data.
• An accurate training data set.
• Algorithms – basic or advanced, but their
selection plays a key role in the software's success.
• Automation and iterative processes.
• Scalability.

Recap
Till now we have seen -
• What is Machine Learning?
• Why Machine Learning?
• Where is Machine Learning Used?
• Types of ML tasks
• Types of Learning
• Types of Data
• Normalization
• Types of Algorithms

Artificial Neural Network (ANN)
• In a very simple comparison with a human neuron, an ANN
receives signals, collects (sums) them, activates them and
ultimately produces an output.
• In a human neuron, the dendrites carry the INPUT signals, the
cell body receives the weighted SUM of the signals, the axon
is where the signal is ACTIVATED and the axon terminals are
the exit gate for the OUTPUT.

• Whenever a human mind processes a decision-making
problem, it considers multiple inputs according to their
importance (the weightage of each input) and cognitively
assesses all the possibilities again and again (learns) so that
a decision (output) is made with minimal error.
• An ANN works in a similar fashion.
• It takes multiple inputs according to their weightage,
processes them and adjusts the weights of the inputs again
and again (i.e. learns) so that it produces an approximate
output with minimal error.

[Figure: single-neuron ANN model. Inputs In1 and In2, with weights Wt1 and Wt2, feed a SUM node (∑); the activation function A produces the output Out = A(∑).]
• The figure shows an ANN model with two input
signals (In1 and In2). Their weights are
Wt1 and Wt2 respectively.

• SUM is the weighted sum of all the input signals, i.e.
∑ = In1*Wt1 + In2*Wt2
• This weighted sum (∑) is taken as input by an
Activation function A, and an output is generated.
Output = A(∑)
• The output produced by the ANN model is generally not exactly
equal to the ideal output (say Z). The gap between them
is known as the Error (E). Hence,
E = Z - A(∑)

Activation Function A=f(x)
• The selection of the Activation function is important in
generating an output.
• There are many Activation functions, for example the
Sigmoid function, the Hyperbolic Tangent function and the
Linear function.
• Which one to choose depends on the range of the output
which your model should generate.

• For example, if you know that your target output is in the
range of 0 to 1, you can use the Sigmoid function as the
Activation function. The sigmoid function returns an output
between 0 and 1. Its formula is:
A(x) = 1 / (1 + e^(-x))

• If you know that your target output is in the
range of -1 to 1, you can use the Hyperbolic Tangent
function as the Activation function. The hyperbolic tangent
function returns an output between -1 and 1. Its formula is:
A(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

• If you know that your target output could be
any continuous number between negative and positive
infinity, you can use the Linear function as the Activation
function. Its formula is:
A(x) = x
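
The three activation functions named on these slides can be written directly with Python's standard `math` module. This is only an illustrative sketch; the function names are chosen for the example and do not correspond to ENCOG classes.

```python
import math


def sigmoid(x):
    """Output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    """Output in the open interval (-1, 1)."""
    return math.tanh(x)


def linear(x):
    """Output is the input itself, so the range is unbounded."""
    return x


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), hyperbolic_tangent(x), linear(x))
```

Choosing between them then comes down to matching the function's output range to the range of the (normalized) target values.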

Neuron Computation
• Let us compute the output of the adjacent NN
using the Sigmoid activation function.
• Let In1 = 0.1, Wt1 = 0.01, In2 = 0.5
and Wt2 = 0.02.
Therefore,
∑ = 0.1*0.01 + 0.5*0.02 = 0.011
Applying the Sigmoid Activation function A to ∑:
A(∑) = 1 / (1 + e^(-x)), where x = 0.011
A(∑) = 1 / (1 + e^(-0.011)) = 0.50275

Neuron Computation
• Suppose the ideal output for this combination of inputs
(In1 and In2) is 0.51.
• The predicted output from our ANN model is 0.50275,
which is very close to the ideal value. Hence our
prediction could be considered useful.
• If the predicted output had been 0.40 or less,
there would have been a significant error:
E = 0.51 - 0.40 = 0.11
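
The single-neuron computation above can be checked with a short Python sketch. It uses the slide's inputs and weights; the helper names `weighted_sum`, `neuron_output` and `error` are assumptions made for this example.

```python
import math


def weighted_sum(inputs, weights):
    """SUM = In1*Wt1 + In2*Wt2 + ..."""
    return sum(i * w for i, w in zip(inputs, weights))


def neuron_output(inputs, weights):
    """Apply the sigmoid activation to the weighted sum."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(inputs, weights)))


def error(ideal, predicted):
    """E = Z - A(SUM)"""
    return ideal - predicted


inputs, weights = [0.1, 0.5], [0.01, 0.02]
print(f"{weighted_sum(inputs, weights):.3f}")   # 0.011
print(f"{neuron_output(inputs, weights):.5f}")  # 0.50275
```

Because the sigmoid of any small positive sum is slightly above 0.5, a single untrained neuron like this one can only match ideal outputs near 0.5; training the weights is what moves the output elsewhere.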

Training a Model
• Even after trying multiple Activation functions, there is every
chance that the prediction made by our model is inaccurate. Hence
there is a need to train the model again and again (i.e. the learning
process) so that the Error is reduced for any given Activation
function.
• Training a model is the process of updating the weights across the
network so that the error E is minimized.
• For this, we choose a Threshold value of error (say H). We train our
model until its error is reduced to this threshold value H.
• Once the model error is reduced to the threshold limit, we
say that our Model is trained and ready for prediction.
• Deciding the value of the threshold limit (H) also plays a major role
in training the model:
if H is very small then our model could be Over Trained,
if H is very high then our model could be Under Trained.
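
The "train until the error reaches H" idea can be sketched as a loop. To keep the example short it deliberately swaps in a simpler setup than the one used later in the presentation: a single sigmoid neuron with a bias, trained by plain gradient descent on the AND gate, with mean squared error as the error measure. The function name, the learning rate, the epoch cap and the choice of error measure are all assumptions for illustration; H = 0.0168 is the threshold the presentation uses later.

```python
import math

AND_SAMPLES = [((0.0, 0.0), 0.0), ((1.0, 0.0), 0.0),
               ((0.0, 1.0), 0.0), ((1.0, 1.0), 1.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_until_threshold(samples, threshold, learning_rate=1.0,
                          max_epochs=100_000):
    """Adjust the weights until the mean squared error is <= threshold."""
    w1 = w2 = bias = 0.0
    n = len(samples)
    for epoch in range(1, max_epochs + 1):
        grad_w1 = grad_w2 = grad_bias = 0.0
        squared_error = 0.0
        for (x1, x2), ideal in samples:
            out = sigmoid(x1 * w1 + x2 * w2 + bias)
            diff = out - ideal
            squared_error += diff * diff
            delta = diff * out * (1.0 - out)  # error times sigmoid slope
            grad_w1 += delta * x1
            grad_w2 += delta * x2
            grad_bias += delta
        global_error = squared_error / n
        if global_error <= threshold:         # model counts as trained
            return (w1, w2, bias), global_error, epoch
        w1 -= learning_rate * grad_w1 / n
        w2 -= learning_rate * grad_w2 / n
        bias -= learning_rate * grad_bias / n
    raise RuntimeError("threshold not reached within max_epochs")


(w1, w2, bias), final_error, epochs = train_until_threshold(AND_SAMPLES, 0.0168)
print(epochs, final_error)
for (x1, x2), ideal in AND_SAMPLES:
    print(x1, x2, ideal, sigmoid(x1 * w1 + x2 * w2 + bias))
```

Raising the threshold makes the loop stop earlier with a rougher model, while lowering it makes the loop run longer, which mirrors the under-trained and over-trained cases above.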

Training Algorithm
• In order to train a model we require a training
algorithm that adjusts the weights in
such a way that the global error is reduced to the threshold
limit (H).
• There are various training algorithms, such as:
  • Back propagation algorithm
  • Resilient propagation algorithm
  • Quick propagation algorithm
  • LMA (Levenberg-Marquardt) algorithm
• Different training algorithms have:
  • Different rules to update the weights, and their own learning
    rate
  • Different methods of calculating and reducing the global error
    (such as gradient descent)
  • Different flow charts

Neural Network Structure
Types of neurons
  • Input neurons (I): No processing, used to provide input signals
  • Output neurons (O): Processing units, used to get output
  • Hidden neurons (H): Additional processing units used to converge the solution
  • Bias neurons (B): Used to get a non-zero result even if the input is zero
[Figure: network with input neurons I1 = 0.1 and I2 = 0.2, hidden neurons H1 and H2, bias neuron B1 = 1.0 and output neurons O1 and O2. Weights: I1→H1 0.06, I2→H1 0.04, I1→H2 0.05, I2→H2 0.03, H1→O1 0.05, H2→O1 0.03, B1→O1 0.02, H1→O2 0.01, H2→O2 0.06, B1→O2 0.03. Outputs: H1 = 0.5035, H2 = 0.5027, O1 = 0.51506, O2 = 0.516294.]

Neural Network Computation
• H1: Sum = 0.1*0.06 + 0.2*0.04 = 0.014
Output = A(Sum) = 0.5035
• H2: Sum = 0.1*0.05 + 0.2*0.03 = 0.011
Output = A(Sum) = 0.50275
• O1: Sum = 0.5035*0.05 + 0.50275*0.03 + 1*0.02 = 0.060258
Output = A(Sum) = 0.51506
• O2: Sum = 0.5035*0.01 + 0.50275*0.06 + 1*0.03 = 0.0652
Output = A(Sum) = 0.516294
To calculate the values for H1, H2, O1 and O2
we have used the Sigmoid Activation Function:
A(∑) = 1 / (1 + e^(-x)), where x = ∑
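
The forward pass through this small network can be reproduced with a few lines of Python. The weights and inputs are the ones used in the computation above; the function name `forward` and its return layout are assumptions made for the sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(i1, i2, bias=1.0):
    """Compute hidden and output neuron values for the example network."""
    h1 = sigmoid(i1 * 0.06 + i2 * 0.04)
    h2 = sigmoid(i1 * 0.05 + i2 * 0.03)
    # The bias neuron feeds only the output layer in this example.
    o1 = sigmoid(h1 * 0.05 + h2 * 0.03 + bias * 0.02)
    o2 = sigmoid(h1 * 0.01 + h2 * 0.06 + bias * 0.03)
    return h1, h2, o1, o2


h1, h2, o1, o2 = forward(0.1, 0.2)
print(f"H1={h1:.4f} H2={h2:.5f} O1={o1:.5f} O2={o2:.6f}")
```

Each layer reuses the same two steps as the single neuron: a weighted sum followed by the activation function.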

Model Validation
• The available raw data is divided
into two sets. One set is known
as Training data and the other as Validation
data.
• Training data and Validation data are
normalized.
• The normalized training data is fed into a
Model.
• This Model is trained (by a training
algorithm) such that it produces minimal
global error (by minimizing the local
errors). This gives us the Trained Model.
• The normalized validation data is fed into the
Trained Model.
• The normalized output is recovered.
• This output is de-normalized to give us
the Validation Result.
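
The first step of this flow, dividing the raw data into training and validation sets, might look like the sketch below. The 80/20 split, the fixed seed and the name `split_data` are assumptions for illustration; the presentation does not say how the data is divided.

```python
import random


def split_data(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


training, validation = split_data(range(10))
print(len(training), len(validation))  # 8 2
```

Keeping the validation rows out of training is what makes the later validation result a fair check of the trained model.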

Is this a Disruptive Technology?
• Yes, Machine Learning is an example of a Disruptive
Technology.
• It is replacing our conventional way of working on
classification problems, prediction problems and linear
results problems.
• It is creating footholds in the low-end market and even
creating new markets.
• It is turning non-customers into customers.
• Examples of disruption: Email services disrupted postal services,
smartphones disrupted basic phones, 3G/4G replaced EDGE,
HD live streaming replaced buffering, etc.

ENCOG Framework
• ENCOG is an open source
Machine Learning
framework.
• It supports many ML
algorithms and their
architectures:
  • Neural Networks
  • Bayesian Networks
  • Clustering
  • Genetic Algorithms
  • Hidden Markov Models
  • Particle Swarm Optimization
  • Simulated Annealing
  • Support Vector Machines
• It supports many
training algorithms:
  • ADALINE Training
  • Backpropagation
  • Competitive Learning
  • Genetic Algorithm Training
  • Hopfield Learning
  • Instar & Outstar Training
  • Levenberg Marquardt (LMA)
  • Manhattan Update Rule Propagation
  • Nelder Mead Training
We will be using the ENCOG
framework to build and train our
network.

PROBLEM STATEMENT
Demonstrate AND Gate output prediction with an ML
program built using the ENCOG framework

Program Sample
• I have created a simple C# console application in VS
2015.
• I have referenced the Encog core library (downloaded from
the ENCOG site).
• The steps for creating the program are as follows:
  • Create Input Data and Ideal Output
  • Create Network
  • Perform Training of the model using a looping construct
  • Perform Evaluation
Input and Ideal data
• First of all, add a reference to the Encog library in your project.
• Define the Input data and Ideal data for the AND Gate. Note: all data are
arrays of double type.
• Construct a data set from the Input and Ideal data.
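
As a language-neutral illustration of this step, the AND gate's input and ideal data can be written as arrays of floating-point values and paired into a data set. This is plain Python, not the Encog API used in the C# program; the names `AND_INPUT`, `AND_IDEAL` and `training_set` are hypothetical.

```python
# Each row of AND_INPUT is one input combination; the matching row of
# AND_IDEAL holds the expected output. All values are doubles (floats).
AND_INPUT = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
AND_IDEAL = [[0.0], [0.0], [0.0], [1.0]]

# The data set pairs every input row with its ideal output row.
training_set = list(zip(AND_INPUT, AND_IDEAL))

for inputs, ideal in training_set:
    print(inputs, "->", ideal)
```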

Create Network
• We are creating a Basic Neural network (though there are many other
types as well).
• This network has 3 layers of neurons.
• The 1st layer has 2 input neurons and 1 bias neuron. The property value “true”
signifies that the layer has a Bias neuron.
• The 2nd layer has 2 hidden neurons and 1 bias neuron.
• The 3rd layer has 1 output neuron.
• The 2nd and 3rd layers also have an Activation function. We are using the Sigmoid
activation function, though there are many others as well.

Train the network
• We are using the Resilient propagation training algorithm.
• We are using a do…while loop to iterate the training process on the network for
the training dataset.
• We have given a Threshold value of 0.0168.
• Our network will be trained until the error is reduced to this threshold limit.
• As discussed earlier, choosing this value is a matter of trial and error.
• A very high value could lead to an under-trained network. In such a case the
number of training iterations will be very low.
• A very low value could lead to an over-trained network. In such a case the
number of training iterations will be very high.

Evaluation
• We iterate over the training dataset, feed each set of
inputs through the network to get the predicted output, and
print the results.

Output
The above two outputs are for the same
Activation function, same inputs, same
Propagation algorithm and same network. The
number of iterations will not be the same, or
even nearly equal, on every run for the same
criteria, because a network typically starts from
random initial weights. This is how an NN works.
You can see that in order to generate the
output the first execution took 59
iterations while the second took merely
16 iterations.

Analysis of Outputs
• Analysis of first output screen
Input Combination   Ideal Output   Predicted Output
0,0                 0              1E-20 (i.e. 1 * 10^-20)
1,0                 0              0.104
0,1                 0              0.108
1,1                 1              0.844
• Analysis of second output screen
Input Combination   Ideal Output   Predicted Output
0,0                 0              0.005
1,0                 0              0.080
0,1                 0              0.077
1,1                 1              0.856
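
One way to relate these tables to the training threshold is to compute the error of each run from its predicted outputs. The sketch below assumes the error measure is the mean squared error; the presentation does not state which measure the framework reports, so treat that choice as an assumption.

```python
def mean_squared_error(ideal, predicted):
    """Average of the squared differences between ideal and predicted."""
    return sum((i - p) ** 2 for i, p in zip(ideal, predicted)) / len(ideal)


IDEAL = [0.0, 0.0, 0.0, 1.0]
FIRST_RUN = [1e-20, 0.104, 0.108, 0.844]
SECOND_RUN = [0.005, 0.080, 0.077, 0.856]
THRESHOLD = 0.0168

for name, predicted in (("first", FIRST_RUN), ("second", SECOND_RUN)):
    run_error = mean_squared_error(IDEAL, predicted)
    print(name, round(run_error, 6), run_error <= THRESHOLD)
```

Under that assumption both runs come out below the 0.0168 threshold (about 0.0117 and 0.0083), which is consistent with training having stopped once the threshold was reached.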

Comparison of Propagations
[Figure: output screens for Resilient Propagation and Manhattan Propagation, side by side]

Outcome
• We changed the propagation algorithm and saw that with the
Manhattan propagation algorithm it takes nearly 64
iterations to reduce the network error to 0.0168.
• There are various other training algorithms as well.
• Feel free to try different combinations of
Activation functions, propagation algorithms, threshold
values and numbers of neurons per layer, and analyze the
outcomes.
• I will be discussing various other Activation functions and
Propagation algorithms in my next presentation.
