MACHINE LEARNING
Using Artificial Neural Network and ENCOG framework
Vikas Sinha
Outline
• What is Machine Learning?
• Why Machine Learning?
• Where is Machine Learning Used?
• Types of ML tasks
• Types of Learning
• Types of Data
• Normalization
• Types of Algorithms
• Artificial Neural Network
• Neuron Computation
• Training a Model
• Training Algorithm
• Model Training Flowchart
• Neural Network Structure
• Neural Network Computation
• Model Validation
• Disruptive Technology?
• ENCOG Framework
• Problem Statement
• Analysis of Outputs
• Outcomes
What is Machine Learning?
• Machine Learning (ML) is a subset of Artificial Intelligence
that provides computers with the ability to learn without
being explicitly programmed.
• Machine learning focuses on the development of computer
programs that can teach themselves to change when they are
exposed to unseen data.
• These programs learn and improve on the basis
of their “experience” on some “task”, which increases their
performance measure.
Machine learning = “Learning from data or experience”
Why Machine Learning?
• With the advent of Data Mining we have been able to find
more patterns in our data. Analyzing and validating this data
can bring more success to our business.
• It is humanly impossible to capture all, or even some, of the patterns in
these data using conventional program instructions (static if-then
statements).
• It is difficult to write mathematical relationships or equations
that describe them.
• It is very different from the conventional way of writing programs.
• Not only are the Inputs important, but Factors also play a key role.
• What if machines are programmed to learn
from data and re-use their experience to increase their
performance? This is the heart and soul of Machine Learning.
Where is Machine Learning Used?
• Forecasting
• Performance Prediction
• Recommendations
• Sentiments Prediction
• Spam detection
• Document Classification
• Face detection
• Language processing/understanding
• OCR
• Predicting sensor failure
• News clustering (e.g. Google News)
• Medical diagnosis
• Many more…
Types of ML Tasks
• There are three main types of tasks for which ML
programs are used:
• Classification
• Regression
• Clustering
Classification Task
• Tasks in which the primary
goal of the ML program is to classify
the data into classes (or categories
or labels) are known as Classification tasks.
• Ex. Selecting SPAM
emails from a set of emails is a
Classification task. Here
the class will be “SPAM” or “NOT
SPAM”.
Regression Task
• Tasks in which the primary
goal of the ML program is to predict
a continuous numeric output for unseen
input data are known as Regression tasks.
• Ex. Predicting human
sentiments or predicting the outcome of a
gambling game.
• The program needs to learn from
past data and use what it has learned
to predict the output for new, unseen
data.
Clustering Task
• Tasks in which the primary
goal of the ML program is to form
clusters of data from a large data set,
based on common behavior, patterns or
attribute values, are known as
Clustering tasks.
• Ex. Grouping customer
details and product details from a
data set.
• The program needs to analyze the
common characteristics, patterns and
behavior and then put each data item into
the cluster where it fits best.
Types of Learning
• There are two main types of learning:
1. Supervised Learning
2. Unsupervised Learning
In supervised learning, the ideal outputs are provided
along with the inputs and are used to train the machine to
produce the desired outputs. In unsupervised learning no
outputs are provided; instead the data is clustered into
different classes.
Supervised learning (the output column “Result” is provided):
SPF Check   Valid Sender   Valid Domain   Result
No          Yes            No             SPAM
Unsupervised learning (no output column):
SPF Check   Valid Sender   Valid Domain
No          Yes            Yes
Types of Data
• Numerical data - These are continuous numeric values,
like 1, 2, 3.7, 4.8 (any integer or double).
• Nominal data - These are data that represent
a class (or category). For example, the category “Color” could
have the values “Red”, “Blue” and “Green”. Hence our column
“Color” will have these 3 values repeated over the
whole data set.
The Machine Learning algorithms discussed here accept only double
values as input and produce only double values as output.
Numerical and Nominal Data
• In the data set below, the columns “Age” and “Salary” are
examples of Numerical data, whereas the column “Gender” is an
example of Nominal data.
• Note: The “S.No” and “Name” fields do not affect the outcome
in any way and hence can be ignored.
S.No   Name       Age (in years)   Salary (in Rupees)   Gender
1      Nitin      24               11234.98             M
2      Abhishek   34               51234.00             M
3      Vikas      26               1222.99              M
4      Jyoti      25               7777.55              F
Normalization
• The data can vary a lot in range. In the previous data
set, Age ranges between 20-35 but Salary
ranges between 1000-50000, and the column
Gender is not numerical at all. There is therefore a need to bring all
of them onto one scale.
• This process of scaling (resizing) the data is known as
Normalization.
• In the next two slides we will see how to normalize
Numerical and Nominal data fields.
Numerical field data Normalization
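• Suppose the actual values range between 40 and 50.
We need to normalize the value 42.5 on a scale of -1 to 1.
• Assuming the usual range (min-max) formula, the calculation will be:
$x_{norm} = \frac{42.5 - 40}{50 - 40} \times (1 - (-1)) + (-1) = -0.5$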
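A minimal Python sketch of this numerical normalization, assuming the usual range (min-max) formula. The function name `normalize` and its parameter names are illustrative only and are not part of ENCOG.

```python
def normalize(value, actual_low, actual_high, norm_low=-1.0, norm_high=1.0):
    """Map value from [actual_low, actual_high] onto [norm_low, norm_high]."""
    # Fraction of the way the value sits between the actual low and high.
    fraction = (value - actual_low) / (actual_high - actual_low)
    # Stretch that fraction over the target range and shift it to norm_low.
    return fraction * (norm_high - norm_low) + norm_low


# 42.5 lies a quarter of the way from 40 to 50, so it lands at -0.5 on [-1, 1].
normalized = normalize(42.5, 40.0, 50.0)
print(normalized)  # -0.5
```

The same function covers other target scales, for example `normalize(42.5, 40.0, 50.0, 0.0, 1.0)` for a 0 to 1 scale.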
Nominal field data Normalization
• In this data set, the column Color holds one of the values “Red”,
“Blue” or “Green”.
• Since there are three distinct values, let us encode Red as 1.0,0.0,0.0
and the other two as the remaining combinations shown in the table.
• But this can lead to a wrong prediction, as there are two combinations
where the encoding ends with 0.0.
• So if the Ideal is Red, it could also be interpreted as Blue, as shown
below:
Ideal : Red { 1.0, 0.0, 0.0 } Predicted : Blue { 0.0, 1.0, 0.0 }
Color   Encoded Color
Red     1.0, 0.0, 0.0
Blue    0.0, 1.0, 0.0
Green   0.0, 0.0, 1.0
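As a small illustration of this nominal encoding, the Python sketch below builds one vector per distinct value, with a single 1.0 marking the class. The helper name `one_of_n` is hypothetical and is not an ENCOG function.

```python
def one_of_n(values):
    """Return a dict mapping each distinct value to its encoded vector."""
    # Keep the distinct values in the order they first appear.
    classes = list(dict.fromkeys(values))
    return {
        cls: [1.0 if i == idx else 0.0 for i in range(len(classes))]
        for idx, cls in enumerate(classes)
    }


encoding = one_of_n(["Red", "Blue", "Green", "Red"])
print(encoding["Blue"])  # [0.0, 1.0, 0.0]
```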
• The previous encoding for Nominal field data is known as
One-of-N encoding, where N is the number of distinct
values.
• To overcome the prediction problem of this
encoding mechanism, the Equilateral Encoding mechanism
was introduced.
• It distributes the error of a wrong prediction equally across
the classes. E.g. Color : {Red, Green, Blue} encoded
on the range {0,1}.
• It cannot work with fewer than 3
classes.
Color   Encoded Color
Red     {0.06, 0.25}
Blue    {0.93, 0.25}
Green   {0.5, 1}
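The sketch below shows one common way to construct an equilateral encoding in Python: N classes become N points in N-1 dimensions that are all the same distance apart, so every wrong prediction is equally "far" from the ideal. This is an illustrative construction under that assumption, not ENCOG's own code, and the function name `equilateral` is hypothetical.

```python
import math


def equilateral(n, low=0.0, high=1.0):
    """Return n points in n-1 dimensions that are all the same distance apart."""
    if n < 3:
        raise ValueError("equilateral encoding needs at least 3 classes")
    pts = [[0.0] * (n - 1) for _ in range(n)]
    pts[0][0], pts[1][0] = -1.0, 1.0           # start with two points on a line
    for k in range(2, n):
        shrink = math.sqrt(k * k - 1.0) / k    # pull the existing points inwards
        for i in range(k):
            for j in range(k - 1):
                pts[i][j] *= shrink
        for i in range(k):
            pts[i][k - 1] = -1.0 / k           # push them down the new axis
        pts[k][k - 1] = 1.0                    # the new point sits on the new axis
    # Rescale every coordinate from [-1, 1] to [low, high].
    return [[(v + 1.0) / 2.0 * (high - low) + low for v in p] for p in pts]


codes = equilateral(3)
for name, code in zip(["Red", "Blue", "Green"], codes):
    print(name, [round(v, 3) for v in code])
```

For three classes on the range 0 to 1 this gives approximately the coordinates in the table: Red and Blue share the second coordinate 0.25, and Green sits at {0.5, 1}.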
Types of Algorithms
Supervised learning deals mostly with
classification and regression. Its algorithm types include:
• Neural networks, Support vector machines, Genetic
programming, Bayesian networks, Decision trees, Case
based reasoning, Information fuzzy networks
Unsupervised learning deals mostly with
clustering. Its algorithm types include:
• K-means, Apriori, Mixture models, Hierarchical clustering
We will limit the scope of this presentation to the
Artificial Neural Network.
What's required to create good machine
learning systems?
• Data preparation capabilities.
• Deep analysis of the data.
• An accurate training data set.
• Algorithms: basic or advanced, but their
selection plays a key role in the success of the software.
• Automation and iterative processes.
• Scalability.
Recap
Till now we have seen -
• What is Machine Learning?
• Why Machine Learning?
• Where is Machine Learning Used?
• Types of ML tasks
• Types of Learning
• Types of Data
• Normalization
• Types of Algorithms
Artificial Neural Network (ANN)
• In a very simple comparison with a human neuron, an ANN
receives signals, collects (sums) them, activates them and
ultimately produces an output.
• In a human neuron, the dendrites carry the INPUT signals, the cell
body receives the weighted SUM of the signals, the axon
is the part where the signal is ACTIVATED, and the axon terminals are
the exit gate for the OUTPUT.
Artificial Neural Network (ANN)
• Whenever a human mind processes a decision-making
problem, it considers multiple inputs according to their
importance (the weightage of each input) and cognitively
assesses all the possibilities again and again (learns) so that
a decision (output) is made with minimal error.
• An ANN works in a similar fashion.
• It takes multiple inputs according to their weightage,
processes them and adjusts the weights of the inputs again
and again (i.e. learns) so that an approximate output is
produced which has minimal error.
Artificial Neural Network (ANN)
• The figure shows an ANN model with two input
signals (In1 and In2). Their weightages are
Wt1 and Wt2 respectively.
[Figure: In1 and In2, weighted by Wt1 and Wt2, feed SUM (∑); the ACTIVATION FUNCTION (A) produces Out = A(∑)]
Artificial Neural Network (ANN)
• SUM is the weighted sum of all the input signals, i.e.
∑ = In1 * Wt1 + In2 * Wt2
• This weighted sum (∑) is taken as the input of an
Activation function A, which generates the output.
Output = A(∑)
• The Output produced by the ANN model is usually not exactly
equal to the ideal output (say Z). The gap between the two
is known as the Error (E). Hence,
E = Z – A(∑)
Artificial Neural Network (ANN)
Activation Function A=f(x)
• The selection of the Activation function is important in generating
an output.
• There are many Activation functions, for example the
Sigmoid function, the Hyperbolic Tangent function and the Linear
function.
• Which one to choose depends on
the range of the output which your model should
generate.
Artificial Neural Network (ANN)
Activation Function A=f(x)
• For example, if you know that your target output is in the
range of 0 to 1 then you can use the Sigmoid function as the
Activation function. The Sigmoid function returns an output
between 0 and 1. Its formula is $A(x) = \frac{1}{1 + e^{-x}}$.
[Figure: graph of the Sigmoid function]
Artificial Neural Network (ANN)
Activation Function A=f(x)
• For example, if you know that your target output is in the
range of -1 to 1 then you can use the Hyperbolic Tangent
function as the Activation function. The Hyperbolic Tangent function
returns an output between -1 and 1. Its formula is
$A(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.
[Figure: graph of the Hyperbolic Tangent function]
Artificial Neural Network (ANN)
Activation Function A=f(x)
• For example, if you know that your target output could be
any continuous number between negative and positive infinity then you can
use the Linear function as the Activation function. Its formula
is $A(x) = x$.
[Figure: graph of the Linear function]
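The three activation functions named on these slides can be sketched in Python as follows, to make their output ranges concrete. The function names are illustrative and are not the ENCOG class names.

```python
import math


def sigmoid(x):
    """Output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    """Output always lies between -1 and 1."""
    return math.tanh(x)


def linear(x):
    """Output is the input itself, so it is unbounded."""
    return x


# Compare the three functions on a negative, a zero and a positive input.
for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), hyperbolic_tangent(x), linear(x))
```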
Neuron Computation
• Let us compute the output of the adjacent NN
using the Sigmoid activation function.
• Let In1 = 0.1, Wt1 = 0.01, In2 = 0.5 and Wt2 = 0.02.
Therefore,
∑ = 0.1 * 0.01 + 0.5 * 0.02 = 0.011
Using the Sigmoid Activation function
A on ∑,
$A(\sum) = A(0.011) = \frac{1}{1 + e^{-x}}$, where $x = 0.011$
$A(\sum) = \frac{1}{1 + e^{-0.011}} = 0.50275$
Neuron Computation
• Suppose the ideal output for this combination of inputs
(In1 and In2) is 0.06.
• The predicted output from our ANN model is 0.50275.
A prediction very close to the ideal value could be
considered useful, but this one is far from it.
• There is a significant error:
E = 0.06 – 0.50275 = –0.44275
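The single-neuron calculation can be checked numerically with a short Python sketch. It uses the inputs and weights from the slide, forms the weighted sum, applies the Sigmoid function and then computes the error against the ideal output of 0.06. The function name `neuron_output` is illustrative.

```python
import math


def neuron_output(inputs, weights):
    """Weighted sum of the inputs followed by the Sigmoid activation."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-weighted_sum))


output = neuron_output([0.1, 0.5], [0.01, 0.02])  # In1, In2 and Wt1, Wt2
ideal = 0.06
error = ideal - output                            # E = Z - A(sum)
print(round(output, 5), round(error, 5))  # 0.50275 -0.44275
```

Because the Sigmoid of any positive sum is greater than 0.5, the output for these inputs is about 0.50275.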
Training a Model
• Even after trying multiple Activation functions, there is every
chance that the prediction made by our model is inaccurate. Hence,
there is a need to train the model again and again (i.e. the learning
process) so that this Error is reduced for any given Activation
function.
• Training a model is the process of updating the weights across the
network so that the error E is minimized.
• For this, we choose a Threshold value of error (say H). We train our
model until its error is reduced to this threshold
value H.
• At the stage where the model error has been reduced to the threshold limit, we
say that our Model is trained and ready for prediction.
• Choosing the value of the threshold limit (H) also plays a major role in training
the model:
if H is very small then our model could be Over Trained,
if H is very high then our model could be Under Trained.
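The idea of training until the error reaches the threshold H can be sketched in Python with a single Sigmoid neuron. This is a minimal illustration, not ENCOG code: it assumes AND-gate samples, a simple gradient-descent weight update, mean squared error as the global error, and H = 0.0168 (the value used in the ENCOG example later in this deck). All names and the learning rate are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# AND-gate samples as (inputs, ideal output); illustrative data only.
SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]


def mean_squared_error(weights, bias):
    """Global error of the neuron over all samples."""
    total = 0.0
    for inputs, ideal in SAMPLES:
        out = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
        total += (ideal - out) ** 2
    return total / len(SAMPLES)


def train(threshold, learning_rate=0.5, max_epochs=100_000):
    """Update the weights until the error is reduced to the threshold H."""
    weights, bias, epochs = [0.0, 0.0], 0.0, 0
    while mean_squared_error(weights, bias) > threshold and epochs < max_epochs:
        for inputs, ideal in SAMPLES:
            out = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
            # Gradient-descent step on the squared error of this sample.
            delta = (ideal - out) * out * (1.0 - out)
            weights = [w + learning_rate * delta * i for w, i in zip(weights, inputs)]
            bias += learning_rate * delta
        epochs += 1
    return weights, bias, epochs


weights, bias, epochs = train(threshold=0.0168)
_, _, loose_epochs = train(threshold=0.1)
print(epochs, loose_epochs, mean_squared_error(weights, bias))
```

A higher threshold stops training earlier, which is why a very high H can leave the model under trained.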
Training Algorithm
• In order to train a model we require a training
algorithm that adjusts the weights in
such a way that the global error is reduced to the threshold
limit (H).
• There are various Training algorithms, such as:
  Back propagation algorithm
  Resilient propagation algorithm
  Quick propagation algorithm
  LMA algorithm
• Different training algorithms have:
  Different rules to update weights, each with its own learning
  rate
  Different methods of reducing the global error (like Gradient descent
  etc.)
  Different flow charts
Model Training Flowchart
Neural Network Structure
Type of neurons
▪ Input neurons (I): No processing, used to provide input signals
▪ Output neurons (O): Processing units, used to get output
▪ Hidden neurons (H): Additional processing units used to converge the solution
▪ Bias neurons (B): Used to get a non-zero result even if the input is zero
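[Figure: network with input neurons I1 = 0.1 and I2 = 0.2, hidden neurons H1 (output 0.5035) and H2 (output 0.5027), bias neuron B1 = 1.0, and output neurons O1 (output 0.51506) and O2 (output 0.516294). Weights: I1→H1 0.06, I2→H1 0.04, I1→H2 0.05, I2→H2 0.03, H1→O1 0.05, H2→O1 0.03, B1→O1 0.02, H1→O2 0.01, H2→O2 0.06, B1→O2 0.03]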
Neural Network Computation
• H1: Sum = 0.1 * 0.06 + 0.2 * 0.04 = 0.014
  Output = A(Sum) = 0.5035
• H2: Sum = 0.1 * 0.05 + 0.2 * 0.03 = 0.011
  Output = A(Sum) = 0.50275
• O1: Sum = 0.5035 * 0.05 + 0.50275 * 0.03 + 1 * 0.02 = 0.060258
  Output = A(Sum) = 0.51506
• O2: Sum = 0.5035 * 0.01 + 0.50275 * 0.06 + 1 * 0.03 = 0.0652
  Output = A(Sum) = 0.516294
To calculate the values of H1, H2, O1 and O2
we have used the Sigmoid
Activation Function
$A(\sum) = \frac{1}{1 + e^{-x}}$
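The network computation above can be reproduced with a short Python sketch. The inputs and weights are taken from the slide; the helper name `layer` and the way the weights are grouped per neuron are illustrative choices, not ENCOG structures.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, bias_weights=None):
    """Weighted sum per neuron (plus bias weight * 1.0 if given), then Sigmoid."""
    outputs = []
    for n, neuron_weights in enumerate(weights):
        total = sum(i * w for i, w in zip(inputs, neuron_weights))
        if bias_weights is not None:
            total += 1.0 * bias_weights[n]  # the bias neuron always outputs 1.0
        outputs.append(sigmoid(total))
    return outputs


inputs = [0.1, 0.2]                                    # I1, I2
hidden = layer(inputs, [[0.06, 0.04], [0.05, 0.03]])   # H1, H2
outputs = layer(hidden, [[0.05, 0.03], [0.01, 0.06]],  # O1, O2
                bias_weights=[0.02, 0.03])
print([round(h, 5) for h in hidden])
print([round(o, 6) for o in outputs])
```

The printed values agree with the slide's H1, H2, O1 and O2 to the precision shown there.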
Model Validation
• The available raw data is divided
into two sets. One set is known
as Training data and the other as Validation
data.
• The Training data and Validation data are
Normalized.
• The normalized Training data is fed into a
Model.
• This Model is trained (by a training
algorithm) such that it produces minimal
global error (by minimizing the local
errors). This gives us the Trained Model.
• The normalized Validation data is fed into the
Trained Model.
• The normalized output is recovered.
• This output is De-normalized to give us the
Validation Result.
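The validation steps can be sketched in Python as below. Everything here is an assumption made for illustration: the 75/25 split, the value ranges, and above all `trained_model`, which is only a placeholder standing in for a network that would really be trained on the training set. The sketch shows the order of the steps, not a meaningful prediction.

```python
def normalize(value, low, high):
    """Scale value from [low, high] onto [-1, 1]."""
    return (value - low) / (high - low) * 2.0 - 1.0


def denormalize(value, low, high):
    """Inverse of normalize: map [-1, 1] back onto [low, high]."""
    return (value + 1.0) / 2.0 * (high - low) + low


def split(rows, training_fraction=0.75):
    """Divide the raw rows into a training set and a validation set."""
    cut = int(len(rows) * training_fraction)
    return rows[:cut], rows[cut:]


# Raw (age, salary) rows from the earlier data set; the ranges are assumptions.
raw = [(24, 11234.98), (34, 51234.00), (26, 1222.99), (25, 7777.55)]
AGE_RANGE, SALARY_RANGE = (20.0, 35.0), (1000.0, 52000.0)

training, validation = split(raw)


def trained_model(normalized_input):
    """Placeholder for a trained network; it does NOT learn from `training`."""
    return normalized_input


for age, actual_salary in validation:
    normalized_output = trained_model(normalize(age, *AGE_RANGE))
    validation_result = denormalize(normalized_output, *SALARY_RANGE)
    print(age, actual_salary, round(validation_result, 2))
```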
Is this a Disruptive Technology?
• Yes, Machine Learning is an example of a Disruptive
Technology.
• It is replacing our conventional way of working on
classification problems, prediction problems and linear
results problems.
• It is creating footholds in the low-end market and even
creating new markets.
• It is turning non-customers into customers.
• Examples of disruption: Email services disrupted postal services,
smart phones disrupted basic phones, 3G/4G disrupted EDGE,
HD live streaming disrupted buffered playback, etc.
ENCOG Framework
• ENCOG is an open source
Machine Learning
framework.
• It supports many ML
algorithms and their
architectures, including:
▪ Neural Networks
▪ Bayesian Networks
▪ Clustering
▪ Genetic Algorithms
▪ Hidden Markov Models
▪ Particle Swarm Optimization
▪ Simulated Annealing
▪ Support Vector Machines
• It supports many
training algorithms, including:
▪ ADALINE Training
▪ Backpropagation
▪ Competitive Learning
▪ Genetic Algorithm Training
▪ Hopfield Learning
▪ Instar & Outstar Training
▪ Levenberg Marquardt (LMA)
▪ Manhattan Update Rule Propagation
▪ Nelder Mead Training
We will be using the ENCOG
framework to build and train our
Network.
PROBLEM STATEMENT
Demonstrate AND Gate output prediction through ML
program using ENCOG framework
Program Sample
• I have created a simple C# console application in VS
2015.
• I have referenced the Encog core library (downloaded from the
ENCOG site).
• The steps for creating the program are as follows:
  Create Input Data and Ideal Output
  Create Network
  Perform Training of the model using a looping construct
  Perform Evaluation
Input and Ideal data
• First of all, add a reference to the Encog library in your project.
• Define the Input data and Ideal data for the AND Gate. Note: All data are arrays of
type double.
• Construct a data set from the Input and Ideal data.
Create Network
• We are creating a Basic Neural Network (though there are many other types as
well).
• This network has 3 layers of neurons.
• The 1st layer has 2 input neurons and 1 bias neuron. The property value “true”
signifies that the layer has a Bias neuron.
• The 2nd layer has 2 hidden neurons and 1 bias neuron.
• The 3rd layer has 1 output neuron.
• The 2nd and 3rd layers also have an Activation function. We are using the Sigmoid
activation function, though there are many others as well.
Train the network
• We are using the basic Resilient propagation training algorithm.
• We are using a do…while loop to iterate the training process on the network for
the training dataset.
• We have given a Threshold value of 0.0168.
• Our network will be trained until the error is reduced to this threshold limit.
• As discussed earlier, choosing this value is a matter of trial and error.
• A very high value could lead to an Under trained network. In such a case the
number of training iterations will be very low.
• A very low value could lead to an Over trained network. In such a case the
number of training iterations will be very high.
Evaluation
• We iterate over the training dataset, compute each
Input through the network to get the predicted output, and
print the results.
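The four steps (data, network, training, evaluation) can be sketched end to end in plain Python. This is not the ENCOG C# program from the slides and uses no ENCOG API: it swaps in plain backpropagation instead of Resilient propagation, and the random seed, learning rate and iteration cap are assumptions. It keeps the same 2-2-1 layout with bias neurons, Sigmoid activations and the 0.0168 threshold.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Step 1: Input data and Ideal output for the AND gate, all doubles.
AND_INPUT = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
AND_IDEAL = [[0.0], [0.0], [0.0], [1.0]]

# Step 2: a 2-2-1 network; the last weight of each neuron belongs to the bias neuron.
rng = random.Random(42)  # fixed seed so that runs are repeatable (assumption)
hidden_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output_w = [rng.uniform(-1, 1) for _ in range(3)]


def compute(inputs):
    """Forward pass: returns the hidden outputs and the network output."""
    hidden = [sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + w[2]) for w in hidden_w]
    out = sigmoid(output_w[0] * hidden[0] + output_w[1] * hidden[1] + output_w[2])
    return hidden, out


def network_error():
    """Mean squared error over the whole training set."""
    return sum((ideal[0] - compute(x)[1]) ** 2
               for x, ideal in zip(AND_INPUT, AND_IDEAL)) / len(AND_INPUT)


# Step 3: train with plain backpropagation until the error reaches the threshold.
THRESHOLD, LEARNING_RATE, MAX_EPOCHS = 0.0168, 0.5, 200_000
epoch = 0
while network_error() > THRESHOLD and epoch < MAX_EPOCHS:
    for x, ideal in zip(AND_INPUT, AND_IDEAL):
        hidden, out = compute(x)
        out_delta = (ideal[0] - out) * out * (1.0 - out)
        hidden_delta = [out_delta * output_w[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(2)]
        for j in range(2):
            output_w[j] += LEARNING_RATE * out_delta * hidden[j]
            hidden_w[j][0] += LEARNING_RATE * hidden_delta[j] * x[0]
            hidden_w[j][1] += LEARNING_RATE * hidden_delta[j] * x[1]
            hidden_w[j][2] += LEARNING_RATE * hidden_delta[j]
        output_w[2] += LEARNING_RATE * out_delta
    epoch += 1

# Step 4: evaluate by computing every input through the trained network.
print("Epochs:", epoch, "Error:", network_error())
for x, ideal in zip(AND_INPUT, AND_IDEAL):
    print(x, "ideal:", ideal[0], "predicted:", round(compute(x)[1], 3))
```

Changing the seed changes the starting weights and therefore the number of epochs needed, which mirrors the varying iteration counts reported for the ENCOG runs.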
Output
The above two outputs are for the same
Activation function, same inputs, same
Propagation algorithm and same network. The
number of iterations will not always be the
same, or even nearly equal, for the
same criteria; this is how a NN works.
You can see that the first execution took 59
iterations to generate the output, while
the second took merely
16 iterations.
Analysis of Outputs
• Analysis of first output screen
Input Combination   Ideal Output   Predicted Output
0,0                 0              1E-20 (i.e. 1 * 10^-20)
1,0                 0              0.104
0,1                 0              0.108
1,1                 1              0.844
• Analysis of second output screen
Input Combination   Ideal Output   Predicted Output
0,0                 0              0.005
1,0                 0              0.080
0,1                 0              0.077
1,1                 1              0.856
Comparison of Propagations
[Figure: output screens for Resilient Propagation and Manhattan Propagation]
Outcome
• We changed the propagation algorithm and saw that the
Manhattan propagation algorithm takes nearly 64
iterations to reduce the network error to 0.0168.
• There are various other training algorithms as well.
• Feel free to try different combinations of
Activation functions, propagation algorithms, threshold
values and numbers of neurons per layer, and analyze the
outcomes.
• I will be discussing various other Activation functions and
Propagation algorithms in my next presentation.
THANK YOU FOR LEARNING

Machine learning

  • 1.
    MACHINE LEARNING Using ArtificialNeural Network and ENCOG framework Vikas Sinha
  • 2.
    Outline • What isMachine Learning? • Why Machine Learning? • Where is Machine Learning Used? • Types of ML tasks • Types of Learning • Types of Data • Normalization • Types of Algorithms • Artificial Neural Network • Neuron Computation • Training a Model • Training Algorithm • Model Training Flowchart • Neural Network Structure • Neural Network Computation • Model Validation • Disruptive Technology? • ENCOG Framework • Problem Statement • Analysis of Outputs • Outcomes
  • 3.
    What is MachineLearning? • Machine Learning (ML) is a subset of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed. • Machine learning focuses on the development of computer programs that can teach themselves to change when an unseen data is exposed to them. • They are the programs that learn and improve on the basis of their “experience” on some “task” which increases their performance measure. Machine learning = “Learning from data or experience”
  • 4.
    Why Machine Learning? •With the advent of Data Mining we have been able to find out more patterns in our data. Analyzing and validating this data can get more success to our business. • It is humanly impossible to interpret all or even some pattern in these data using conventional program instructions (static if then statements) . • It is difficult to write mathematical relationships or equations concerning them. • It is way different than conventional way of writing programs. • Not only Inputs are important but Factors also plays key role. • What if Machines are programmed to perform tasks to learn from data and re-use their experience to increase their performance. This is the heart and soul of Machine Learning.
  • 5.
    Where is MachineLearning Used? • Forecasting • Performance Prediction • Recommendations • Sentiments Prediction • Spam detection • Document Classification • Face detection • Language processing/understanding. • OCR • Predicting sensor failure • News clustering (e.g. Google News) • Medical diagnosis • Many more…
  • 6.
    Types of MLTasks • There are basically three types of tasks for which ML programs are majorly used - • Classification • Regression • Clustering
  • 7.
    Classification Task • Thosetasks for which the primary goal of the ML program is to classify the data into classes (or categories or label). • Ex. The task of selecting SPAM emails from a set of email is an example of Classification task. Here the class will be “SPAM” or “NOT SPAM”.
  • 8.
    Regression Task • Thosetasks for which the primary goal of the ML program is to predict the output for unseen input data are know as Regression Tasks. • Ex. The task of predicting human sentiments or predicting the gamble game. • The program needs to learn from the past data and predict the output for the new unseen data from its learning.
  • 9.
    Clustering Task • Thosetasks for which the primary goal of the ML program is to form a cluster of data from large set of data depending upon common behavior/pattern or attributes values. • Ex. The task of selecting customer details and product details from a data set. • The program needs to analyze the common characteristics /pattern/ behavior and then put the data into its justifiable cluster.
  • 10.
    Types of Learning •There are majorly two types of learning – 1. Supervised Learning 2. Unsupervised Learning In supervised learning, the output datasets are provided which are used to train the machine and get the desired outputs whereas in unsupervised learning no output datasets are provided, instead the data is clustered into different classes . SPF Check Valid Sender Valid Domain Result No Yes No SPAM SPF Check Valid Sender Valid Domain No Yes Yes
  • 11.
    Types of Data •Numerical data - These are continuous numeric value type data. Like, 1,2,3.7,4.8 (any integer or double) • Nominal data - These are data which could be represented as class (or category values). Like, category “Color” could have values as “Red,Blue,Green”. Hence our column “Color” will have these 3 types of data repeated over the whole data set. Machine Learning algorithms only accepts double values as an Input and produces only double values as an Output.
  • 12.
    Numerical and NominalData • In the above data set , column “Age and Salary” are Numerical data example where column “Gender “ is an example of Nominal data example. • Note: “S.No, Name” field will not affect the outcomes howsoever and hence could be ignored. S.No Name Age (in years) Salary(in Rupees) Gender 1 Nitin 24 11234.98 M 2 Abhishek 34 51234.00 M 3 Vikas 26 1222.99 M 4 Jyoti 25 7777.55 F
  • 13.
    Normalization • As thedata could vary in range a lot (like in previous data set we have Age which ranges between 20-35 but Salary ranging between 1000-50000. Moreover the column Gender is not at all numerical) , there is a need to bring all of them on one scale. • This process of scaling (resizing) the data is known as Normalization. • In the next two slides we will see how we can do Numerical and Nominal data field normalization.
  • 14.
    Numerical field dataNormalization • Suppose we have an actual value range between 40-50 . We need to normalize value 42.5 on a scale of -1 to 1. • The calculation will be:
  • 15.
    Nominal field dataNormalization • In the above Data set , column Color holds the value either of “Red, Blue or Green”. • Since there are three distinct values let us encode Red as 1.0,0.0,0.0 and rest two as other combination as shown above. • But this can lead to wrong prediction as there are two combinations where the encoding ends with 0.0 • So if an Ideal is Red that could also be interpreted as Blue as shown below- Ideal : Red { 1.0, 0.0, 0.0 } Predicted : Blue: { 0.0, 1.0, 0.0} Color Red Blue Green Color 1.0,0.0,0.0 0.0,1.0,0.0 0.0,0.0,1.0
  • 16.
    • The previousencoding for Nominal field data is known as One-of-N encoding where N is the number of distinct values. • In order to overcome the prediction problem of this encoding mechanism, Equilateral Encoding mechanism was introduced. • It equally distributes fault behavior in wrong prediction E.g. Color : {Red, Green, Blue} on {0,1}. • Cannot work on less than 3 classes. Color Encoded Color Red {0.06,0.25} Blue {0.93,0.28} Green {0.5,1}
  • 17.
    Types of Algorithms Inthe area of supervised learning which deals much with classification / regression. These are the algorithms types: • Neural networks , Support vector machines , Genetic programming , Bayesian Network , Decision trees , Case based reasoning , Information fuzzy networks In the area of unsupervised learning which deals much with clustering. These are the algorithms types: • K-means , Apriori , Mixture Model , Hierarchical clustering We will be limiting our scope of presentation to Artificial Neural Network
  • 18.
    What's required tocreate good machine learning systems? • Data preparation capabilities. • Deep analysis of the data. • Even the training data set should be accurate. • Algorithms – Basic or Advanced algorithms but their selection plays key role in software success. • Automation and iterative processes. • Scalability.
  • 19.
    Recap Till now wehave seen - • What is Machine Learning? • Why Machine Learning? • Where is Machine Learning Used? • Types of ML tasks • Types of Learning • Types of Data • Normalization • Types of Algorithms
  • 20.
    Artificial Neural Network(ANN) • In a very simple comparison with Human Neuron , ANN receives signals , collect (sum) them , activate them and ultimately produces output. • Human neuron dendrites are the INPUT signals , the cell body receives the weighted SUM of the signals , the axon is the part where it is ACTIVATED and Axon terminals is the exit gate for the OUTPUT.
  • 21.
    Artificial Neural Network(ANN) • When ever a human mind processes a decision making problem, it considers multiple inputs according to their importance (weightage of the input) and cognitively assess all the possibilities again and again (learn) so that a decision (output) is made with minimal error. • On similar fashion , ANN also works. • It takes multiple inputs according to their weightage , process them and adjust the weights of the inputs again and again (i.e. learn) so that an approximate output could be produced which has minimal error.
  • 22.
    Artificial Neural Network(ANN) • Above figure shows ANN model which shows two input signals (In1 and In2). The weightage of each one of them is Wt1 and Wt2 respectively. In1 In2 Wt1 Wt2 SUM ACTIVATION FUNCTION (A) Out ∑ =A(∑)
  • 23.
    Artificial Neural Network(ANN) • SUM is the weighted sum of all the input signals , i.e. ∑ = In1* Wt1 + In2*Wt2 • This weighted average ( ∑ ) is taken as an input by a Activation function A and an output is generated. Output= A(∑) • The Output produced by the ANN model is not exactly equal to the ideal output (say Z). There will always be some gap and that gap is known as Error (E). Hence, E= Z – A (∑)
  • 24.
    Artificial Neural Network(ANN) Activation Function A=f(x) • Selection of Activation function is important in generating an output. • There are many Activation function which could be used depending upon the required range of the output. • For example Sigmoid function, Hyperbolic function, Linear function are few of the Activation function. • In order to chose one amongst them is dependent upon the range of the output which your model should generate.
  • 25.
    Artificial Neural Network(ANN) Activation Function A=f(x) • For example, if you know that your target output is in the range of 0 to 1 then you can use Sigmoid function as an Activation function. The sigmoid method returns an output between 0 and 1. The graph and formula is shown below-
  • 26.
    Artificial Neural Network(ANN) Activation Function A=f(x) • For example, if you know that your target output is in the range of -1 to 1 then you can use Hyperbolic Tangent function as an Activation function. The hyperbolic method returns an output between -1 and 1. The graph and formula is shown below-
  • 27.
    Artificial Neural Network(ANN) Activation Function A=f(x) • For example, if you know that your target output could be any continuous number between infinities then you can use linear function as an Activation function. The graph and formula is shown below-
  • 28.
    Neuron Computation • Letus try to compute adjacent NN using Sigmoid activation function. • Let, In 1 = 0.1 , Wt 1=0.01 and In 2=0.5 and Wt 2=0.02 Therefore, ∑ = 0.1 *0.01 + 0.5*0.02 = 0.011 Using Sigmoid Activation function A on ∑ Therefore, A(∑) = A(0.011) = 1 1+𝑒(−𝑥) where x=0.011 A(∑) = 1 1+𝑒(−0.011) = 0.05027
  • 29.
    Neuron Computation • Supposethe ideal output for this combination of inputs (In1 and In2) is 0.06 • The predicted output from our ANN model is 0.05027 which is very close to the ideal value. And hence our prediction could be considered as useful. • Suppose if the predicted output was less than or equal to 0.04 then there would have been a significant error E= 0.06 – 0.04 = 0.02
  • 30.
    Training a Model •Even after using multiple Activation function also there are every chance that the prediction made by our model is inaccurate. Hence, there is a need to train the model again and again (i.e. learning process) so that this Error could be reduced for any given Activation function. • Training a model is a process of updating the weights across the network so that the error E could be minimized. • For this, we suppose a Threshold value of error (say H). We train our model till the time the error of our model is reduced to this threshold value H. • At this stage where the model error is reduced to threshold limit we say that our Model is trained and it is ready for prediction. • Deciding value for threshold limit (H) also plays major role in training the model. if, H is very small then our model could be Over Trained, if H is very high then our model could be Under Trained.
  • 31.
    Training Algorithm • Inorder to train a model we require some kind of training algorithm which can take care of adjusting the weights in such a way that the global error is reduced to threshold limit (H). • There are various Training algorithm as named below- Back propagation algorithm Resilient propagation algorithm Quick propagation algorithm LMA algorithm • Different training algorithms Different rules to update weights and they have their own learning rate Different global error calculation method (like Gradient descent etc.) Different flow chart
  • 32.
  • 33.
    Neural Network Structure Typeof neurons  Input neurons (I) : No processing, used to provide input signals  Output neurons (O) : Processing units, used to get output  Hidden neurons (H) : Additional processing units used to converge the solution  Bias neurons (B): Used to get non-zero result even if input is zero I1 I2 H1 H2 B1 O1 O2 0.1A 0.2 0.06A 0.05A 0.03A 0.04A 0.5035A 0.5027A 1.0A 0.05A 0.03A 0.02A 0.01 0.06 0.03 0.51506 A 0.516294 A
  • 34.
    Neural Network Computation •H1: Sum = 0.1 * 0.06+ 0.2 * 0.04 = 0.014 Output = A(Sum) = 0.5035 • H2 : Sum = 0.1 * 0.05+ 0.2 * 0.03 = 0.011 Output = A(Sum) = 0.50275 • O1 : Sum = 0.5035 * 0.05+ 0.50275 * 0.03 + 1*0.02 = 0.060258 Output = A(Sum) = 0.51506 • O2 : Sum = 0.5035 * 0.01+ 0.50275 * 0.06 + 1*0.03 = 0.0652 Output = A(Sum) = 0.516294 In order to calculate the value for H1 , H2 and O1 , O2 we have used Sigmoid Activation Function A(∑) = 1 1+𝑒(−𝑥)
  • 35.
    Model Validation • Rawdata is available which is divided into two sets of data. One set is known as Training data and another Validation data. • Training data and Validation data are Normalized. • Training normalized data is fed into a Model. • This Model is trained (by training algorithm) such that it produces minimal global error (by minimizing the local errors). This gives us Trained Model. • Normalized validation data is fed into the Trained model. • Normalized output is recovered. • This output is De-normalized to give us Validation Result.
  • 36.
    Is this a Disruptive Technology?
    • Yes, Machine Learning is an example of a disruptive technology.
    • It is replacing our conventional way of working on classification problems, prediction problems and linear results problems.
    • It is gaining footholds in the low-end market and even creating new markets.
    • It is turning non-customers into customers.
    • Examples of disruption: email services disrupted postal services, smartphones disrupted basic phones, 3G/4G displaced EDGE, and HD live streaming displaced buffered playback.
  • 37.
    ENCOG Framework
    • ENCOG is an open-source Machine Learning framework.
    • It supports a wide range of ML algorithms and architectures, including:
      - Neural Networks
      - Bayesian Networks
      - Clustering
      - Genetic Algorithms
      - Hidden Markov Models
      - Particle Swarm Optimization
      - Simulated Annealing
      - Support Vector Machines
    • It supports a wide range of training algorithms, including:
      - ADALINE Training
      - Backpropagation
      - Competitive Learning
      - Genetic Algorithm Training
      - Hopfield Learning
      - Instar & Outstar Training
      - Levenberg-Marquardt (LMA)
      - Manhattan Update Rule Propagation
      - Nelder-Mead Training
    We will use the ENCOG framework to build and train our network.
  • 38.
    PROBLEM STATEMENT
    Demonstrate AND gate output prediction with an ML program built on the ENCOG framework.
  • 39.
    Program Sample
    • I have created a simple C# console application in Visual Studio 2015.
    • I have referenced the Encog core library (downloaded from the ENCOG site).
    • The steps for creating the program are as follows:
      - Create the input data and ideal output
      - Create the network
      - Train the model using a looping construct
      - Perform the evaluation
  • 40.
    Input and Ideal data
    • First, add a reference to the Encog library in your project.
    • Define the input data and ideal data for the AND gate. Note: all data are arrays of type double.
    • Construct a data set from the input and ideal data.
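The shape of this data can be shown in plain Python. This is only an illustration of the input and ideal arrays, not the Encog data set classes used by the C# program; the names `and_input`, `and_ideal` and `training_set` are assumptions.

```python
# Each row of and_input is one input combination for the AND gate.
and_input = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each row of and_ideal is the ideal output for the matching input row.
and_ideal = [[0.0], [0.0], [0.0], [1.0]]

# The data set pairs every input row with its ideal output.
training_set = list(zip(and_input, and_ideal))

for inputs, ideal in training_set:
    print(inputs, "->", ideal)
```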
  • 41.
    Create Network
    • We are creating a basic neural network (though there are many other types as well).
    • This network has 3 layers of neurons.
    • The 1st layer has 2 input neurons and 1 bias neuron. The property value “true” indicates that the layer has a bias neuron.
    • The 2nd layer has 2 hidden neurons and 1 bias neuron.
    • The 3rd layer has 1 output neuron.
    • The 2nd and 3rd layers also have an activation function. We are using the sigmoid activation function, though there are many others as well.
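The layer layout above fixes how many weights the training algorithm has to adjust. The sketch below describes the three layers as plain Python dictionaries and counts the connections. The dictionary keys and the `count_weights` helper are hypothetical and are not part of Encog.

```python
# Layer description taken from the slide: neuron count, bias flag, activation.
layers = [
    {"name": "input", "neurons": 2, "bias": True, "activation": None},
    {"name": "hidden", "neurons": 2, "bias": True, "activation": "sigmoid"},
    {"name": "output", "neurons": 1, "bias": False, "activation": "sigmoid"},
]


def count_weights(layers):
    """Count connections between consecutive, fully connected layers."""
    total = 0
    for source, target in zip(layers, layers[1:]):
        # A bias neuron adds one extra source for every target neuron.
        sources = source["neurons"] + (1 if source["bias"] else 0)
        total += sources * target["neurons"]
    return total


print("weights to train:", count_weights(layers))  # prints 9
```

The count is (2 + 1) * 2 = 6 weights into the hidden layer plus (2 + 1) * 1 = 3 weights into the output layer.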
  • 42.
    Train the network
    • We are using the resilient propagation training algorithm.
    • We use a do…while loop to repeat the training process on the network for the training dataset.
    • We have set a threshold value of 0.0168.
    • The network is trained until the error is reduced to this threshold limit.
    • As discussed earlier, choosing this value is a matter of trial and error.
    • A very high value could lead to an under-trained network. In such a case the number of training iterations will be very low.
    • A very low value could lead to an over-trained network. In such a case the number of training iterations will be very high.
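The train-until-threshold loop can be sketched in plain Python. This is not the Encog API. It substitutes plain backpropagation with per-sample updates for resilient propagation, and it assumes the error is the mean squared error. The learning rate, random seed and iteration cap are also assumptions.

```python
import math
import random

DATA = [([0.0, 0.0], 0.0), ([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
THRESHOLD = 0.0168        # threshold value from the slide
LEARNING_RATE = 0.7       # assumption
MAX_ITERATIONS = 100_000  # safety cap (assumption)

rng = random.Random(42)
# hidden[j] = [weight from I1, weight from I2, weight from bias]
hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
# output = [weight from H1, weight from H2, weight from bias]
output = [rng.uniform(-0.5, 0.5) for _ in range(3)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs):
    """Return the hidden outputs and the network output for one input pair."""
    h = [sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + w[2]) for w in hidden]
    o = sigmoid(output[0] * h[0] + output[1] * h[1] + output[2])
    return h, o


def train_one_iteration():
    """One pass over DATA with backpropagation; returns the mean squared error."""
    total = 0.0
    for inputs, ideal in DATA:
        h, o = forward(inputs)
        total += (o - ideal) ** 2
        delta_o = (o - ideal) * o * (1.0 - o)
        # Hidden deltas use the output weights from before this update.
        delta_h = [delta_o * output[j] * h[j] * (1.0 - h[j]) for j in range(2)]
        for j in range(2):
            output[j] -= LEARNING_RATE * delta_o * h[j]
            hidden[j][0] -= LEARNING_RATE * delta_h[j] * inputs[0]
            hidden[j][1] -= LEARNING_RATE * delta_h[j] * inputs[1]
            hidden[j][2] -= LEARNING_RATE * delta_h[j]
        output[2] -= LEARNING_RATE * delta_o
    return total / len(DATA)


iteration = 0
while True:  # Python has no do...while, so the body runs before the check
    error = train_one_iteration()
    iteration += 1
    if error <= THRESHOLD or iteration >= MAX_ITERATIONS:
        break

print("iterations:", iteration, "error:", error)
```

Changing the seed changes the starting weights and therefore the number of iterations, and plain backpropagation will generally need far more iterations than resilient propagation does.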
  • 43.
    Evaluation
    • We iterate over the training dataset, feed each input through the network to get the predicted output, and print the results.
  • 44.
    Output
    The two outputs above use the same activation function, the same inputs, the same propagation algorithm and the same network. The number of iterations will not always be the same, or even nearly equal, for identical settings; this is how neural networks work, typically because the initial weights are set randomly. The first execution took 59 iterations to produce its output, while the second took only 16.
  • 45.
    Analysis of Outputs
    • Analysis of first output screen
        Input Combination   Ideal Output   Predicted Output
        0,0                 0              1E-20 (i.e. 1 * 10^-20)
        1,0                 0              0.104
        0,1                 0              0.108
        1,1                 1              0.844
    • Analysis of second output screen
        Input Combination   Ideal Output   Predicted Output
        0,0                 0              0.005
        1,0                 0              0.080
        0,1                 0              0.077
        1,1                 1              0.856
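The two tables can be checked against the training threshold with a short calculation. The sketch below assumes the network error is the mean squared error and uses a cutoff of 0.5 to turn each prediction into a 0 or 1; both choices are assumptions, and the function names are hypothetical.

```python
IDEAL = [0.0, 0.0, 0.0, 1.0]
FIRST_RUN = [1e-20, 0.104, 0.108, 0.844]   # predicted outputs, first screen
SECOND_RUN = [0.005, 0.080, 0.077, 0.856]  # predicted outputs, second screen
THRESHOLD = 0.0168                         # training threshold from the slides


def mean_squared_error(predicted, ideal):
    """Average of the squared differences between prediction and ideal."""
    return sum((p - t) ** 2 for p, t in zip(predicted, ideal)) / len(ideal)


def to_binary(predicted, cutoff=0.5):
    """Round each prediction to 0.0 or 1.0 using the cutoff."""
    return [1.0 if p >= cutoff else 0.0 for p in predicted]


for name, run in (("first", FIRST_RUN), ("second", SECOND_RUN)):
    mse = mean_squared_error(run, IDEAL)
    print(name, "run: MSE =", mse, "| below threshold:", mse < THRESHOLD,
          "| matches AND gate:", to_binary(run) == IDEAL)
```

Under these assumptions both runs sit below the 0.0168 threshold, and every prediction falls on the correct side of 0.5.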
  • 46.
    Comparison of Propagations
    • Resilient Propagation
    • Manhattan Propagation
  • 47.
    Outcome
    • We changed the propagation algorithm and saw that the Manhattan propagation algorithm takes nearly 64 iterations to reduce the network error to 0.0168.
    • There are various other training algorithms as well.
    • Feel free to try different combinations of activation functions, propagation algorithms, threshold values and network layer neurons, and analyze the outcomes.
    • I will discuss various other activation functions and propagation algorithms in my next presentation.
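The idea behind the Manhattan update rule is that only the sign of each error gradient is used and every weight moves by the same fixed step. The sketch below illustrates that idea in plain Python. It is not Encog's implementation, and the gradients and step size are hypothetical values.

```python
def manhattan_update(weights, gradients, step):
    """Move each weight by a fixed step against the sign of its error gradient."""
    updated = []
    for weight, gradient in zip(weights, gradients):
        if gradient > 0:
            updated.append(weight - step)   # error grows with the weight: decrease it
        elif gradient < 0:
            updated.append(weight + step)   # error shrinks with the weight: increase it
        else:
            updated.append(weight)          # zero gradient: leave the weight alone
    return updated


weights = [0.06, 0.04, 0.02]
gradients = [0.9, -0.0001, 0.0]   # hypothetical error gradients
updated = manhattan_update(weights, gradients, step=0.01)

print(updated)
```

A large gradient (0.9) and a tiny one (-0.0001) both move their weight by exactly 0.01, which is why the step size has to be chosen carefully for this rule.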
  • 48.
    THANK YOU FOR LEARNING

Editor's Notes

  • #46 As the two tables show, each predicted output is nearly equal to the ideal output, so we can say that our predictor program gives proper predictions. It is ready to predict the outcome for further unseen data.