26/3/2008 1
Genetic Algorithm
Genetic Algorithms (GAs) apply an evolutionary
approach to inductive learning. GAs have been
successfully applied to problems that are
difficult to solve using conventional techniques,
such as scheduling problems, the traveling
salesperson problem, network routing problems,
and financial marketing.
Supervised genetic learning
Genetic learning algorithm
• Step 1: Initialize a population P of n elements
as potential solutions.
• Step 2: Until a specified termination condition
is satisfied:
– 2a: Use a fitness function to evaluate each
element of the current population. If an element
passes the fitness criteria, it remains in P.
– 2b: The population now contains m elements (m
<= n). Use genetic operators to create (n – m)
new elements. Add the new elements to the
population.
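The two steps above can be sketched in Python as follows. This is a minimal illustration, not the textbook's implementation: the names genetic_learning, passes_fitness, make_new and set_one_bit are hypothetical, elements are plain lists, and the termination condition combines the two options mentioned in the text (every element passes, or a fixed iteration limit is reached).

```python
import random
from typing import Callable, List

Element = list  # hypothetical representation: a list of bits or attribute values


def genetic_learning(
    population: List[Element],
    passes_fitness: Callable[[Element], bool],
    make_new: Callable[[List[Element], random.Random], Element],
    max_iterations: int = 100,
    seed: int = 0,
) -> List[Element]:
    """Sketch of supervised genetic learning (Steps 1, 2a and 2b)."""
    rng = random.Random(seed)
    n = len(population)  # Step 1: the caller supplies the n initial elements
    for _ in range(max_iterations):  # stop after a fixed number of iterations...
        # Step 2a: elements that pass the fitness criteria remain in P.
        survivors = [e for e in population if passes_fitness(e)]
        eliminated = [e for e in population if not passes_fitness(e)]
        if not eliminated:  # ...or as soon as every element meets the criteria
            break
        # Step 2b: P now holds m <= n elements; create n - m new ones
        # from the eliminated elements (e.g. by crossover and mutation).
        m = len(survivors)
        population = survivors + [make_new(eliminated, rng) for _ in range(n - m)]
    return population


# Toy usage: 4-bit elements pass when at least three bits are set; the
# hypothetical operator copies a random eliminated element and sets one bit.
def set_one_bit(eliminated: List[Element], rng: random.Random) -> Element:
    child = list(rng.choice(eliminated))
    child[rng.randrange(len(child))] = 1
    return child


final = genetic_learning([[0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]],
                         passes_fitness=lambda e: sum(e) >= 3,
                         make_new=set_one_bit)
print(final)
```

Passing a different make_new (for example one built from crossover and mutation) changes how Step 2b refills the population without touching the loop itself.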
Digitized genetic knowledge representation
• A common technique for representing
genetic knowledge is to transform
elements into binary strings.
• For example, we can represent income
range as a two-bit string, assigning
“00” to 20-30k, “01” to 30-40k, and “11” to
50-60k.
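A small sketch of this encoding, assuming Python dictionaries as lookup tables. The income codes are the ones given above (the unused code "10" is left unassigned); the one-bit yes/no code and the function names encode_element and decode_income are hypothetical additions for illustration.

```python
# Two-bit codes for income range, as given in the text; "10" is left unassigned.
INCOME_CODES = {"20-30k": "00", "30-40k": "01", "50-60k": "11"}
DECODE = {bits: label for label, bits in INCOME_CODES.items()}

# Hypothetical one-bit code for a yes/no attribute (not specified in the text).
YES_NO_CODES = {"Yes": "1", "No": "0"}


def encode_element(income_range: str, credit_card_insurance: str) -> str:
    """Concatenate per-attribute codes into one binary string."""
    return INCOME_CODES[income_range] + YES_NO_CODES[credit_card_insurance]


def decode_income(bits: str) -> str:
    """Map a two-bit string back to its income range."""
    return DECODE[bits]


print(encode_element("30-40k", "Yes"))  # prints 011
print(decode_income("11"))              # prints 50-60k
```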
Genetic operator - Crossover
• The elements most often used for
crossover are those destined to be
eliminated from the population.
• Crossover forms new elements for the
population by combining parts of two
elements currently in the population.
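One common way to combine parts of two elements is single-point crossover, sketched below under the assumption that elements are equal-length sequences and that the cut point is either given or chosen at random; the function name crossover is hypothetical. Cutting elements 1 and 2 of the initial population after their second attribute yields the first two elements of the second-generation population shown later.

```python
import random
from typing import Optional, Sequence, Tuple


def crossover(a: Sequence, b: Sequence, point: Optional[int] = None,
              rng: Optional[random.Random] = None) -> Tuple[list, list]:
    """Single-point crossover: exchange everything after `point` between a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    if point is None:
        rng = rng or random.Random()
        point = rng.randint(1, len(a) - 1)  # cut strictly inside the element
    return list(a[:point]) + list(b[point:]), list(b[:point]) + list(a[point:])


# Elements 1 and 2 of the initial population, cut after the second attribute.
e1 = ["20-30k", "No", "Yes", "Male", "30-39"]
e2 = ["30-40k", "Yes", "No", "Female", "50-59"]
child1, child2 = crossover(e1, e2, point=2)
print(child1)  # ['20-30k', 'No', 'No', 'Female', '50-59']
print(child2)  # ['30-40k', 'Yes', 'Yes', 'Male', '30-39']
```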
Genetic operator - Mutation
• Mutation is sparingly applied to elements
chosen for elimination.
• Mutation can be applied by randomly
flipping bits (or attribute values) within a
single element.
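Bit-flip mutation can be sketched as below, assuming elements are lists of 0/1 bits. The per-bit rate is a hypothetical parameter; keeping it small reflects the advice that mutation be applied sparingly.

```python
import random
from typing import List


def mutate(bits: List[int], rate: float, rng: random.Random) -> List[int]:
    """Bit-flip mutation: flip each bit independently with probability `rate`.

    Returns a new list; the input element is left unchanged.
    """
    return [1 - b if rng.random() < rate else b for b in bits]


rng = random.Random(7)
element = [0, 1, 1, 0, 1, 0, 0, 1]
print(mutate(element, rate=0.05, rng=rng))  # a low rate keeps mutation sparing
```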
Genetic operator - Selection
• Selection replaces to-be-deleted
elements with copies of elements that pass
the fitness test with high scores.
• With selection, the overall fitness of the
population is guaranteed not to decrease.
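A sketch of selection, assuming elements are scored by a numeric fitness function and that the n_replace lowest scorers are the to-be-deleted elements (the names select, n_replace and mean_fitness are hypothetical). Because each removed element is replaced by a copy of an element scoring at least as high, the average fitness cannot go down.

```python
from typing import Callable, List


def select(population: List[list], fitness: Callable[[list], float],
           n_replace: int) -> List[list]:
    """Replace the n_replace lowest-scoring elements with copies of the best ones."""
    if not 0 <= n_replace < len(population):
        raise ValueError("n_replace must be smaller than the population size")
    ranked = sorted(population, key=fitness, reverse=True)  # best first
    keep = ranked[: len(ranked) - n_replace]
    # Copy the top scorers (cycling through them if n_replace > len(keep)).
    copies = [list(keep[i % len(keep)]) for i in range(n_replace)]
    return keep + copies


def mean_fitness(pop: List[list]) -> float:
    return sum(sum(e) for e in pop) / len(pop)


before = [[0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
after = select(before, fitness=sum, n_replace=2)
print(mean_fitness(before), mean_fitness(after))
```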
Step 1 of Supervised genetic learning
This step initializes a population P of
elements; P refers to the set of population
elements. The process modifies the
elements of the population until a
termination condition is satisfied, which
might be that all elements of the population
meet some minimum criterion. An
alternative is a fixed number of iterations
of the learning process.
Step 2 of supervised genetic learning
Step 2a applies a fitness function to
evaluate each element currently in the
population. With each iteration, elements
not satisfying the fitness criteria are
eliminated from the population. The final
result of a supervised genetic learning
session is a set of population elements
that best represents the training data.
Step 2 of supervised genetic learning
Step 2b adds new elements to the
population to replace any elements
eliminated in step 2a. New elements are
formed from previously deleted elements
by applying crossover and mutation.
An initial population for the supervised genetic learning example
Population  Income  Life Insurance  Credit Card  Sex     Age
Element     Range   Promotion       Insurance
1           20-30k  No              Yes          Male    30-39
2           30-40k  Yes             No           Female  50-59
3           ?       No              No           Male    40-49
4           30-40k  Yes             Yes          Male    40-49
Question mark in population
A question mark in the population means
that it is a “don’t care” condition, which
implies that the attribute is not important to
the learning process.
Training Data for Genetic Learning
Training    Income  Life Insurance  Credit Card  Sex     Age
Instance    Range   Promotion       Insurance
1           30-40k  Yes             Yes          Male    30-39
2           30-40k  Yes             No           Female  40-49
3           50-60k  Yes             No           Female  30-39
4           20-30k  No              No           Female  50-59
5           20-30k  No              No           Male    20-29
6           30-40k  No              No           Male    40-49
Goal and condition
• Our goal is to create a model able to
differentiate individuals who have
accepted the life insurance promotion
from those who have not.
• We require that after each iteration of the
algorithm, exactly two elements from each
class (life insurance promotion = yes and life
insurance promotion = no) remain in the
population.
Fitness Function
1. Let N be the number of matches of the input
attribute values of a population element E with
training instances from its own class.
2. Let M be the number of input attribute value
matches to all training instances from the
competing classes.
3. Add 1 to M.
4. Divide N by M. The fitness score is therefore
F(E) = N / (M + 1), where M is the count from step 2.
Note: the higher the fitness score, the smaller the
error rate for the solution is likely to be.
Fitness function for element 1, own class
(life insurance promotion = no)
1. Income Range = 20-30k matches with
training instances 4 and 5.
2. No matches for Credit Card
Insurance=yes
3. Sex=Male matches with training
instances 5 and 6.
4. No matches for Age=30-39.
5. ∴ N = 4
Fitness function for element 1, competing class
(life insurance promotion = yes)
1. No matches for Income Range=20-30k
2. Credit Card Insurance=yes matches with
training instance 1.
3. Sex=Male matches with training instance 1.
4. Age=30-39 matches with training instances 1
and 3.
5. ∴M = 4
6. ∴F(1) = 4 / 5 = 0.8
7. Similarly F(2)=0.86, F(3)=1.2, F(4)=1.0
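The fitness computation can be checked with a short sketch. It assumes each element is a tuple in the column order of the tables (income range, life insurance promotion, credit card insurance, sex, age) and that a "?" value never counts as a match; that reading is consistent with F(3) = 1.2. The names matches, fitness, TRAINING and POPULATION are illustrative.

```python
# Training data from the table: (income range, life insurance promotion,
# credit card insurance, sex, age)
TRAINING = [
    ("30-40k", "Yes", "Yes", "Male", "30-39"),
    ("30-40k", "Yes", "No", "Female", "40-49"),
    ("50-60k", "Yes", "No", "Female", "30-39"),
    ("20-30k", "No", "No", "Female", "50-59"),
    ("20-30k", "No", "No", "Male", "20-29"),
    ("30-40k", "No", "No", "Male", "40-49"),
]
OUTPUT = 1              # position of the class attribute (life insurance promotion)
INPUTS = (0, 2, 3, 4)   # positions of the input attributes


def matches(element, instances):
    """Count input-attribute matches; a '?' (don't care) never counts as a match."""
    return sum(
        1
        for inst in instances
        for i in INPUTS
        if element[i] != "?" and element[i] == inst[i]
    )


def fitness(element, training=TRAINING):
    """F(E) = N / (M + 1)."""
    own = [t for t in training if t[OUTPUT] == element[OUTPUT]]
    competing = [t for t in training if t[OUTPUT] != element[OUTPUT]]
    n = matches(element, own)        # N: matches with the element's own class
    m = matches(element, competing)  # M: matches with the competing class
    return n / (m + 1)


POPULATION = [
    ("20-30k", "No", "Yes", "Male", "30-39"),
    ("30-40k", "Yes", "No", "Female", "50-59"),
    ("?", "No", "No", "Male", "40-49"),
    ("30-40k", "Yes", "Yes", "Male", "40-49"),
]
for k, e in enumerate(POPULATION, start=1):
    print(f"F({k}) = {fitness(e):.2f}")
```

Running it prints F(1) = 0.80, F(2) = 0.86, F(3) = 1.20 and F(4) = 1.00.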
Crossover operation for elements 1 & 2
A Second-Generation Population
Population  Income  Life Insurance  Credit Card  Sex     Age
Element     Range   Promotion       Insurance
1           20-30k  No              No           Female  50-59
2           30-40k  Yes             Yes          Male    30-39
3           ?       No              No           Male    40-49
4           30-40k  Yes             Yes          Male    40-49
Application of the model (test phase)
• To use the model, we can compare a new
unknown instance (test data) with the elements
of the final population. A simple technique is to
give the unknown instance the same
classification as the population element to which
it is most similar.
• Alternatively, the model can keep track of the m
population elements most similar to the unknown
instance. The algorithm then randomly chooses one of
these m elements and gives the unknown instance the
classification of the randomly selected element.
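Both classification strategies can be sketched as follows. Similarity is assumed here to be the number of matching input attribute values (with "?" ignored), which the text does not specify, and the second-generation population stands in for the final population; the function names similarity, classify and classify_random are hypothetical.

```python
import random

INPUTS = (0, 2, 3, 4)   # income range, credit card insurance, sex, age
OUTPUT = 1              # life insurance promotion

# Second-generation population, treated as the final population for illustration.
FINAL = [
    ("20-30k", "No", "No", "Female", "50-59"),
    ("30-40k", "Yes", "Yes", "Male", "30-39"),
    ("?", "No", "No", "Male", "40-49"),
    ("30-40k", "Yes", "Yes", "Male", "40-49"),
]


def similarity(element, instance):
    """Number of matching input attributes; '?' is a don't-care and never counts."""
    return sum(1 for i in INPUTS if element[i] != "?" and element[i] == instance[i])


def classify(instance, population=FINAL):
    """Assign the class of the most similar element (ties go to the earlier element)."""
    best = max(population, key=lambda e: similarity(e, instance))
    return best[OUTPUT]


def classify_random(instance, m, rng, population=FINAL):
    """Pick one of the m most similar elements at random and use its class."""
    ranked = sorted(population, key=lambda e: similarity(e, instance), reverse=True)
    return rng.choice(ranked[:m])[OUTPUT]


unknown = ("30-40k", None, "Yes", "Male", "30-39")  # class slot left empty
print(classify(unknown))                              # prints Yes
print(classify_random(unknown, m=2, rng=random.Random(1)))
```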
Genetic Algorithms & unsupervised Clustering
Suppose there are P data instances within
the space where each data instance
consists of n attribute values. Suppose m
clusters are desired. The model will
generate k possible solutions. A specific
solution contains m n-dimensional points,
where each point is the best current
representative element for one of the m
clusters.
For example, S2 represents one of the k possible solutions
and contains two elements E21 and E22.
Crossover and mutation operations
A crossover operation is accomplished by
moving elements (n-dimensional points)
from solution Si to solution Sj. There are
several possibilities for implementing
mutation operations. One way to mutate
solution Si is to swap one or more point
coordinates of the elements within Si.
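A sketch of both operators for two-dimensional points, assuming a solution is a list of (x, y) tuples. Here crossover exchanges one element in each direction, and mutation swaps the y-coordinate of the first element with the x-coordinate of the second; these are the specific operations used in the worked example later in this section, and the function names exchange_elements and swap_coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Solution = List[Point]


def exchange_elements(si: Solution, sj: Solution, a: int, b: int) -> Tuple[Solution, Solution]:
    """Crossover: element a of si and element b of sj exchange places."""
    new_i, new_j = list(si), list(sj)
    new_i[a], new_j[b] = sj[b], si[a]
    return new_i, new_j


def swap_coordinates(s: Solution) -> Solution:
    """Mutation: swap the y-coordinate of the first element with the x-coordinate of the second."""
    (x1, y1), (x2, y2) = s[0], s[1]
    return [(x1, x2), (y1, y2)] + list(s[2:])


S1 = [(1.0, 1.0), (5.0, 5.0)]
S3 = [(4.0, 3.0), (5.0, 1.0)]
S1, S3 = exchange_elements(S1, S3, 0, 1)
print(S1, S3)   # [(5.0, 1.0), (5.0, 5.0)] [(4.0, 3.0), (1.0, 1.0)]
S1 = swap_coordinates(S1)
print(S1)       # [(5.0, 5.0), (1.0, 5.0)]
```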
Fitness function
An applicable fitness function for solution Sj is the
average Euclidean distance of the P instances in
the n-dimensional space from their closest
element within Sj. We take each instance I in P
and compute the Enclidean distance from I to
each of the m elements in Sj. Lower values
represent better fitness scores. Once genetic
learning terminates, the best of the k possible
solutions is selected as the final solution. Each
instance in the n-dimensional space is assigned
to the cluster associated with its closest element
in the final solution.
Training data set for unsupervised GA
Instance X Y
1 1.0 1.5
2 1.0 4.5
3 2.0 1.5
4 2.0 3.5
5 3.0 2.5
6 5.0 6.0
Fitness function for unsupervised GA
We apply the fitness function to the training data.
We instruct the algorithm to start with a solution
set consisting of three plausible solutions (k=3).
With m=2, P=6, and k=3, the algorithm
generates the initial set of solutions. An
element in the solution space contains a single
representative data point for each cluster. For
example, the data points for solution S1 are
(1.0, 1.0) and (5.0, 5.0).
Euclidean distance
$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$$
Fitness score of S1, whose elements are (1.0, 1.0) and (5.0, 5.0):
$$
\begin{aligned}
F(S_1) ={} & \min\left(\sqrt{|1.0-1.0|^2 + |1.0-1.5|^2},\ \sqrt{|5.0-1.0|^2 + |5.0-1.5|^2}\right) \\
&+ \min\left(\sqrt{|1.0-1.0|^2 + |1.0-4.5|^2},\ \sqrt{|5.0-1.0|^2 + |5.0-4.5|^2}\right) \\
&+ \min\left(\sqrt{|1.0-2.0|^2 + |1.0-1.5|^2},\ \sqrt{|5.0-2.0|^2 + |5.0-1.5|^2}\right) \\
&+ \min\left(\sqrt{|1.0-2.0|^2 + |1.0-3.5|^2},\ \sqrt{|5.0-2.0|^2 + |5.0-3.5|^2}\right) \\
&+ \min\left(\sqrt{|1.0-3.0|^2 + |1.0-2.5|^2},\ \sqrt{|5.0-3.0|^2 + |5.0-2.5|^2}\right) \\
&+ \min\left(\sqrt{|1.0-5.0|^2 + |1.0-6.0|^2},\ \sqrt{|5.0-5.0|^2 + |5.0-6.0|^2}\right) \\
={} & 0.5 + 3.5 + 1.12 + 2.69 + 2.5 + 1.0 \\
={} & 11.31
\end{aligned}
$$
Solution Population for
unsupervised Clustering
Generation           S1           S2           S3
Initial population   (1.0,1.0)    (3.0,2.0)    (4.0,3.0)
                     (5.0,5.0)    (3.0,5.0)    (5.0,1.0)
Fitness score        11.31        9.78         15.55
-------------------------------------------------------
Second generation    (5.0,1.0)    (3.0,2.0)    (4.0,3.0)
                     (5.0,5.0)    (3.0,5.0)    (1.0,1.0)
Fitness score        17.96        9.78         11.31
-------------------------------------------------------
Third generation     (5.0,5.0)    (3.0,2.0)    (4.0,3.0)
                     (1.0,5.0)    (3.0,5.0)    (1.0,1.0)
Fitness score        13.64        9.78         11.31
-------------------------------------------------------
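The fitness scores in this table can be recomputed with the sketch below. It sums, rather than averages, each instance's distance to its closest solution element, matching the worked numbers, and it uses math.dist (available from Python 3.8); the names DATA, fitness and generations are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Training instances 1-6 (X, Y) from the unsupervised GA training data table.
DATA: List[Point] = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]


def fitness(solution: List[Point], data: List[Point] = DATA) -> float:
    """Sum over instances of the distance to the closest solution element (lower is better)."""
    return sum(min(math.dist(p, c) for c in solution) for p in data)


generations = {
    "initial": {"S1": [(1.0, 1.0), (5.0, 5.0)], "S2": [(3.0, 2.0), (3.0, 5.0)], "S3": [(4.0, 3.0), (5.0, 1.0)]},
    "second":  {"S1": [(5.0, 1.0), (5.0, 5.0)], "S2": [(3.0, 2.0), (3.0, 5.0)], "S3": [(4.0, 3.0), (1.0, 1.0)]},
    "third":   {"S1": [(5.0, 5.0), (1.0, 5.0)], "S2": [(3.0, 2.0), (3.0, 5.0)], "S3": [(4.0, 3.0), (1.0, 1.0)]},
}
for gen, solutions in generations.items():
    scores = {name: round(fitness(s), 2) for name, s in solutions.items()}
    best = min(scores, key=scores.get)
    print(gen, scores, "best:", best)
```

S2 has the lowest (best) score in every generation, which is why it becomes the final solution if the third generation is terminal.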
First Generation Solution
To compute the fitness score of 11.31 for solution
S1 the Euclidean distance between each
instance and its closest data point in S1 is
summed. To illustrate this, consider instance 1
in the training data. The Euclidean distance between
(1.0,1.0) and (1.0,1.5) is computed as 0.50. The
distance between (5.0,5.0) and (1.0,1.5) is 5.32.
The smaller value of 0.50 is represented in the
overall fitness score for solution S1. S2 is the
best first-generation solution.
Second Generation Solution
The second generation is obtained by
performing a crossover between solutions
S1 and S3 with solution element (1.0,1.0)
in S1 exchanging places with solution
element (5.0,1.0) in S3. The result of the
crossover operation improves (decreases)
the fitness score for S3 while the score for
S1 increases.
(Final) Third Generation Solution
The third generation is acquired by mutating
S1. The mutation interchanges the y-
coordinate of the first element in S1 with
the x-coordinate of the second element.
The mutation results in an improved
fitness score for S1. Mutation and
crossover continue until a termination
condition is satisfied. If the third
generation is terminal, then the final
solution is S2.
Solution for Clustering
If S2, with elements (3.0, 2.0) and (3.0, 5.0), is the final
solution, computing the distance from each training instance
to these two elements gives the clusters below.
Instances 1, 3, and 5 form one cluster and
instances 2 and 6 form the second cluster;
instance 4 is equidistant (about 1.80) from both
elements, so it can be placed in either cluster
(it is listed with cluster 2 here).
Cluster 1 center (3.0, 2.0)
Instance X Y
1 1.0 1.5
3 2.0 1.5
5 3.0 2.5
Cluster 2 center (3.0, 5.0)
Instance X Y
2 1.0 4.5
4 2.0 3.5
6 5.0 6.0
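A sketch of the final assignment step, assuming ties are reported rather than broken, so that instance 4's membership in either cluster stays visible. The function name assign is hypothetical; clusters are numbered 1 and 2 as above.

```python
import math

# Training instances (X, Y) keyed by instance number.
DATA = {1: (1.0, 1.5), 2: (1.0, 4.5), 3: (2.0, 1.5),
        4: (2.0, 3.5), 5: (3.0, 2.5), 6: (5.0, 6.0)}
S2 = [(3.0, 2.0), (3.0, 5.0)]  # final solution: one element per cluster


def assign(data, solution, tol=1e-9):
    """Return, per instance, the cluster number(s) whose element is closest (ties keep all)."""
    result = {}
    for k, p in data.items():
        d = [math.dist(p, c) for c in solution]
        result[k] = [j + 1 for j, dj in enumerate(d) if dj - min(d) <= tol]
    return result


for k, clusters in assign(DATA, S2).items():
    print(k, clusters)
```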
General considerations for GA
• GAs are designed to find globally optimized
solutions; however, there is no guarantee that a
given solution is not the result of a local rather
than a global optimization.
• The fitness function determines the
computational complexity of a genetic
algorithm.
• GAs explain their results to the extent that
the fitness function is understandable.
• Transforming the data to a form suitable
for a genetic algorithm can be a challenge.
Choosing a data mining technique
Given a set of data containing attributes and
values to be mined together with information
about the nature of the data and the problem to
be solved, determine an appropriate data
mining technique.
Considerations for choosing data
mining techniques
• Is learning supervised or unsupervised?
• Do we require a clear explanation about the
relationships present in the data?
• Is there one set of input attributes and one set of
output attributes or can attributes interact with one
another in several ways?
• Is the input data categorical, numeric, or a
combination of both?
• If learning is supervised, is there one output attribute
or are there several output attributes? Are the output
attribute(s) categorical or numeric?
Behavior of different data mining techniques
1. Neural networks are black-box structures and are a poor
choice if an explanation of what has been learned is
required.
2. Association rules are a good choice when attributes are
allowed to play multiple roles in the data mining
process.
3. Decision trees can determine the attributes most predictive
of class membership.
4. Neural networks and clustering assume attributes to be
of equal importance.
5. Neural networks tend to outperform other models when
a wealth of noisy data is present.
6. Algorithms for building decision trees typically execute
faster than neural network or genetic learning.
7. Genetic algorithms are typically used for problems that
cannot be solved with traditional techniques.
Review question 10
Given the following training data set
Training  Income  Credit Card  Sex     Age
Instance  Range   Insurance
1         30-40k  Yes          Male    30-39
2         30-40k  No           Female  40-49
3         50-60k  No           Female  30-39
4         20-30k  No           Female  50-59
5         20-30k  No           Male    20-29
6         30-40k  No           Male    40-49
Describe the steps needed to apply unsupervised
genetic learning to cluster the instances of the credit
card promotion database.
Tutorial Question 10
Given the following training data set
Training  Income  Credit Card  Sex     Age
Instance  Range   Insurance
1         30-40k  Yes          Male    30-39
2         30-40k  No           Female  40-49
3         50-60k  No           Female  30-39
4         20-30k  No           Female  50-59
5         20-30k  No           Male    20-29
6         30-40k  No           Male    40-49
After transforming the input data into numeric values, such as yes=1, no=2, male=1, female=2,
20-29=1, 30-39=2, 40-49=3, 50-59=4, 20-30k=1, 30-40k=2, 40-50k=3, 50-60k=4, the
training data set becomes (each tuple lists income range, credit card insurance, sex, age):
T(1)=(2,1,1,2)
T(2)=(2,2,2,3)
T(3)=(4,2,2,2)
T(4)=(1,2,2,4)
T(5)=(1,2,1,1)
T(6)=(2,2,1,3)
Assume an initial population of two solutions, each containing two cluster centers:
Solution 1 (K1): (1,1,1,1) and (4,2,2,4)
Solution 2 (K2): (4,4,4,4) and (2,2,1,1)
Choose the better solution based on the fitness function score of each, using unsupervised
genetic learning.
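One way to check an answer to this question is to apply the same sum-of-closest-distances fitness used in the two-dimensional example to the encoded four-dimensional data. The sketch only scores the two given solutions (no crossover or mutation is applied), and it assumes, as the question does, that the ordinal codes can be treated as Euclidean coordinates; the names T, K1, K2 and fitness are illustrative.

```python
import math

# Encoded training instances: (income range, credit card insurance, sex, age)
T = [(2, 1, 1, 2), (2, 2, 2, 3), (4, 2, 2, 2), (1, 2, 2, 4), (1, 2, 1, 1), (2, 2, 1, 3)]
K1 = [(1, 1, 1, 1), (4, 2, 2, 4)]
K2 = [(4, 4, 4, 4), (2, 2, 1, 1)]


def fitness(solution, data=T):
    """Sum of each instance's distance to its closest cluster center (lower is better)."""
    return sum(min(math.dist(t, c) for c in solution) for t in data)


scores = {"K1": fitness(K1), "K2": fitness(K2)}
best = min(scores, key=scores.get)
print({k: round(v, 2) for k, v in scores.items()}, "best:", best)
```

Lower is better, so with these codes K1 (about 12.10) scores better than K2 (about 12.42).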
Reading assignment
“Data Mining: A Tutorial-based Primer” by
Richard J Roiger and Michael W. Geatz,
published by Pearson Education in 2003,
pp.89-101.

 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 

Recently uploaded (20)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 

5233777

  • 1. Genetic Algorithm
    Genetic algorithms (GAs) apply an evolutionary approach to inductive learning. GAs have been successfully applied to problems that are difficult to solve with conventional techniques, such as scheduling, the traveling salesperson problem, network routing, and financial marketing.
  • 3. Genetic learning algorithm
    • Step 1: Initialize a population P of n elements as potential solutions.
    • Step 2: Until a specified termination condition is satisfied:
      – 2a: Use a fitness function to evaluate each element of the current population. If an element passes the fitness criteria, it remains in P.
      – 2b: The population now contains m elements (m ≤ n). Use genetic operators to create (n − m) new elements and add them to the population.
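The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own implementation. Elements are hypothetical fixed-length bit strings. "Passing the fitness criteria" is assumed to mean scoring at least a chosen `threshold`. The termination condition is either a fixed iteration budget (`max_iters`) or every element passing. New elements are built from eliminated ones by single-point crossover plus sparse bit-flip mutation, which follows the description of Step 2b.

```python
import random
from typing import Callable, List

Element = List[int]  # assumption: each element is a fixed-length bit string


def genetic_learning(
    n: int,
    length: int,
    fitness: Callable[[Element], float],
    threshold: float,
    max_iters: int = 100,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> List[Element]:
    """Sketch of Steps 1, 2a and 2b; threshold and max_iters are illustrative."""
    if length < 2:
        raise ValueError("length must be at least 2 for single-point crossover")
    rng = random.Random(seed)
    # Step 1: initialize a population P of n random elements.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(max_iters):  # termination option 1: fixed iteration budget
        # Step 2a: elements that pass the fitness criteria remain in P.
        survivors = [e for e in population if fitness(e) >= threshold]
        eliminated = [e for e in population if fitness(e) < threshold]
        if not eliminated:  # termination option 2: every element passes
            break
        # Step 2b: create n - m new elements, using eliminated elements as parents.
        children: List[Element] = []
        while len(children) < n - len(survivors):
            a = rng.choice(eliminated)
            b = rng.choice(population)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Sparse mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = survivors + children
    return population


if __name__ == "__main__":
    # Toy fitness (assumption): number of 1 bits; "passing" means at least 6 of 8.
    final = genetic_learning(n=10, length=8, fitness=sum, threshold=6)
    print(final)
```

The population size stays at n on every iteration, because exactly n − m children are added back after the m survivors are kept.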
  • 4. Digitized genetic knowledge representation
    • A common technique for representing genetic knowledge is to transform elements into binary strings.
    • For example, income range can be represented as a two-bit string by assigning “00” to 20-30k, “01” to 30-40k, and “11” to 50-60k.
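Here is a small sketch of this encoding. It uses the three codes listed on the slide plus one assumption: the slide leaves "10" unassigned, and here it is mapped to 40-50k, the remaining income band that appears later in the tutorial question. The `encode_element` layout (two income bits, then one bit for credit card insurance and one for sex) is also hypothetical.

```python
# Two-bit income codes from the slide; "10" -> "40-50k" is an assumption.
INCOME_CODES = {"20-30k": "00", "30-40k": "01", "40-50k": "10", "50-60k": "11"}
INCOME_LABELS = {bits: label for label, bits in INCOME_CODES.items()}


def encode_income(label: str) -> str:
    """Map an income range to its two-bit code."""
    return INCOME_CODES[label]


def decode_income(bits: str) -> str:
    """Map a two-bit code back to its income range."""
    return INCOME_LABELS[bits]


def encode_element(income: str, credit_card_insurance: str, sex: str) -> str:
    """Hypothetical layout: income (2 bits) + insurance (1 = Yes) + sex (1 = Male)."""
    return (
        encode_income(income)
        + ("1" if credit_card_insurance == "Yes" else "0")
        + ("1" if sex == "Male" else "0")
    )


print(encode_element("20-30k", "Yes", "Male"))
```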
  • 5. Genetic operator - Crossover
    • The elements most often used for crossover are those destined to be eliminated from the population.
    • Crossover forms new elements for the population by combining parts of two elements currently in the population.
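Single-point crossover on two equal-length bit strings can be sketched as follows. It is one common way to "combine parts of two elements". The slides do not fix a particular scheme, so the choice of a single cut point is an assumption.

```python
import random
from typing import Tuple


def single_point_crossover(a: str, b: str, cut: int) -> Tuple[str, str]:
    """Swap the tails of two equal-length parents from position `cut` onward."""
    if len(a) != len(b) or not 0 < cut < len(a):
        raise ValueError("parents must have equal length and 0 < cut < len(a)")
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def random_crossover(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Pick the cut point at random, then cross over."""
    return single_point_crossover(a, b, rng.randrange(1, len(a)))


# Parents would typically be elements destined for elimination.
print(single_point_crossover("0000", "1111", 1))
```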
  • 6. Genetic operator - Mutation
    • Mutation is applied sparingly to elements chosen for elimination.
    • Mutation can be applied by randomly flipping bits (or changing attribute values) within a single element.
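Here is a bit-flip mutation sketch, assuming elements are bit strings. The per-bit probability `rate` is an illustrative parameter. Keeping it small reflects the slide's point that mutation is applied sparingly.

```python
import random


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b for b in bits
    )


rng = random.Random(42)
print(mutate("01010101", 0.1, rng))  # a small rate keeps mutation sparse
```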
  • 7. Genetic operator - Selection
    • Selection replaces to-be-deleted elements with copies of elements that pass the fitness test with high scores.
    • With selection, the overall fitness of the population is guaranteed to increase.
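A sketch of selection follows, assuming a numeric fitness score and a pass `threshold` (both hypothetical parameters). Eliminated elements are replaced by copies of passing elements, taken best-first. Each replacement swaps a failing element for a copy that passes, so the total fitness rises whenever at least one element is replaced.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def select(population: List[T], fitness: Callable[[T], float], threshold: float) -> List[T]:
    """Replace elements scoring below `threshold` with copies of the best survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = [e for e in ranked if fitness(e) >= threshold]
    if not survivors:
        return list(population)  # nothing passes: leave the population unchanged
    n_missing = len(population) - len(survivors)
    # Copy survivors in rank order, cycling through them if more copies are needed.
    copies = [survivors[i % len(survivors)] for i in range(n_missing)]
    return survivors + copies


def ones(s: str) -> int:
    """Toy fitness (assumption): number of 1 bits."""
    return s.count("1")


pop = ["0001", "0111", "1111", "0000"]
new_pop = select(pop, ones, threshold=2)
print(new_pop)
```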
  • 8. Step 1 of supervised genetic learning
    This step initializes a population P of elements, where P denotes the population elements. The process modifies the elements of the population until a termination condition is satisfied. That condition might be that all elements of the population meet some minimum criterion. An alternative is a fixed number of iterations of the learning process.
  • 9. Step 2 of supervised genetic learning
    Step 2a applies a fitness function to evaluate each element currently in the population. With each iteration, elements that do not satisfy the fitness criteria are eliminated from the population. The final result of a supervised genetic learning session is a set of population elements that best represents the training data.
  • 10. Step 2 of supervised genetic learning (continued)
    Step 2b adds new elements to the population to replace any elements eliminated in step 2a. New elements are formed from previously deleted elements by applying crossover and mutation.
  • 11. An initial population for the supervised genetic learning example
    Population element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex | Age
    1 | 20-30k | No | Yes | Male | 30-39
    2 | 30-40k | Yes | No | Female | 50-59
    3 | ? | No | No | Male | 40-49
    4 | 30-40k | Yes | Yes | Male | 40-49
  • 12. Question mark in population
    A question mark in a population element marks a “don’t care” condition, which implies that the attribute is not important to the learning process.
  • 13. Training Data for Genetic Learning
    Training Instance | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex | Age
    1 | 30-40k | Yes | Yes | Male | 30-39
    2 | 30-40k | Yes | No | Female | 40-49
    3 | 50-60k | Yes | No | Female | 30-39
    4 | 20-30k | No | No | Female | 50-59
    5 | 20-30k | No | No | Male | 20-29
    6 | 30-40k | No | No | Male | 40-49
  • 14. Goal and condition
    • Our goal is to create a model able to differentiate individuals who have accepted the life insurance promotion from those who have not.
    • We require that after each iteration of the algorithm, exactly two elements from each class (life insurance promotion = yes and life insurance promotion = no) remain in the population.
  • 15. Fitness Function
    For a population element E:
    1. Let N be the number of matches of the input attribute values of E with training instances from its own class.
    2. Let M be the number of input attribute value matches with all training instances from the competing classes.
    3. Add 1 to M.
    4. Divide N by the result, so that F(E) = N / (M + 1).
    Note: the higher the fitness score, the smaller the error rate for the solution.
  • 16. Fitness function for element 1: own class (life insurance promotion = no)
    1. Income Range = 20-30k matches training instances 4 and 5.
    2. No matches for Credit Card Insurance = yes.
    3. Sex = Male matches training instances 5 and 6.
    4. No matches for Age = 30-39.
    5. ∴ N = 4
  • 17. Fitness function for element 1: competing class (life insurance promotion = yes)
    1. No matches for Income Range = 20-30k.
    2. Credit Card Insurance = yes matches training instance 1.
    3. Sex = Male matches training instance 1.
    4. Age = 30-39 matches training instances 1 and 3.
    5. ∴ M = 4
    6. ∴ F(1) = N / (M + 1) = 4 / 5 = 0.8
    7. Similarly, F(2) = 0.86, F(3) = 1.2, and F(4) = 1.0.
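The fitness computation on slides 15 to 17 can be checked with the sketch below. The attribute names are hypothetical. Element 3 needs one assumption: a "?" (stored here as `None`) is treated as matching nothing, which is the reading that reproduces F(3) = 1.2.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
INPUTS = ["income", "cci", "sex", "age"]  # input attributes; "lip" is the class

TRAINING: List[Row] = [
    {"income": "30-40k", "lip": "Yes", "cci": "Yes", "sex": "Male", "age": "30-39"},
    {"income": "30-40k", "lip": "Yes", "cci": "No", "sex": "Female", "age": "40-49"},
    {"income": "50-60k", "lip": "Yes", "cci": "No", "sex": "Female", "age": "30-39"},
    {"income": "20-30k", "lip": "No", "cci": "No", "sex": "Female", "age": "50-59"},
    {"income": "20-30k", "lip": "No", "cci": "No", "sex": "Male", "age": "20-29"},
    {"income": "30-40k", "lip": "No", "cci": "No", "sex": "Male", "age": "40-49"},
]

POPULATION: List[Row] = [
    {"income": "20-30k", "lip": "No", "cci": "Yes", "sex": "Male", "age": "30-39"},
    {"income": "30-40k", "lip": "Yes", "cci": "No", "sex": "Female", "age": "50-59"},
    {"income": None, "lip": "No", "cci": "No", "sex": "Male", "age": "40-49"},  # "?"
    {"income": "30-40k", "lip": "Yes", "cci": "Yes", "sex": "Male", "age": "40-49"},
]


def count_matches(element: Row, instances: List[Row]) -> int:
    """Count input-attribute value matches; a don't-care (None) never matches."""
    return sum(
        1
        for inst in instances
        for a in INPUTS
        if element[a] is not None and element[a] == inst[a]
    )


def fitness(element: Row, training: List[Row]) -> float:
    own = [t for t in training if t["lip"] == element["lip"]]
    rivals = [t for t in training if t["lip"] != element["lip"]]
    n = count_matches(element, own)     # N: matches with the element's own class
    m = count_matches(element, rivals)  # M: matches with the competing class
    return n / (m + 1)                  # F = N / (M + 1)


for i, e in enumerate(POPULATION, start=1):
    print(f"F({i}) = {fitness(e, TRAINING):.2f}")
```

Run as is, this prints 0.80, 0.86, 1.20 and 1.00 for elements 1 to 4.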
  • 18. Crossover operation for elements 1 & 2
  • 19. A Second-Generation Population
    Population element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex | Age
    1 | 20-30k | No | No | Female | 50-59
    2 | 30-40k | Yes | Yes | Male | 30-39
    3 | ? | No | No | Male | 40-49
    4 | 30-40k | Yes | Yes | Male | 40-49
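Comparing the initial and second-generation population tables suggests what the crossover on slide 18 did. Elements 1 and 2 swapped their Credit Card Insurance, Sex and Age values, and each kept its own Income Range and Life Insurance Promotion. The sketch below reproduces that exchange as a single-point crossover on the attribute list, with the cut after the second attribute. The cut position is inferred from the tables, not stated on the slides.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, Optional[str]]
FIELDS = ["income", "lip", "cci", "sex", "age"]  # attribute order used in the tables


def crossover(a: Row, b: Row, cut: int) -> Tuple[Row, Row]:
    """Exchange every attribute from position `cut` onward between two elements."""
    tail = set(FIELDS[cut:])
    child_a = {f: (b[f] if f in tail else a[f]) for f in FIELDS}
    child_b = {f: (a[f] if f in tail else b[f]) for f in FIELDS}
    return child_a, child_b


e1 = {"income": "20-30k", "lip": "No", "cci": "Yes", "sex": "Male", "age": "30-39"}
e2 = {"income": "30-40k", "lip": "Yes", "cci": "No", "sex": "Female", "age": "50-59"}
new1, new2 = crossover(e1, e2, cut=2)
print(new1)
print(new2)
```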
  • 20. Application of the model (test phase)
    • To use the model, we compare a new unknown instance (test data) with the elements of the final population. A simple technique is to give the unknown instance the same classification as the population element to which it is most similar.
    • A second possibility is to keep track of the m population elements most similar to the unknown instance. The algorithm then randomly chooses one of these m elements and gives the unknown instance the classification of the randomly selected element.
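Here is a sketch of both test-phase options, using the second-generation population as a stand-in for the final population. The slides do not define how similarity is measured. This sketch assumes it is the number of matching input attributes, with "?" matching nothing. The helper names are also assumptions. Ties in similarity are broken by population order, because Python's `sorted` is stable.

```python
import random
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
INPUTS = ["income", "cci", "sex", "age"]

FINAL_POPULATION: List[Row] = [
    {"income": "20-30k", "lip": "No", "cci": "No", "sex": "Female", "age": "50-59"},
    {"income": "30-40k", "lip": "Yes", "cci": "Yes", "sex": "Male", "age": "30-39"},
    {"income": None, "lip": "No", "cci": "No", "sex": "Male", "age": "40-49"},
    {"income": "30-40k", "lip": "Yes", "cci": "Yes", "sex": "Male", "age": "40-49"},
]


def similarity(element: Row, instance: Row) -> int:
    """Hypothetical measure: count of matching input attributes."""
    return sum(1 for a in INPUTS if element[a] is not None and element[a] == instance[a])


def classify(instance: Row, population: List[Row], m: int = 1,
             rng: Optional[random.Random] = None) -> Optional[str]:
    """m == 1: copy the class of the most similar element.
    m > 1: copy the class of one of the m most similar elements, chosen at random."""
    ranked = sorted(population, key=lambda e: similarity(e, instance), reverse=True)
    if m == 1:
        return ranked[0]["lip"]
    rng = rng or random.Random()
    return rng.choice(ranked[:m])["lip"]


unknown = {"income": "20-30k", "cci": "No", "sex": "Female", "age": "50-59"}
print(classify(unknown, FINAL_POPULATION))
```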
  • 21. Genetic Algorithms and Unsupervised Clustering
    Suppose there are P data instances within the space, where each data instance consists of n attribute values, and suppose m clusters are desired. The model generates k possible solutions. A specific solution contains m n-dimensional points, where each point is the best current representative element for one of the m clusters.
  • 22. For example, S2 represents one of the k possible solutions and contains two elements, E21 and E22.
  • 23. Crossover operation
    A crossover operation is accomplished by moving elements (n-dimensional points) from solution Si to solution Sj. There are several possibilities for implementing mutation operations. One way to mutate solution Si is to swap one or more point coordinates of the elements within Si.
  • 24. Fitness function
    An applicable fitness function for solution Sj is the average Euclidean distance of the P instances in the n-dimensional space from their closest element within Sj. We take each instance I in P and compute the Euclidean distance from I to each of the m elements in Sj. The smallest of these distances is the one that counts toward the score. Lower values represent better fitness scores. The worked example that follows sums these distances instead of averaging them; because P is the same for every solution, this does not change which solution is best. Once genetic learning terminates, the best of the k possible solutions is selected as the final solution. Each instance in the n-dimensional space is assigned to the cluster associated with its closest element in the final solution.
  • 25. Training data set for unsupervised GA
    Instance | X | Y
    1 | 1.0 | 1.5
    2 | 1.0 | 4.5
    3 | 2.0 | 1.5
    4 | 2.0 | 3.5
    5 | 3.0 | 2.5
    6 | 5.0 | 6.0
  • 26. Fitness function for unsupervised GA
    We apply the fitness function to the training data. We instruct the algorithm to start with a solution set consisting of three plausible solutions (k = 3). With m = 2, P = 6, and k = 3, the algorithm generates the initial set of solutions. Each solution contains a single representative data point for each cluster. For example, the data points for solution S1 are (1.0, 1.0) and (5.0, 5.0).
  • 27. Euclidean distance
    $$d(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2}$$
    Fitness score of solution S1 = {(1.0, 1.0), (5.0, 5.0)}:
    $$\begin{aligned}
    F(S_1) ={}& \min\left(\sqrt{|1.0-1.0|^2+|1.0-1.5|^2},\ \sqrt{|5.0-1.0|^2+|5.0-1.5|^2}\right) \\
    &+ \min\left(\sqrt{|1.0-1.0|^2+|1.0-4.5|^2},\ \sqrt{|5.0-1.0|^2+|5.0-4.5|^2}\right) \\
    &+ \min\left(\sqrt{|1.0-2.0|^2+|1.0-1.5|^2},\ \sqrt{|5.0-2.0|^2+|5.0-1.5|^2}\right) \\
    &+ \min\left(\sqrt{|1.0-2.0|^2+|1.0-3.5|^2},\ \sqrt{|5.0-2.0|^2+|5.0-3.5|^2}\right) \\
    &+ \min\left(\sqrt{|1.0-3.0|^2+|1.0-2.5|^2},\ \sqrt{|5.0-3.0|^2+|5.0-2.5|^2}\right) \\
    &+ \min\left(\sqrt{|1.0-5.0|^2+|1.0-6.0|^2},\ \sqrt{|5.0-5.0|^2+|5.0-6.0|^2}\right) \\
    ={}& 0.5 + 3.5 + 1.12 + 2.69 + 2.5 + 1.0 = 11.31
    \end{aligned}$$
  • 28. Solution Population for unsupervised Clustering
    Row | S1 | S2 | S3
    Initial population: solution elements | (1.0,1.0) (5.0,5.0) | (3.0,2.0) (3.0,5.0) | (4.0,3.0) (5.0,1.0)
    Initial population: fitness score | 11.31 | 9.78 | 15.55
    Second generation: solution elements | (5.0,1.0) (5.0,5.0) | (3.0,2.0) (3.0,5.0) | (4.0,3.0) (1.0,1.0)
    Second generation: fitness score | 17.96 | 9.78 | 11.31
    Third generation: solution elements | (5.0,5.0) (1.0,5.0) | (3.0,2.0) (3.0,5.0) | (4.0,3.0) (1.0,1.0)
    Third generation: fitness score | 13.64 | 9.78 | 11.31
  • 29. First Generation Solution
    The fitness score of 11.31 for solution S1 is the sum of the Euclidean distances between each instance and its closest data point in S1. To illustrate, consider instance 1 in the training data. The Euclidean distance between (1.0, 1.0) and (1.0, 1.5) is 0.50, and the distance between (5.0, 5.0) and (1.0, 1.5) is 5.32. The smaller value, 0.50, is the one included in the overall fitness score for solution S1. S2 is the best first-generation solution.
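The first-generation fitness scores can be recomputed with the short sketch below. It uses the summed form that the worked example uses; the average differs only by the constant factor P = 6. `math.dist` from the Python standard library (3.8+) gives the Euclidean distance, and the variable names are illustrative.

```python
import math
from typing import List, Sequence

INSTANCES = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]


def fitness(solution: List[Sequence[float]], instances: List[Sequence[float]]) -> float:
    """Sum of each instance's distance to its closest solution element (lower is better)."""
    return sum(min(math.dist(x, e) for e in solution) for x in instances)


def average_fitness(solution: List[Sequence[float]], instances: List[Sequence[float]]) -> float:
    """The averaged variant described on slide 24."""
    return fitness(solution, instances) / len(instances)


first_generation = {
    "S1": [(1.0, 1.0), (5.0, 5.0)],
    "S2": [(3.0, 2.0), (3.0, 5.0)],
    "S3": [(4.0, 3.0), (5.0, 1.0)],
}
for name, sol in first_generation.items():
    print(name, round(fitness(sol, INSTANCES), 2))
```

Run as is, this prints 11.31, 9.78 and 15.55 for S1, S2 and S3, so S2 has the lowest (best) score.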
  • 30. Second Generation Solution
    The second generation is obtained by performing a crossover between solutions S1 and S3, with solution element (1.0, 1.0) in S1 exchanging places with solution element (5.0, 1.0) in S3. The crossover improves (decreases) the fitness score for S3, while the score for S1 increases.
  • 31. (Final) Third Generation Solution
    The third generation is obtained by mutating S1. The mutation interchanges the y-coordinate of the first element in S1 with the x-coordinate of the second element, which improves the fitness score for S1. Mutation and crossover continue until a termination condition is satisfied. If the third generation is terminal, the final solution is S2.
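The two operators that take the population from the first to the third generation can be sketched as follows. `exchange_elements` and `swap_coordinates` are hypothetical helper names. The example reproduces the crossover between S1 and S3 and the mutation of S1 described above. It also recomputes the scores with the same summed-distance fitness.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Solution = List[Point]
INSTANCES = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]


def fitness(solution: Solution) -> float:
    """Summed distance from each instance to its closest element (lower is better)."""
    return sum(min(math.dist(x, e) for e in solution) for x in INSTANCES)


def exchange_elements(si: Solution, sj: Solution, a: int, b: int) -> Tuple[Solution, Solution]:
    """Crossover: element a of si and element b of sj trade places."""
    new_si, new_sj = list(si), list(sj)
    new_si[a], new_sj[b] = sj[b], si[a]
    return new_si, new_sj


def swap_coordinates(s: Solution, ea: int, ca: int, eb: int, cb: int) -> Solution:
    """Mutation: interchange coordinate ca of element ea with coordinate cb of element eb."""
    pts = [list(p) for p in s]
    pts[ea][ca], pts[eb][cb] = pts[eb][cb], pts[ea][ca]
    return [(p[0], p[1]) for p in pts]


s1 = [(1.0, 1.0), (5.0, 5.0)]
s3 = [(4.0, 3.0), (5.0, 1.0)]
# Second generation: (1.0, 1.0) in S1 trades places with (5.0, 1.0) in S3.
s1_gen2, s3_gen2 = exchange_elements(s1, s3, 0, 1)
# Third generation: y of S1's first element <-> x of S1's second element.
s1_gen3 = swap_coordinates(s1_gen2, 0, 1, 1, 0)
print(s1_gen2, round(fitness(s1_gen2), 2))
print(s3_gen2, round(fitness(s3_gen2), 2))
print(s1_gen3, round(fitness(s1_gen3), 2))
```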
  • 32. Solution for Clustering
    Suppose S2, with elements (3.0, 2.0) and (3.0, 5.0), is the final solution. Computing the distance from each instance to these two points puts instances 1, 3, and 5 in one cluster and instances 2 and 6 in the second. Instance 4 is equally distant from both points (about 1.80), so it can be placed in either cluster.
    Cluster 1, center (3.0, 2.0): instances (1.0, 1.5), (2.0, 1.5), (3.0, 2.5)
    Cluster 2, center (3.0, 5.0): instances (1.0, 4.5), (2.0, 3.5), (5.0, 6.0)
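Assigning each instance to its closest element of the final solution can be sketched as below. Ties are reported separately rather than forced into a cluster, which makes instance 4's equal distance to both centers visible. The helper name `assign` and the tolerance `tol` are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign(instances: List[Point], centers: List[Point], tol: float = 1e-9
           ) -> Tuple[Dict[int, List[Point]], List[Point]]:
    """Return (clusters keyed by center index, instances tied between centers)."""
    clusters: Dict[int, List[Point]] = {i: [] for i in range(len(centers))}
    ties: List[Point] = []
    for x in instances:
        d = [math.dist(x, c) for c in centers]
        nearest = [i for i, di in enumerate(d) if di - min(d) <= tol]
        if len(nearest) > 1:
            ties.append(x)  # equally close to several centers
        else:
            clusters[nearest[0]].append(x)
    return clusters, ties


instances = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]
clusters, ties = assign(instances, [(3.0, 2.0), (3.0, 5.0)])
print(clusters, ties)
```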
  • 33. General considerations for GA
    • GAs are designed to find globally optimized solutions.
    • The fitness function determines the computational complexity of a genetic algorithm.
    • GAs explain their results to the extent that the fitness function is understandable.
    • Transforming the data into a form suitable for a genetic algorithm can be a challenge.
  • 34. Choosing a data mining technique
    Given a set of data containing attributes and values to be mined, together with information about the nature of the data and the problem to be solved, determine an appropriate data mining technique.
  • 35. Considerations for choosing data mining techniques
    • Is learning supervised or unsupervised?
    • Do we require a clear explanation of the relationships present in the data?
    • Is there one set of input attributes and one set of output attributes, or can attributes interact with one another in several ways?
    • Is the input data categorical, numeric, or a combination of both?
    • If learning is supervised, is there one output attribute or are there several output attributes? Are the output attribute(s) categorical or numeric?
  • 36. Behavior of different data mining techniques
    1. Neural networks are black-box structures and are a poor choice if an explanation of what has been learned is required.
    2. Association rules are among the best choices when attributes are allowed to play multiple roles in the data mining process.
    3. Decision trees can determine the attributes most predictive of class membership.
    4. Neural networks and clustering assume attributes to be of equal importance.
    5. Neural networks tend to outperform other models when a wealth of noisy data is present.
    6. Algorithms for building decision trees typically execute faster than neural network or genetic learning.
    7. Genetic algorithms are typically used for problems that cannot be solved with traditional techniques.
  • 37. Review question 10
    Given the following training data set:
    Training instance | Income range | Credit card insurance | Sex | Age
    1 | 30-40k | Yes | Male | 30-39
    2 | 30-40k | No | Female | 40-49
    3 | 50-60k | No | Female | 30-39
    4 | 20-30k | No | Female | 50-59
    5 | 20-30k | No | Male | 20-29
    6 | 30-40k | No | Male | 40-49
    Describe the steps needed to apply unsupervised genetic learning to cluster the instances of the credit card promotion database.
  • 38. Tutorial Question 10
    Given the following training data set:
    Training instance | Income range | Credit card insurance | Sex | Age
    1 | 30-40k | Yes | Male | 30-39
    2 | 30-40k | No | Female | 40-49
    3 | 50-60k | No | Female | 30-39
    4 | 20-30k | No | Female | 50-59
    5 | 20-30k | No | Male | 20-29
    6 | 30-40k | No | Male | 40-49
    The input data is transformed into numeric values: yes = 1, no = 2, male = 1, female = 2, 20-29 = 1, 30-39 = 2, 40-49 = 3, 50-59 = 4, 20-30k = 1, 30-40k = 2, 40-50k = 3, 50-60k = 4. The training data set then becomes (income range, credit card insurance, sex, age):
    T(1) = (2,1,1,2), T(2) = (2,2,2,3), T(3) = (4,2,2,2), T(4) = (1,2,2,4), T(5) = (1,2,1,1), T(6) = (2,2,1,3)
    Assume the initial population contains two candidate solutions, each with two cluster centers:
    Solution 1 (K1): (1,1,1,1) and (4,2,2,4)
    Solution 2 (K2): (4,4,4,4) and (2,2,1,1)
    Using unsupervised genetic learning, choose the best solution based on the solutions' fitness scores.
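The fitness comparison asked for in the tutorial question can be checked with the sketch below. It applies the same summed Euclidean-distance fitness as the two-dimensional example to the four-attribute encoded instances. Treating the ordinal codes as numeric coordinates is the assumption the question itself makes, and the variable names are illustrative.

```python
import math

# Encoded training instances T(1)..T(6): (income, credit card insurance, sex, age)
T = [(2, 1, 1, 2), (2, 2, 2, 3), (4, 2, 2, 2), (1, 2, 2, 4), (1, 2, 1, 1), (2, 2, 1, 3)]

SOLUTIONS = {
    "Solution 1": [(1, 1, 1, 1), (4, 2, 2, 4)],
    "Solution 2": [(4, 4, 4, 4), (2, 2, 1, 1)],
}


def fitness(centers, instances):
    """Sum of each instance's distance to its nearest cluster center (lower is better)."""
    return sum(min(math.dist(t, c) for c in centers) for t in instances)


scores = {name: fitness(centers, T) for name, centers in SOLUTIONS.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Best:", best)
```

Run as is, the script prints a score for each candidate and names the one with the lower total as the better solution.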
  • 39. Reading assignment
    “Data Mining: A Tutorial-Based Primer” by Richard J. Roiger and Michael W. Geatz, published by Pearson Education in 2003, pp. 89-101.