The document describes genetic algorithms and their application to supervised and unsupervised learning problems. It provides details on the genetic learning process, including initializing a population, evaluating fitness, and applying genetic operators like crossover and mutation to generate new solutions. An example of applying genetic algorithms to supervised learning for classifying customers as interested or not interested in life insurance is presented. The document also discusses using genetic algorithms for unsupervised clustering, providing examples of calculating fitness scores and evolving solution populations. Key aspects of choosing genetic algorithms or other data mining techniques are outlined.
1. 26/3/2008 1
Genetic Algorithm
Genetic Algorithms (GA) apply an evolutionary
approach to inductive learning. GAs have been
successfully applied to problems that are
difficult to solve using conventional techniques,
such as scheduling problems, the traveling
salesperson problem, network routing problems,
and financial marketing.
3. 26/3/2008 3
Genetic learning algorithm
• Step 1: Initialize a population P of n elements,
each a potential solution.
• Step 2: Until a specified termination condition
is satisfied:
– 2a: Use a fitness function to evaluate each
element of the current population. If an element
passes the fitness criteria, it remains in P.
– 2b: The population now contains m elements (m
<= n). Use genetic operators to create (n – m)
new elements. Add the new elements to the
population.
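The two steps above can be sketched as a short loop. This is a minimal illustration, not the slides' own implementation: the function name genetic_learning, the bit-string elements, the pass threshold of 6, and the flip_one_zero operator are all assumptions chosen to keep the example small.

```python
import random

def genetic_learning(population, fitness, passes, make_element, max_iterations=50):
    """Keep elements that pass the fitness criteria and refill the population to n."""
    for _ in range(max_iterations):
        # Step 2a: evaluate every element; passing elements remain in P.
        survivors = [e for e in population if passes(fitness(e))]
        eliminated = [e for e in population if not passes(fitness(e))]
        if not eliminated:
            break  # termination: every element meets the minimum criteria
        # Step 2b: m survivors remain, so create (n - m) new elements.
        new_elements = [make_element(survivors, eliminated) for _ in eliminated]
        population = survivors + new_elements
    return population

def count_ones(bits):
    """Toy fitness (an assumption): number of 1 bits in the string."""
    return bits.count("1")

def flip_one_zero(survivors, eliminated):
    """Toy genetic operator (an assumption): flip one 0 bit of an eliminated element."""
    parent = random.choice(eliminated)
    zero_positions = [i for i, b in enumerate(parent) if b == "0"]
    i = random.choice(zero_positions)
    return parent[:i] + "1" + parent[i + 1:]

random.seed(0)
start = ["00000000", "11110000", "11111100", "10101010"]
final = genetic_learning(start, count_ones, lambda score: score >= 6, flip_one_zero)
```

The population size stays at n throughout, and the loop stops either when every element passes or after a fixed number of iterations, which are the two termination conditions the slides mention.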
4. 26/3/2008 4
Digitized genetic knowledge
representation
• A common technique for representing
genetic knowledge is to transform
elements into binary strings.
• For example, we can represent income
range as a string of two bits by assigning
“00” to 20-30k, “01” to 30-40k, and “11” to
50-60k.
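The income-range encoding can be written as a lookup table. This is a small sketch that uses only the three assignments listed above; the names INCOME_BITS, encode_income, and decode_income are hypothetical, and the code "10" is left unassigned because the slide does not give it a range.

```python
# Two-bit codes for income range, exactly as listed on the slide.
INCOME_BITS = {"20-30k": "00", "30-40k": "01", "50-60k": "11"}
BITS_INCOME = {bits: label for label, bits in INCOME_BITS.items()}

def encode_income(label):
    """Return the two-bit string for an income range label."""
    return INCOME_BITS[label]

def decode_income(bits):
    """Return the income range label for a two-bit string."""
    return BITS_INCOME[bits]

encoded = encode_income("30-40k")
decoded = decode_income("11")
```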
5. 26/3/2008 5
Genetic operator - Crossover
• The elements most often used for
crossover are those destined to be
eliminated from the population.
• Crossover forms new elements for the
population by combining parts of two
elements currently in the population.
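One common way to combine parts of two elements is single-point crossover on their binary strings. The slides do not say which crossover variant they use, so the cut-point scheme and the function name below are assumptions for illustration.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("cut point must fall inside the string")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Cut after the second bit: heads are kept, tails are exchanged.
children = single_point_crossover("000111", "110010", 2)
```

Each child keeps the head of one parent and takes the tail of the other, so no bits are created or lost.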
6. 26/3/2008 6
Genetic operator - Mutation
• Mutation is sparingly applied to elements
chosen for elimination.
• Mutation can be applied by randomly
flipping bits (or attribute values) within a
single element.
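Bit-flip mutation can be sketched as follows. The per-bit mutation rate and the function name mutate are assumptions; a small rate corresponds to applying mutation sparingly.

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit of a binary string independently with probability `rate`."""
    flipped = []
    for b in bits:
        if rng.random() < rate:
            flipped.append("1" if b == "0" else "0")
        else:
            flipped.append(b)
    return "".join(flipped)

rng = random.Random(42)
unchanged = mutate("0101", 0.0, rng)   # rate 0: no bit is flipped
inverted = mutate("0101", 1.0, rng)    # rate 1: every bit is flipped
sparing = mutate("01010101", 0.1, rng) # typical use: only a few flips, if any
```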
7. 26/3/2008 7
Genetic operator - Selection
• Selection replaces to-be-deleted
elements with copies of elements that pass
the fitness test with high scores.
• With selection, the overall fitness of the
population is guaranteed to increase.
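A minimal sketch of selection, assuming a fixed pass threshold and assuming that failing elements are replaced by copies of the passing elements in descending score order. The element labels, scores, and threshold are illustrative only.

```python
def select(population, scores, threshold):
    """Replace elements scoring below `threshold` with copies of high scorers."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    passing = [element for score, element in ranked if score >= threshold]
    if not passing:
        return list(population)  # nothing to copy from
    result = []
    next_copy = 0
    for element, score in zip(population, scores):
        if score >= threshold:
            result.append(element)
        else:
            # Copy the best passing element first, then the next best, and so on.
            result.append(passing[next_copy % len(passing)])
            next_copy += 1
    return result

selected = select(["A", "B", "C", "D"], [0.5, 0.9, 1.2, 1.0], threshold=0.8)
```

Here the failing element A is replaced by a copy of C, the highest scorer, so the average score of the population rises.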
8. 26/3/2008 8
Step 1 of Supervised genetic learning
This step initializes a population P of
elements; P denotes the set of population
elements. The process modifies the
elements of the population until a
termination condition is satisfied. One such
condition is that all elements of the population
meet some minimum criteria. An
alternative is a fixed number of iterations
of the learning process.
9. 26/3/2008 9
Step 2 of supervised genetic learning
Step 2a applies a fitness function to
evaluate each element currently in the
population. With each iteration, elements
not satisfying the fitness criteria are
eliminated from the population. The final
result of a supervised genetic learning
session is a set of population elements
that best represents the training data.
10. 26/3/2008 10
Step 2 of supervised genetic learning
Step 2b adds new elements to the
population to replace any elements
eliminated in step 2a. New elements are
formed from previously deleted elements
by applying crossover and mutation.
11. 26/3/2008 11
An initial population for supervised
genetic learning example
Population  Income   Life Insurance  Credit Card  Sex     Age
element     Range    Promotion       Insurance
1           20-30k   No              Yes          Male    30-39
2           30-40k   Yes             No           Female  50-59
3           ?        No              No           Male    40-49
4           30-40k   Yes             Yes          Male    40-49
12. 26/3/2008 12
Question mark in population
A question mark in the population is a
“don’t care” condition: it implies that the
attribute is not important to the
learning process.
13. 1/4/2008 13
Training Data for Genetic Learning
Training  Income   Life Insurance  Credit Card  Sex     Age
Instance  Range    Promotion       Insurance
1         30-40k   Yes             Yes          Male    30-39
2         30-40k   Yes             No           Female  40-49
3         50-60k   Yes             No           Female  30-39
4         20-30k   No              No           Female  50-59
5         20-30k   No              No           Male    20-29
6         30-40k   No              No           Male    40-49
14. 26/3/2008 14
Goal and condition
• Our goal is to create a model able to
differentiate individuals who have
accepted the life insurance promotion
from those who have not.
• We require that after each iteration of the
algorithm, exactly two elements from each
class (life insurance promotion=yes) & (life
insurance promotion=no) remain in the
population.
15. 26/3/2008 15
Fitness Function
To compute the fitness score of a population element E:
1. Let N be the number of matches of the input
attribute values of E with training instances
from its own class.
2. Let M be the number of input attribute value
matches to all training instances from the
competing classes.
3. Add 1 to M.
4. Divide N by M.
Note: the higher the fitness score, the smaller
the error rate for the solution.
16. 26/3/2008 16
Fitness function for element 1: own
class (life insurance promotion = no)
1. Income Range = 20-30k matches with
training instances 4 and 5.
2. No matches for Credit Card
Insurance=yes
3. Sex=Male matches with training
instances 5 and 6.
4. No matches for Age=30-39.
5. ∴ N = 2 + 0 + 2 + 0 = 4
17. 26/3/2008 17
Fitness function for element 1: competing
class (life insurance promotion = yes)
1. No matches for Income Range=20-30k
2. Credit Card Insurance=yes matches with
training instance 1.
3. Sex=Male matches with training instance 1.
4. Age=30-39 matches with training instances 1
and 3.
5. ∴ M = 0 + 1 + 1 + 2 = 4
6. ∴ F(1) = N / (M + 1) = 4 / 5 = 0.8
7. Similarly F(2)=0.86, F(3)=1.2, F(4)=1.0
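The fitness calculation can be checked with a short script over the initial population and the training data. This is a sketch under stated assumptions: the tuple layout and the names count_matches and fitness are hypothetical, and a “?” value is treated as matching nothing, which is the reading that reproduces F(3) = 1.2.

```python
# Each row: (income range, life insurance promotion, credit card insurance, sex, age)
TRAINING = [
    ("30-40k", "Yes", "Yes", "Male", "30-39"),
    ("30-40k", "Yes", "No", "Female", "40-49"),
    ("50-60k", "Yes", "No", "Female", "30-39"),
    ("20-30k", "No", "No", "Female", "50-59"),
    ("20-30k", "No", "No", "Male", "20-29"),
    ("30-40k", "No", "No", "Male", "40-49"),
]
POPULATION = [
    ("20-30k", "No", "Yes", "Male", "30-39"),
    ("30-40k", "Yes", "No", "Female", "50-59"),
    ("?", "No", "No", "Male", "40-49"),
    ("30-40k", "Yes", "Yes", "Male", "40-49"),
]
CLASS_INDEX = 1  # position of the class attribute (life insurance promotion)

def count_matches(element, instances):
    """Count input attribute values of `element` that equal those of each instance."""
    total = 0
    for instance in instances:
        for i, value in enumerate(element):
            if i == CLASS_INDEX or value == "?":
                continue  # skip the class attribute and "don't care" values
            if value == instance[i]:
                total += 1
    return total

def fitness(element, training):
    """F = N / (M + 1): own-class matches over competing-class matches plus one."""
    own = [t for t in training if t[CLASS_INDEX] == element[CLASS_INDEX]]
    competing = [t for t in training if t[CLASS_INDEX] != element[CLASS_INDEX]]
    n = count_matches(element, own)
    m = count_matches(element, competing)
    return n / (m + 1)

scores = [round(fitness(e, TRAINING), 2) for e in POPULATION]
```

Rounded to two decimals, the scores agree with the values on the slide: 0.8, 0.86, 1.2, and 1.0.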
19. 1/4/2008 19
A Second-Generation Population
Population  Income   Life Insurance  Credit Card  Sex     Age
element     Range    Promotion       Insurance
1           20-30k   No              No           Female  50-59
2           30-40k   Yes             Yes          Male    30-39
3           ?        No              No           Male    40-49
4           30-40k   Yes             Yes          Male    40-49
20. 26/3/2008 20
Application of the model (test
phase)
• To use the model, we can compare a new
unknown instance (test data) with the elements
of the final population. A simple technique is to
give the unknown instance the same
classification as the population element to which
it is most similar.
• An alternative is to keep the m population
elements most similar to the unknown instance.
The algorithm then randomly chooses one of these
m elements and gives the unknown instance the
classification of the randomly selected element.
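The first technique can be sketched with the second-generation population. The similarity measure (number of matching input attribute values, with “?” matching nothing), the dictionary layout, and the two unknown instances are assumptions made for illustration.

```python
POPULATION = [
    {"income": "20-30k", "promotion": "No", "credit_card": "No", "sex": "Female", "age": "50-59"},
    {"income": "30-40k", "promotion": "Yes", "credit_card": "Yes", "sex": "Male", "age": "30-39"},
    {"income": "?", "promotion": "No", "credit_card": "No", "sex": "Male", "age": "40-49"},
    {"income": "30-40k", "promotion": "Yes", "credit_card": "Yes", "sex": "Male", "age": "40-49"},
]
INPUT_ATTRIBUTES = ("income", "credit_card", "sex", "age")

def similarity(element, instance):
    """Number of input attributes on which the element and the instance agree."""
    return sum(1 for a in INPUT_ATTRIBUTES if element[a] == instance[a])

def classify(instance, population):
    """Give the instance the class of its most similar element (first one on ties)."""
    best = max(population, key=lambda element: similarity(element, instance))
    return best["promotion"]

# Two hypothetical unknown instances.
first = classify({"income": "30-40k", "credit_card": "Yes", "sex": "Male", "age": "30-39"}, POPULATION)
second = classify({"income": "20-30k", "credit_card": "No", "sex": "Female", "age": "50-59"}, POPULATION)
```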
21. 26/3/2008 21
Genetic Algorithms & unsupervised Clustering
Suppose there are P data instances within
the space where each data instance
consists of n attribute values. Suppose m
clusters are desired. The model will
generate k possible solutions. A specific
solution contains m n-dimensional points,
where each point is a best current
representative element for one of the m
clusters.
22. 26/3/2008 22
For example, S2 represents one of the k possible solutions
and contains two elements E21 and E22.
23. 26/3/2008 23
Crossover operation
A crossover operation is accomplished by
moving elements (n-dimensional points)
from solution Si to solution Sj. There are
several possibilities for implementing
mutation operations. One way to mutate
solution Si is to swap one or more point
coordinates of the elements within Si.
24. 26/3/2008 24
Fitness function
An applicable fitness function for solution Sj is the
average Euclidean distance of the P instances in
the n-dimensional space from their closest
element within Sj. We take each instance I in P
and compute the Euclidean distance from I to
each of the m elements in Sj; only the smallest
of these distances counts toward the score.
Lower values represent better fitness scores.
Once genetic learning terminates, the best of the
k possible solutions is selected as the final
solution. Each instance in the n-dimensional
space is assigned to the cluster associated with
its closest element in the final solution.
25. 26/3/2008 25
Training data set for unsupervised GA
Instance X Y
1 1.0 1.5
2 1.0 4.5
3 2.0 1.5
4 2.0 3.5
5 3.0 2.5
6 5.0 6.0
26. 26/3/2008 26
Fitness function for unsupervised GA
We apply the fitness function to the training data.
We instruct the algorithm to start with a solution
set consisting of three plausible solutions (k=3).
With m=2, P=6, and k=3, the algorithm
generates the initial set of solutions. An
element in the solution space contains a single
representative data point for each cluster. For
example, the data points for solution S1 are
(1.0, 1.0) and (5.0, 5.0).
29.
First Generation Solution
To compute the fitness score of 11.31 for solution
S1, the Euclidean distances between each
instance and its closest data point in S1 are
summed (with P fixed, the sum ranks solutions the
same way as the average). To illustrate, consider
instance 1 in the training data. The Euclidean distance between
(1.0,1.0) and (1.0,1.5) is computed as 0.50. The
distance between (5.0,5.0) and (1.0,1.5) is 5.32.
The smaller value of 0.50 is represented in the
overall fitness score for solution S1. S2 is the
best first-generation solution.
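The scores quoted on this slide can be reproduced with a short script. It is a sketch that assumes the summed form of the fitness function and takes the elements of S2 from the clustering slide later in the deck; S3 is omitted because its full definition does not appear in the text. The function name `fitness` is chosen here for illustration.

```python
import math

# Training data: the six (X, Y) instances.
instances = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5),
             (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]

def fitness(solution, data):
    """Sum, over all instances, of the distance to the closest solution element."""
    return sum(min(math.dist(inst, elem) for elem in solution) for inst in data)

s1 = [(1.0, 1.0), (5.0, 5.0)]
s2 = [(3.0, 2.0), (3.0, 5.0)]

# Instance 1 against each element of S1.
print(round(math.dist((1.0, 1.0), (1.0, 1.5)), 2))  # 0.5
print(round(math.dist((5.0, 5.0), (1.0, 1.5)), 2))  # 5.32

print(round(fitness(s1, instances), 2))  # 11.31
print(round(fitness(s2, instances), 2))  # 9.78
```

The lower score for S2 agrees with the statement that S2 is the better of these two first-generation solutions.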
30.
Second Generation Solution
The second generation is obtained by
performing a crossover between solutions
S1 and S3 with solution element (1.0,1.0)
in S1 exchanging places with solution
element (5.0,1.0) in S3. The result of the
crossover operation improves (decreases)
the fitness score for S3 while the score for
S1 increases (worsens).
31.
(Final) Third Generation Solution
The third generation is acquired by mutating
S1. The mutation interchanges the y-
coordinate of the first element in S1 with
the x-coordinate of the second element.
The mutation results in an improved
fitness score for S1. Mutation and
crossover continue until a termination
condition is satisfied. If the third
generation is terminal, then the final
solution is S2.
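The path of S1 through the three generations can be traced numerically. The sketch below assumes the summed fitness score and assumes that the element arriving from S3 takes the first position in S1; S3 itself is left out because only one of its elements is given in the text.

```python
import math

instances = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5),
             (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]

def fitness(solution, data):
    """Sum of distances from each instance to its closest solution element."""
    return sum(min(math.dist(inst, elem) for elem in solution) for inst in data)

s1_gen1 = [(1.0, 1.0), (5.0, 5.0)]

# Crossover: (1.0, 1.0) leaves S1 and (5.0, 1.0) arrives from S3.
s1_gen2 = [(5.0, 1.0), (5.0, 5.0)]

# Mutation: swap the y-coordinate of the first element with the
# x-coordinate of the second element.
(x1, y1), (x2, y2) = s1_gen2
s1_gen3 = [(x1, x2), (y1, y2)]   # [(5.0, 5.0), (1.0, 5.0)]

s2 = [(3.0, 2.0), (3.0, 5.0)]    # unchanged in all three generations

for name, sol in [("S1 gen 1", s1_gen1), ("S1 gen 2", s1_gen2),
                  ("S1 gen 3", s1_gen3), ("S2", s2)]:
    print(name, round(fitness(sol, instances), 2))
# S1 gen 1 11.31
# S1 gen 2 17.96
# S1 gen 3 13.64
# S2 9.78
```

Under these assumptions the mutation improves S1 relative to the second generation, yet S2 still has the lowest score, which is consistent with S2 being the final solution.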
32.
Solution for Clustering
If S2, with elements (3.0, 2.0) and (3.0, 5.0), is the final
solution, then computing the distance from each instance to
the two elements of S2 gives the following clusters:
instances 1, 3 and 5 form one cluster,
instances 2 and 6 form a second cluster, and
instance 4, which is equidistant from both elements, can be in either cluster.
Cluster 1 center (3.0, 2.0)
Instance X Y
1 1.0 1.5
3 2.0 1.5
5 3.0 2.5
Cluster 2 center (3.0, 5.0)
Instance X Y
2 1.0 4.5
4 2.0 3.5
6 5.0 6.0
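The assignment of instances to clusters can be checked with a short script. The helper `assign` is a hypothetical name; it returns every center at minimum distance so that ties, such as instance 4, remain visible.

```python
import math

instances = {1: (1.0, 1.5), 2: (1.0, 4.5), 3: (2.0, 1.5),
             4: (2.0, 3.5), 5: (3.0, 2.5), 6: (5.0, 6.0)}
centers = [(3.0, 2.0), (3.0, 5.0)]   # elements of the final solution S2

def assign(point, centers):
    """Return the indices of all centers at minimum distance (several on a tie)."""
    dists = [math.dist(point, c) for c in centers]
    best = min(dists)
    return [i for i, d in enumerate(dists) if math.isclose(d, best)]

assignment = {num: assign(p, centers) for num, p in instances.items()}
for num, clusters in assignment.items():
    print(num, [c + 1 for c in clusters])
# 1 [1]
# 2 [2]
# 3 [1]
# 4 [1, 2]
# 5 [1]
# 6 [2]
```

Instance 4 lies at the same distance from both centers, so a real implementation needs a tie-breaking rule; the slide places it in cluster 2.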
33.
General considerations for GA
• GAs are designed to search for globally
optimized solutions, but there is no guarantee
that a given solution is a global rather than
a local optimum.
• The fitness function determines the
computational complexity of a genetic
algorithm.
• GAs explain their results to the extent that
the fitness function is understandable.
• Transforming the data to a form suitable
for a genetic algorithm can be a challenge.
34.
Choosing a data mining technique
Given a set of data containing attributes and
values to be mined together with information
about the nature of the data and the problem to
be solved, determine an appropriate data
mining technique.
35.
Considerations for choosing data
mining techniques
• Is learning supervised or unsupervised?
• Do we require a clear explanation about the
relationships present in the data?
• Is there one set of input attributes and one set of
output attributes or can attributes interact with one
another in several ways?
• Is the input data categorical, numeric, or a
combination of both?
• If learning is supervised, is there one output attribute
or are there several output attributes? Are the output
attribute(s) categorical or numeric?
36.
Behavior of different data mining techniques
1. Neural networks are black-box structures and are a poor
choice if an explanation about what has been learned is
required.
2. Association rules are the best choice when attributes are
allowed to play multiple roles in the data mining
process.
3. Decision trees can determine attributes most predictive
of class membership.
4. Neural networks and clustering assume attributes to be
of equal importance.
5. Neural networks tend to outperform other models when
a wealth of noisy data are present.
6. Algorithms for building decision trees typically execute
faster than neural network or genetic learning.
7. Genetic algorithms are typically used for problems that
cannot be solved with traditional techniques.
37.
Review question 10
Given the following training data set
Training instance Income range Credit card insurance Sex Age
1 30-40k Yes Male 30-39
2 30-40k No Female 40-49
3 50-60k No Female 30-39
4 20-30k No Female 50-59
5 20-30k No Male 20-29
6 30-40k No Male 40-49
Describe the steps needed to apply unsupervised
genetic learning to cluster the instances of the credit
card promotion database.
38.
Tutorial Question 10
Given the following training data set
Training instance Income range Credit card insurance Sex Age
1 30-40k Yes Male 30-39
2 30-40k No Female 40-49
3 50-60k No Female 30-39
4 20-30k No Female 50-59
5 20-30k No Male 20-29
6 30-40k No Male 40-49
After transforming the input data into numeric values (yes=1, no=2, male=1, female=2,
20-29=1, 30-39=2, 40-49=3, 50-59=4, 20-30k=1, 30-40k=2, 40-50k=3, 50-60k=4), the
training data set becomes:
T(1)=(2,1,1,2)
T(2)=(2,2,2,3)
T(3)=(4,2,2,2)
T(4)=(1,2,2,4)
T(5)=(1,2,1,1)
T(6)=(2,2,1,3)
Assume the initial population consists of two solutions, each with two cluster centers:
Solution K1, cluster centers: (1,1,1,1) and (4,2,2,4)
Solution K2, cluster centers: (4,4,4,4) and (2,2,1,1)
Use unsupervised genetic learning to choose the best solution based on the
fitness function scores.
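One possible way to work through this question is sketched below. It assumes the summed Euclidean distance score from the earlier clustering example as the fitness function and uses the numeric coding given above; the dictionary and function names are illustrative only.

```python
import math

# Numeric coding taken from the slide.
income = {"20-30k": 1, "30-40k": 2, "40-50k": 3, "50-60k": 4}
insurance = {"Yes": 1, "No": 2}
sex = {"Male": 1, "Female": 2}
age = {"20-29": 1, "30-39": 2, "40-49": 3, "50-59": 4}

rows = [
    ("30-40k", "Yes", "Male", "30-39"),
    ("30-40k", "No", "Female", "40-49"),
    ("50-60k", "No", "Female", "30-39"),
    ("20-30k", "No", "Female", "50-59"),
    ("20-30k", "No", "Male", "20-29"),
    ("30-40k", "No", "Male", "40-49"),
]
T = [(income[i], insurance[c], sex[s], age[a]) for i, c, s, a in rows]
print(T[0])  # (2, 1, 1, 2)

def fitness(solution, data):
    """Sum of distances from each instance to its closest cluster center."""
    return sum(min(math.dist(inst, center) for center in solution) for inst in data)

K1 = [(1, 1, 1, 1), (4, 2, 2, 4)]
K2 = [(4, 4, 4, 4), (2, 2, 1, 1)]

print(f"K1: {fitness(K1, T):.2f}")  # K1: 12.10
print(f"K2: {fitness(K2, T):.2f}")  # K2: 12.42
```

With this scoring K1 has the lower total distance and would be preferred. A different fitness function or a different numeric coding of the categories could change the outcome.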
39.
Reading assignment
“Data Mining: A Tutorial-based Primer” by
Richard J Roiger and Michael W. Geatz,
published by Pearson Education in 2003,
pp.89-101.