
# Optimization of Fuzzy BEXA Using Nelder Mead

This document summarizes my research work at IU in Germany.



By: Ashish Khetan (a.khetan@iitg.ernet.in)
Bachelor of Technology, Department of Mechanical Engineering,
Indian Institute of Technology Guwahati, India

Under the guidance of:
Prof. Dr. Ian Cloete,
President, IU in Germany, Germany

Date: 21.07.2008

### Introduction

Fuzzy BEXA is a machine learning algorithm that classifies a given data set through an iterative process and then generates rules for a given class attribute. The core algorithm of Fuzzy BEXA runs over a fuzzy attribute relation file format (farff), but it also incorporates an algorithm that converts a given attribute relation file format (arff) into farff. The conversion from arff to farff uses several parameters whose values are based on expert knowledge, and the iterative rule generation likewise relies on expert-chosen parameters such as the α cuts for the attributes and the class attribute.

One criterion for estimating the performance of Fuzzy BEXA is the percentage accuracy with which instances are classified by the generated rules. This percentage accuracy, viewed as an objective function, depends on these expert-knowledge-based parameters.

The Nelder Mead search algorithm can be used to find the values of these parameters for which the objective function is maximal. The algorithm does not use the concept of a derivative to find the maximum or minimum, which makes it suitable for optimizing the parameters of Fuzzy BEXA to obtain the maximum percentage accuracy of classification.

### Literature review

As part of the literature review, some basic concepts of fuzzy set theory, arff, and farff are explained with the help of "Fuzzy set covering as a new paradigm for the induction of fuzzy classification rules" by Jacobus van Zyl.
A brief description of the Nelder Mead search algorithm is also provided.

### Basic fuzzy set theory

Let U be a given universal set. Generally, a set A, A ⊆ U, is defined using one of three methods: listing each element in the set, e.g. A = {a, b, c}; using a proposition to describe a property that must be satisfied by all the members of the set, e.g. A = {x | x ∈ Z, 0 < x < 10}; or using a function, usually called the characteristic function, that declares which elements are members of the set:

μA(u) = 1 for u ∈ A, and μA(u) = 0 for u ∉ A,

where u ∈ U.

Fuzzy sets are a generalization of crisp sets and are defined using the functional method, where the characteristic function is defined as

μA(u) : U → [0, 1].

The degree to which an element u, u ∈ U, belongs to the fuzzy set A is described by the membership function μA(u). This degree of membership expresses the certainty or ambiguity that u belongs to A, with μA(u) = 1 meaning absolute certainty that u ∈ A, and μA(u) = 0 absolute certainty that u does not belong to A. Crisp sets are special cases of fuzzy sets, since for a crisp set μA(u) : U → {0, 1}, i.e. the membership function is either 1 or 0, and elements either belong to a set or not with absolute certainty.

### Membership function

Attributes are usually of two types:

- Nominal: taking a finite set of unordered values (e.g. the attribute outlook takes the values sunny, cloudy, and rainy).
- Real: taking values from a linearly ordered range (e.g. temperature).

For real attributes, the fuzzy membership function maps the linear domain to membership degrees on the scale [0, 1]. The figure below shows how temperature values are mapped onto membership degrees for the term set of temp, defining the membership functions μcold, μmild, and μhot.
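A minimal sketch of such a mapping for temp is given below. The breakpoints (e.g. cold fading out between 15 and 5 degrees, hot rising between 20 and 30) are illustrative assumptions, not the boundaries actually used in the figure:

```java
// Sketch of piecewise linear membership functions for the linguistic
// terms of temp. All breakpoints are illustrative assumptions.
public class TempMembership {

    // Linear ramp clamped to [0, 1]; rising when from < to, falling otherwise.
    static double ramp(double x, double from, double to) {
        if (from < to) { // rising edge
            if (x <= from) return 0.0;
            if (x >= to) return 1.0;
            return (x - from) / (to - from);
        } else {         // falling edge
            if (x >= from) return 0.0;
            if (x <= to) return 1.0;
            return (from - x) / (from - to);
        }
    }

    static double cold(double t) { return ramp(t, 15.0, 5.0); }
    static double hot(double t)  { return ramp(t, 20.0, 30.0); }
    // mild overlaps its neighbours: the minimum of a rising and a falling ramp.
    static double mild(double t) { return Math.min(ramp(t, 5.0, 15.0), ramp(t, 30.0, 20.0)); }

    public static void main(String[] args) {
        // A temperature of 25 is partly mild and partly hot.
        System.out.println(cold(25) + " " + mild(25) + " " + hot(25));
    }
}
```

At 25 degrees, for instance, the membership degrees are 0.0 for cold and 0.5 each for mild and hot, which is exactly the kind of overlap between adjacent terms that the figure depicts.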
Linguistic variables with an unordered input domain, for example outlook (sunny, cloudy, rainy), have no associated mapping from a linear domain to membership degrees. In this case the membership function simply describes the ambiguity that an instance belongs to a certain term.

### Attribute relation file format

When the data set is in crisp format, it can be written in the attribute relation file format (arff), best understood through an example crisp data set.

Table 1: A crisp data set (arff)

@relation sport
@attribute outlook {sunny, cloudy, rainy}
@attribute temp real
@attribute humidity {humid, normal}
@attribute wind real
@attribute activity {volleyball, swimming, weights}
@data
sunny, 30, humid, 26, swimming ;1
sunny, 26, normal, 5, volleyball ;2
cloudy, 28, normal, 12, swimming ;3
cloudy, 23, normal, 14, volleyball ;4
rainy, 28, normal, 20, weights ;5
cloudy, 13, humid, 24, weights ;6
rainy, 10, normal, 10, weights ;7
cloudy, 12, normal, 14, volleyball ;8
sunny, 33, humid, 22, swimming ;9
sunny, 13, normal, 33, weights ;10
sunny, 31, humid, 0, swimming ;11

An example of a VL1 (variable-valued logic system 1) concept description, of the form "IF antecedent THEN consequent":

([outlook = sunny ∨ cloudy] ∧ [temp = 13]) ∨ ([humidity = normal] ∧ [temp = 28]) → weights

∧: symbol of conjunction; both expressions must be true simultaneously.
∨: symbol of disjunction; at least one of the two expressions must be true.

A set of values, one per attribute, is called an instance. Instances classified correctly under a given rule are called positive instances; the others are negative. The concept descriptions for a particular class attribute, say weights, taken all together form the rule for that class attribute.
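To make the conjunction and disjunction semantics concrete, the antecedent of the example VL1 concept description above can be evaluated directly against crisp instances; the class and method names below are illustrative:

```java
// Sketch evaluating the example VL1 antecedent
// ([outlook = sunny OR cloudy] AND [temp = 13]) OR
// ([humidity = normal] AND [temp = 28])
// against crisp instances such as those of Table 1.
public class Vl1Rule {

    static boolean antecedent(String outlook, double temp, String humidity) {
        // First conjunction: an internal disjunction over outlook, AND temp = 13.
        boolean first = (outlook.equals("sunny") || outlook.equals("cloudy")) && temp == 13;
        // Second conjunction: humidity = normal AND temp = 28.
        boolean second = humidity.equals("normal") && temp == 28;
        // The antecedent is the disjunction of the two conjunctions.
        return first || second;
    }

    public static void main(String[] args) {
        // Instance 6 (cloudy, 13, humid) satisfies the first conjunction,
        // instance 5 (rainy, 28, normal) the second, instance 1 neither.
        System.out.println(antecedent("cloudy", 13, "humid"));  // true
        System.out.println(antecedent("rainy", 28, "normal"));  // true
        System.out.println(antecedent("sunny", 30, "humid"));   // false
    }
}
```

Note that instance 3 (cloudy, 28, normal, swimming) also satisfies the antecedent but belongs to class swimming, so under this rule it would count as a negative instance.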
### Fuzzy attribute relation file format

When the same kind of data as in Table 1 is in fuzzy format, it can be written in the fuzzy attribute relation file format (farff), again best understood through an example data set.

Table 2: A fuzzy data set (farff)

@relation sport
@attribute outlook {sunny, cloudy, rainy}
@attribute temp {hot, mild, cold}
@attribute humidity {humid, normal}
@attribute wind {windy, calm}
@attribute activity {volleyball, swimming, weights}
@data
(.9 .1 .0), (1. .0 .0), (.8 .2), (.4 .6), (.0 .8 .2) ;1
(.8 .2 .0), (.6 .4 .0), (.0 1.), (.4 .6), (1. .7 .2) ;2
(.0 .7 .3), (.8 .2 .0), (.1 .9), (.2 .8), (.3 .6 .1) ;3
(.2 .7 .1), (.3 .7 .0), (.2 .8), (.3 .7), (.9 .1 .0) ;4
(.0 .1 .9), (.7 .3 .0), (.5 .5), (.5 .5), (.0 .0 1.) ;5
(.0 .7 .3), (.0 .3 .7), (.7 .3), (.4 .6), (.2 .0 .8) ;6
(.0 .3 .7), (.0 .0 1.), (.0 1.), (.1 .9), (.0 .0 1.) ;7
(.0 1. .0), (.0 .2 .8), (.2 .8), (.0 1.), (.7 .0 .3) ;8
(1. .0 .0), (1. .0 .0), (.6 .4), (.7 .3), (.2 .8 .0) ;9
(.9 .1 .0), (.0 .3 .7), (.0 1.), (.9 .1), (.0 .3 .7) ;10
(.7 .3 .0), (1. .0 .0), (1. .0), (.2 .8), (.4 .7 .0) ;11
(.2 .6 .2), (.0 1. .0), (.3 .7), (.3 .7), (.7 .2 .1) ;12
(.9 .1 .0), (.2 .8 .0), (.1 .9), (1. .0), (.0 .0 1.) ;13
(.0 .9 .1), (.0 .9 .1), (.1 .9), (.7 .3), (.0 .0 1.) ;14
(.0 .0 1.), (.0 .0 1.), (1. .0), (.8 .2), (.0 .0 1.) ;15
(1. .0 .0), (.5 .5 .0), (.0 1.), (.0 1.), (.8 .6 .0) ;16

Rules generated by Fuzzy BEXA over a farff data set have the same "IF antecedent THEN consequent" format, where the antecedent is a conjunction in fuzzy AL and the concept is a linguistic term from the set of class variables. For example, consider the rule

IF [sunny, cloudy][mild]@0.7 THEN weights@0.8

The number following the antecedent is the value of αa, the antecedent threshold, and the number following the consequent is the value of αc, the concept threshold.
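A small sketch of how αa and αc partition instances into sets is given below; the membership values are made up for illustration and are not taken from Table 2:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how the thresholds alphaA (antecedent) and alphaC (concept)
// partition instances; the membership values below are illustrative.
public class ThresholdSets {

    // 1-based numbers of instances whose membership reaches the threshold.
    static List<Integer> coveredBy(double[] membership, double threshold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < membership.length; i++)
            if (membership[i] >= threshold) out.add(i + 1);
        return out;
    }

    public static void main(String[] args) {
        double[] antecedent = {0.9, 0.6, 0.85, 0.3}; // membership of the conjunction per instance
        double[] concept    = {0.8, 0.95, 0.4, 0.9}; // membership of the class term per instance
        double alphaA = 0.8, alphaC = 0.8;

        List<Integer> xt = coveredBy(antecedent, alphaA); // XT: covered by the antecedent
        List<Integer> p  = coveredBy(concept, alphaC);    // P: positive instances
        List<Integer> xp = new ArrayList<>(xt);
        xp.retainAll(p);                                  // XP: covered positives
        List<Integer> xn = new ArrayList<>(xt);
        xn.removeAll(xp);                                 // XN = XT - XP: covered negatives

        System.out.println("XT=" + xt + " P=" + p + " XP=" + xp + " XN=" + xn);
    }
}
```

With these made-up values, XT = {1, 3}, P = {1, 2, 4}, XP = {1}, and XN = {3}, mirroring the set construction used in the worked example that follows.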
The threshold values decide whether the membership of a particular attribute counts as true. Consider an example to illustrate αa and αc.

Let the concept be activity.volleyball and the conjunction c = [sunny][normal]. Let αc = αa = 0.8. Then XT(c) (the training set, which depends on αa) = {2, 10, 13, 16}; P (the positive instances, which depend on αc) = {2, 4, 16}; XP(c) (the positive instances in the training set, which satisfy both αc and αa) = {2, 16}; N = T − P; and XN(c) = XT(c) − XP(c) = {10, 13}, where instances are listed by their instance numbers in the table.

### Nelder Mead search algorithm

The Nelder Mead algorithm is a numerical method for minimizing an objective function in a multidimensional space. In other words, it finds optimum values for the parameters of a system while minimizing or maximizing the objective function. Its most important property is that it does not use the concept of a derivative, while most other algorithms of this class do.

To find the optimum parameter values, the algorithm starts from a set of random initial values and generates an initial simplex of a certain size with N + 1 points in an N-dimensional space, in order to optimize the N parameters of a given objective function. The algorithm converges as the simplex becomes smaller and smaller while approaching the optimum point.

The algorithm works iteratively: the simplex changes its shape and size and moves towards the optimum point. At each iteration it sorts the vertices of the simplex by the value of the objective function at each point, then replaces the worst point with a point reflected through the centroid of the remaining N points. If this reflected point is better than the current best point, the algorithm tries to stretch further out or in along the same line.
If the new point is not much better than the previous one, the algorithm is stepping across a valley, so it shrinks the simplex towards the best point. Sometimes it gets stuck in a rut, in which case the algorithm restarts from a new random initial point.

The algorithm can only find a local minimum, depending on the position and size of the initial simplex with which it starts.

### Approach

When Fuzzy BEXA generates rules from a given data set for a particular class attribute, it uses several expert-knowledge-based parameters. Whenever a system contains such parameters, on which its performance depends, the Nelder Mead search algorithm can be used to optimize their values so as to obtain the best performance.

In my studies I found that expert-knowledge-based parameters enter Fuzzy BEXA in two main places: first, when the arff file is converted into farff, and second, when values are assumed for the threshold cut parameters αa (for attributes) and αc (for the class attribute).

When Fuzzy BEXA converts an arff file into farff, it first uses expert knowledge to decide how many linguistic variables should be created for each real-valued attribute. This number is the first parameter that can be optimized. In Fuzzy BEXA it is usually taken as three or five, depending on the range of the data and the apparent number of clusters in the attribute's values across all instances.

At the second stage, the shape of the membership function is decided using expert knowledge. It can be a continuously smooth function or a piecewise linear function, and it can differ between attributes. Fuzzy BEXA uses a piecewise linear bell-shaped function for all real-valued attributes.
The shape of the membership function is a discrete-valued parameter that can be optimized at this second stage.

A piecewise linear bell-shaped function approximation.

Referring to the figure above, the points of discontinuity in the piecewise linear bell-shaped function used in Fuzzy BEXA are the parameters at the third stage, which can also be optimized.

The algorithm used in Fuzzy BEXA for the generation of the membership function.

Referring to the algorithm above, the values 0.25 and 0.125 decide the points of discontinuity on the horizontal axis, and the value 0.8 decides the same on the vertical axis of the membership function. These three values are three further parameters that can be optimized.

Any change in the parameter values at the three stages above changes the definition of the membership function for the attribute, and hence the farff file generated from the arff file. This ultimately leads to the generation of different rules, since the data set (the farff file) itself has changed, and hence the classification accuracy changes.

The threshold cut parameters αa (for attributes) and αc (for the class attribute) come into play when rules are generated from a given farff file. Changing their values likewise changes the classification accuracy of the instances.

### Implementation in Java

As explained above, the Nelder Mead search algorithm works on a simplex of N + 1 points in an N-dimensional space, whose dimensions are the N parameters to be optimized so as to maximize or minimize the objective function that depends on them.

All parameters of a system can be optimized together at once, or one group at a time, optimizing only similar parameters together.
Here, similar parameters means, referring to the parameters above, (αa and αc) on the one hand, and (0.25, 0.125, and 0.8), which decide the points of discontinuity in the piecewise linear function, on the other.

A piece of Java code was developed to optimize the two parameters αa and αc of Fuzzy BEXA, taking the percentage accuracy of instance classification as the objective function. The code works as follows:

1. Take initial random values of αa and αc from the user and generate the three-point simplex in the two-dimensional space of αa and αc.
2. Write a set of values of αa and αc at the appropriate place in the data file and call Fuzzy BEXA through a batch file; Fuzzy BEXA writes the percentage accuracy to a text file, from which it is read back.
3. Repeat step 2 to obtain the value of the objective function at all three points.
4. Run the core Nelder Mead algorithm over the initial simplex. At each iteration the Nelder Mead algorithm calls Fuzzy BEXA two to four times using step 2.
5. Stop when, within the given tolerance limit, the simplex has reached its smallest size and each of its vertices approaches the optimum point.

### Conclusion

The Nelder Mead search algorithm can be applied effectively to optimize the parameters involved in the Fuzzy BEXA algorithm. This was tested by optimizing the two parameters αa and αc to obtain the maximum classification accuracy for the instances. The algorithm can also be applied to optimize the other parameters: the number of linguistic variables, the shape of the membership function, and the points of discontinuity in the membership function. These parameters can be optimized separately for each attribute, but that is only feasible when there are few attributes.

Convergence of the Nelder Mead algorithm depends on the initial guess of the parameters from which the initial simplex is created. The algorithm can get stuck in a local minimum and fail to move on to the global minimum, so it is necessary to run it a number of times with different initial guesses to find the global minimum of the objective function.
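As an illustration of steps 1 to 5 of the implementation, a simplified Nelder Mead loop for the two parameters is sketched below. The batch-file call to Fuzzy BEXA is replaced by a stand-in objective with a known maximum (maximizing accuracy is done by minimizing its negation), the contraction step is simplified to a shrink, and all class and method names are illustrative, not the actual project code:

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified Nelder Mead loop (reflection, expansion, shrink) for two
// parameters standing in for alphaA and alphaC. Illustrative sketch only.
public class NelderMead2D {

    // Placeholder for "call Fuzzy BEXA and read back the accuracy": the
    // negated accuracy of a smooth surface peaking at alphaA=0.7, alphaC=0.8.
    static double negAccuracy(double[] p) {
        double da = p[0] - 0.7, dc = p[1] - 0.8;
        return da * da + dc * dc;
    }

    static double[] optimize(double[] start, double size, int iters) {
        // Initial three-point simplex in the two-dimensional parameter space.
        double[][] s = { start.clone(),
                         { start[0] + size, start[1] },
                         { start[0], start[1] + size } };
        for (int it = 0; it < iters; it++) {
            // Sort vertices by objective value: s[0] best, s[2] worst.
            Arrays.sort(s, Comparator.comparingDouble(NelderMead2D::negAccuracy));
            double[] best = s[0], worst = s[2];
            // Centroid of all points except the worst.
            double[] c = { (s[0][0] + s[1][0]) / 2, (s[0][1] + s[1][1]) / 2 };
            // Reflect the worst point through the centroid.
            double[] r = { 2 * c[0] - worst[0], 2 * c[1] - worst[1] };
            if (negAccuracy(r) < negAccuracy(best)) {
                // Better than the best point: try stretching further out (expansion).
                double[] e = { c[0] + 2 * (c[0] - worst[0]), c[1] + 2 * (c[1] - worst[1]) };
                s[2] = negAccuracy(e) < negAccuracy(r) ? e : r;
            } else if (negAccuracy(r) < negAccuracy(s[1])) {
                s[2] = r; // plain reflection is good enough
            } else {
                // Stepping across a valley: shrink the simplex towards the best point.
                for (int i = 1; i < 3; i++) {
                    s[i][0] = best[0] + 0.5 * (s[i][0] - best[0]);
                    s[i][1] = best[1] + 0.5 * (s[i][1] - best[1]);
                }
            }
        }
        Arrays.sort(s, Comparator.comparingDouble(NelderMead2D::negAccuracy));
        return s[0]; // best vertex found
    }

    public static void main(String[] args) {
        double[] opt = optimize(new double[]{0.2, 0.2}, 0.1, 300);
        System.out.printf("alphaA=%.3f alphaC=%.3f%n", opt[0], opt[1]);
    }
}
```

In the real implementation, negAccuracy would write αa and αc into the data file, invoke Fuzzy BEXA via the batch file, and read the resulting percentage accuracy from the text file, exactly as in step 2 above.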