Meta-Heuristic Based Clustering of Two-Dimensional Data Using Neighbourhood Search with Data Mining Technique as an Application of P-Median Problem


International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 93-100, © IAEME

META-HEURISTIC BASED CLUSTERING OF TWO-DIMENSIONAL DATA USING NEIGHBOURHOOD SEARCH WITH DATA MINING TECHNIQUE AS AN APPLICATION OF P-MEDIAN PROBLEM

1 D. Srinivas Reddy, 2 Dr. A. Govardhan, 3 S.S.V.N. Sharma
1 Vaageswari College of Engineering, Karimnagar, India
2 JNTUH University, Director of Evaluation, Hyderabad, India
3 Director of Vaagdevi College of Engineering, India

ABSTRACT

The p-median problem is a well-known facility location problem, and the neighborhood search (NS) algorithm provides a solution to it. Hybridizing the NS algorithm with a data mining (DM) technique attains solutions better than the plain NS approach [1]. Two new metaheuristic clustering algorithms, HDMNS (Hybrid Data Mining based Neighborhood Search) and HMDMNS (Hybrid Multiple Data Mining based Neighborhood Search), are proposed as hybrid versions of the NS algorithm [10, 11, 17]. Clustering is the process of dividing points into related groups, and because of the nature of the p-median problem, the proposed NS-with-DM method can also be used as a clustering algorithm. Both proposed algorithms have two phases. The first phase constructs a basic solution using the NS approach. In HDMNS, the DM technique is then used to locate an improved solution, giving a better clustering of the two-dimensional data space. In HMDMNS, the second phase applies the DM technique repeatedly, based on the idea that further mining can give better results where they exist. The results are compared with the well-known k-means clustering algorithm and show that both methods outperform k-means.

Index Terms: HDMNS (Hybrid Data Mining based Neighborhood Search), HMDMNS (Hybrid Multiple Data Mining based Neighborhood Search), neighborhood search (NS), data mining (DM), clustering
I. INTRODUCTION

The p-median problem is a combinatorial optimization problem that is NP-hard in nature; it locates facilities that serve the maximum number of locations [7, 8]. The p-median problem is applied in many settings, such as developing marketing strategies in the management sciences and locating server positions in computer networks [12]. Here the p-median problem is used as a clustering technique: in this work a metaheuristic method, k-Mean-GRASP, is proposed to solve the p-median problem and is discussed in detail. The arrangement of similar objects into different groups is known as clustering, i.e., the segregation of data into subsets (clusters) so that the elements of each subset share some common characteristic. Data clustering is an important data mining task and a general mechanism for analyzing statistical data; it is used in several other areas such as process industries, machine learning, pattern recognition, image analysis and bioinformatics [15]. Metaheuristics are a principal category of approximate techniques for solving hard combinatorial optimization problems for which the use of exact methods is impractical. They are general-purpose, high-level procedures that can be instantiated to explore the solution space of a specific optimization problem efficiently. Metaheuristics such as genetic algorithms, tabu search, simulated annealing, ant systems and GRASP have been introduced and applied to real-life problems in several areas of science. The GRASP (Greedy Randomized Adaptive Search Procedures) metaheuristic has been applied successfully to many optimization problems. The GRASP search process is iterative, and each pass consists of two phases: construction and enhancement. In the construction phase a feasible solution is built, and the enhancement phase then explores its neighbourhood to find an improved one. The outcome is the best solution found over all iterations.

Figure 1: k-Mean-GRASP procedure

Procedure k-Mean-GRASP()
1. Initialize best_sol as ∅
2. repeat
3.   sol ← k-Means(data points);
4.   best_sol ← Enhancement(sol);
5.   if cost(sol) > cost(best_sol)
6.     best_sol ← sol;
7.   end if
8. until termination criterion;
9. return best_sol;

Figure 2: k-means algorithm

Procedure k-Means(data points)
1. Initialize k points as cluster centers
2. Assign each data point to the nearest cluster center
3. Recompute the cluster center of each cluster as the mean of the cluster
4. Repeat steps 2 and 3 until there is no more change in the values of the means

The NS technique examines all possible combinations formed with the elements in the neighbourhood of the individual elements of the solution and determines the optimal solution, i.e., the one that serves the maximum number of locations so that the sum of the total distance from each element to the facilities is minimized. The NS approach metaheuristic is an iterative one containing two phases [16]. The construction phase is the first phase and builds the initial solution; based on that initial solution, the second phase searches for the optimal solution using the NS approach, and the feasible solution space is then computed to obtain the optimal solution. The NS approach mechanism is illustrated in Figure 3; it has two phases, the construction phase and the neighbourhood search phase.

Figure 3: NS Approach procedure

Procedure NSApproach()
1. optml_sol ← ∅
2. repeat
3.   sol ← Construction(data points);
4.   best_sol ← NBSearch(sol);
5.   if cost(sol) > cost(optml_sol)
6.     optml_sol ← sol;
7.   end if
8. until termination criterion;
9. return optml_sol;
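To make the k-Mean-GRASP iteration of Figure 1 concrete, here is a minimal Python sketch, not the authors' implementation: k-means acts as the construction phase, an assumed enhancement step snaps each cluster mean to the nearest actual data point so the result is a valid set of p facility locations, and the best solution over all iterations is kept. The helper names (cost, kmeans, enhance) and the restart-based iteration are illustrative assumptions.

import math
import random

def cost(points, centers):
    # p-median style objective: total distance from each point to its nearest center.
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def kmeans(points, k, iters=20):
    # Construction phase: plain Lloyd-style k-means on 2-D (x, y) tuples.
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

def enhance(points, centers):
    # Assumed enhancement phase: move each center to the closest data point,
    # turning cluster means into p-median facility locations.
    return [min(points, key=lambda p: math.dist(p, c)) for c in centers]

def k_mean_grasp(points, k, iterations=10):
    # Keep the best enhanced solution found over a fixed number of iterations.
    best = None
    for _ in range(iterations):
        sol = enhance(points, kmeans(points, k))
        if best is None or cost(points, sol) < cost(points, best):
            best = sol
    return best

# Example: 50 random 2-D points clustered with p = 5 facilities.
data = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50)]
print(cost(data, k_mean_grasp(data, k=5)))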
II. HYBRIDIZED DATA MINING NEIGHBOURHOOD SEARCH APPROACH AND HYBRID MULTIPLE DATA MINING BASED NEIGHBORHOOD SEARCH

The HDMNS() procedure shown in Figure 4 consists of three phases. The first phase, NSApproach(), computes the initial solution using the NS approach from the given list of points and the user-specified 'P' [18]. The second phase applies the DM technique to this basic feasible solution. The solution obtained in the first phase is the input to the second phase, which generates the updated solution space (USS) consisting of all the probable solutions derived from the result of NSApproach(). From the USS, the frequent item set (FIS) is generated, consisting of the set of all individual items present in the USS [4, 5, 9]. The support count is then calculated and updated for each item in the FIS. After updating the support counts, the frequent items are sorted in decreasing order of support count, and the mined solution is constructed by taking the items with the highest support count until the size of the solution, i.e., the number of items, equals 'P'. The final phase obtains the optimal solution: it is updated using the enhancement phase described in NSApproach(), which looks for the global optimal solution that optimizes the objective of the given p-median problem [13, 14].

Figure 4: Hybridized Data Mining NS (HDMNS) algorithm

Procedure HDMNS()
1. Initialize sol, optimal_sol, Mined_sol ← Φ
2. Initialize sol_space ← Φ
3. Initialize USS, FIS, UFIS ← Φ
4. Read list, p
5. sol ← NSApproach(list, p)
6. USS ← Update_sol_space(sol)
7. FIS ← Generate_frequent_items(USS)
8. UFIS ← Update_supportcount(FIS, USS)
9. Mined_sol ← Generate_mined_solution(UFIS)
10. Update optimal_sol

HMDMNS, described in Figure 5, is the metaheuristic based on Neighbourhood Search (NS) hybridized with the Data Mining technique (HDMNS) applied multiple times iteratively, using frequent mining to provide an updated solution to the p-median problem, if one exists, in each iteration [6]. The local optimal solution resulting from the NS method serves as the basis for identifying a feasible solution space that holds different possible solutions of the same size; applying the frequent mining technique to this space identifies the frequent items, and based on the support count the most promising solution is selected. The frequent mining technique is applied only once in HDMNS and yields a better solution; in HMDMNS it is applied repeatedly, on the intuition that if applying the mining technique once produces a better solution, applying it again may produce a better one still.
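To make the shared mining step of HDMNS and HMDMNS concrete, the following is a minimal Python sketch under two assumptions that the text leaves open: the updated solution space (USS) is built by swapping each facility of the NS solution with nearby candidate points, and the mined solution is simply the 'P' facilities with the highest support count over that space. The function names (update_sol_space, mined_solution) and the radius parameter are illustrative, not the paper's.

import math
from collections import Counter

def update_sol_space(ns_solution, candidates, radius=10.0):
    # Assumed USS construction: the NS solution plus variants obtained by
    # swapping each facility with a candidate point inside its neighbourhood.
    uss = [tuple(ns_solution)]
    for i, facility in enumerate(ns_solution):
        for q in candidates:
            if q not in ns_solution and math.dist(facility, q) <= radius:
                variant = list(ns_solution)
                variant[i] = q
                uss.append(tuple(variant))
    return uss

def mined_solution(uss, p):
    # Frequent-item step: support count of every individual facility over the
    # updated solution space, then keep the p facilities with highest support.
    support = Counter(f for sol in uss for f in sol)
    return [facility for facility, _ in support.most_common(p)]

In HDMNS this mined solution would be passed back once to the enhancement step of NSApproach(); in HMDMNS the same mine-and-update cycle would be repeated a user-specified number of times.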
Figure 5: Hybrid Multiple Data Mining NS (HMDMNS) algorithm

Procedure HMDMNS()
1. sol, optimal_sol, Mined_sol ← Φ
2. sol_space ← Φ
3. USS, FIS, UFIS ← Φ
4. Read list, p
5. sol ← NSApproach(list, p)
6. Repeat
7.   USS ← Update_sol_space(sol)
8.   FIS ← Generate_frequent_items(USS)
9.   UFIS ← Update_supportcount(FIS, USS)
10.  Mined_sol ← Generate_mined_solution(UFIS)
11.  Update optimal_sol
12. Go to step 6 until the user-specified number of times

III. HDMNS AND HMDMNS APPROACHES AS CLUSTERING METHODS; K-MEANS

One of the most popular heuristics for solving clustering problems is the k-means clustering algorithm, which is simple and widely used [2]. This algorithm segregates the data into k disjoint clusters, and the center of each cluster is called its centroid. It partitions the objects so as to minimize the sum of the squared distances between the centroids of the clusters and their objects. The k-means algorithm is described in Figure 6. The cluster quality computed for k-means is the same as the p-median problem objective function value, so the solution approaches proposed for the p-median problem can also be considered new clustering mechanisms that are more efficient than k-means. The cluster quality estimate for the k-means algorithm is the squared-error criterion

E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2

where E is the sum of the squared error over all objects in the database, p is the point in space representing a given object, and m_i is the mean of cluster C_i. This criterion makes the resulting k clusters compact.

Figure 6: k-means algorithm

Procedure k-Means(data points)
1. Initialize k points as cluster centers
2. Assign each data point to the nearest cluster center
3. Recompute the cluster center of each cluster as the mean of the cluster
4. Repeat steps 2 and 3 until there is no more change in the values of the means

IV. RESULTS

The experimental results acquired for k-Mean-GRASP and k-means are presented in this section, and the results are compared on the basis of solution quality against k. Experiments are conducted on data sets with 50, 75 and 100 points; results are tabulated and graphs are plotted. The data sets under study are taken from the web site of Professor Eric Taillard, University of Applied Sciences of Western Switzerland. The companion website for p-median problem instances is http://mistic.heigvd.ch/taillard/problemes.dir/location.html. The quality of the solution (cluster), i.e., the sum of the distances from each customer location to its closest facility (cluster center), is measured for both the k-means algorithm and the hybridized k-Mean-GRASP algorithm. In Figure 7 the solution/cluster quality of k-Means and k-Mean-GRASP is compared for the data set of size 50 with the number of facility locations (cluster centers) incremented by 5. In Figure 8 the same comparison is made for the data set of size 75 with the number of facility locations incremented by 10.
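The two quality measures used in Sections III and IV can be computed directly from a set of centers. A minimal sketch, assuming points and centers are 2-D (x, y) tuples: pmedian_cost is the sum of distances from each customer location to its closest facility (the cluster quality used to compare the algorithms), and squared_error is the criterion E above with each point assigned to its nearest center. Both function names are illustrative.

import math

def pmedian_cost(points, centers):
    # Cluster quality: sum of distances from each point to its nearest facility.
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def squared_error(points, centers):
    # k-means criterion E = sum_i sum_{p in C_i} |p - m_i|^2,
    # with each point p assigned to the nearest center m_i.
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)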
Figure 7: k-Means vs k-Mean-GRASP. Figure 8: k-Means vs k-Mean-GRASP.

The experimental results obtained for the NS approach, HDMNS and HMDMNS are analyzed in this section, and the results are evaluated on the basis of the quality of the solution against 'P'. Experiments are carried out on data sets with 15, 25, 50 and 75 points; results are tabulated and graphs are plotted. The data sets under study are taken from the web site of Professor Eric Taillard, University of Applied Sciences of Western Switzerland. The associated website for p-median problem instances is http://mistic.heig-vd.ch/taillard/problemes.dir/location.html. In Graph 1 the cluster quality is compared for the HDMNS, HMDMNS and k-means algorithms with p varied in intervals of 5 and n = 25. It is observed that HMDMNS produces better quality clusters than the other two algorithms.
[Graphs comparing cluster quality and execution time for the HDMNS, HMDMNS and k-means algorithms]
The same is observed in the remaining graphs, plotted for cluster quality with n = 50, 15 and 75 and p = 15, 3 and 20 respectively (Graphs 2, 3 and 4). Execution times are also compared for all three algorithms for p = 5, 15 and 3, and it is identified that HMDMNS can also be used as a better clustering algorithm than the existing k-means algorithm; this is depicted in Graphs 5 and 6.

V. CONCLUSION

It is observed that in all the test cases the Hybrid Multiple Data Mining Neighbourhood Search (HMDMNS) metaheuristic performs much better as a clustering algorithm when compared with the efficient existing method k-means. So, instead of k-means, HMDMNS can be used as a new clustering mechanism.

REFERENCES

[1] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proceedings of the Very Large Data Bases Conference, pp. 487-499, 1994.
[2] T. A. Feo and M. G. C. Resende, A probabilistic heuristic for a computationally difficult set covering problem, Operations Research Letters, 8 (1989), pp. 67-71.
[3] M. D. H. Gamal and S. Salhi, A cellular heuristic for the multi-source Weber problem, Computers & Operations Research, 30 (2003), pp. 1609-1624.
[4] B. Goethals and M. J. Zaki, Advances in frequent itemset mining implementations: Introduction to FIMI'03, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[5] G. Grahne and J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd Ed., Morgan Kaufmann Publishers, 2006.
[7] O. Kariv and S. L. Hakimi, An algorithmic approach to network location problems, part II: the p-medians, SIAM Journal on Applied Mathematics, 37 (1979), pp. 539-560.
[8] N. Mladenovic, J. Brimberg, P. Hansen and J. A. Moreno-Perez, The p-median problem: A survey of metaheuristic approaches, European Journal of Operational Research, 179 (2007), pp. 927-939.
[9] S. Orlando, P. Palmerini and R. Perego, Adaptive and resource-aware mining of frequent sets, Proceedings of the IEEE International Conference on Data Mining, pp. 338-345, 2002.
[10] M. H. F. Ribeiro, V. F. Trindade, A. Plastino and S. L. Martins, Hybridization of GRASP metaheuristic with data mining techniques, Proceedings of the ECAI Workshop on Hybrid Metaheuristics, pp. 69-78, 2004.
[11] E. G. Talbi, A taxonomy of hybrid metaheuristics, Journal of Heuristics, 8 (2002), pp. 541-564.
[12] B. C. Tansel, R. L. Francis and T. J. Lowe, Location on networks: A survey, Management Science, 29 (1983), pp. 482-511.
[13] Moh'd Belal Al-Zoubi, Ahmed Sharieh, Nedal Al-Hanbali and Ali Al-Dahoud, A hybrid heuristic algorithm for solving the p-median problem, Journal of Computer Science (Special Issue), pp. 80-83, 2005, Science Publications.
[14] Alexandre Plastino, Eric R. Fonseca, Richard Fuchshuber, Simone de L. Martins, Alex A. Freitas, Martino Luis and Said Salhi, A hybrid data mining metaheuristic for the p-median problem, Proceedings of the SIAM International Conference on Data Mining, 2009.
[15] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, A nodal relational approach to data virtualization, International Conference on Cloud Computing and E-Governance, Bangkok, Thailand.
[16] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Metaheuristic approach based on neighborhood search for solving p-median problem, IOSR Journal of Computer Engineering (IOSRJCE), Volume 7, Issue 1 (Nov-Dec 2012), pp. 01-05.
[17] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Hybrid k-mean GRASP for partition based clustering of two-dimensional data space as an application of p-median problem, International Journal of Engineering Sciences Research (IJESR), Volume 3, Issue 12, December 2012.
[18] D. Srinivas Reddy, Dr. A. Govardhan and S.S.V.N. Sharma, Hybridization of neighborhood search metaheuristic with data mining technique to solve p-median problem, International Journal of Computational Engineering Research (IJCER), Vol. 2, Issue 7, November 2012.
[19] R. Lakshman Naik, D. Ramesh and B. Manjula, Instances selection using advance data mining techniques, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 47-53, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[20] M. Karthikeyan, M. Suriya Kumar and Dr. S. Karthikeyan, A literature review on the data mining and information security, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141-146, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
