
The members of the project group were Mansi Kulkarni, Nikhil Ingole, Prasad Mohite, Varad Meru, and Vishal Bhavsar.

Wonderful Experience !!!


- 1. K-Means, its Variants and its Applications. Group 9: Varad Meru, Nikhil Ingole, Mansi Kulkarni, Vishal Bhavsar, Prasad Mohite. Guided by: Mrs. V. S. Rupnar. Department of Computer Science and Engineering, D. Y. Patil College of Engineering and Technology, Kolhapur. Monday, 29 July 13
- 2. Work Completed in the Previous Semester
  ✓ Selection of Topic and Preliminary Understanding of Clustering.
  ✓ Implementation of the K-Means Algorithm with Synthetic Data.
  ✓ Development of Graphical Representation of Clusters.
  ✓ Understanding and Implementation of Rough Set Clustering.
  ✓ Real World Data: Data Collection based on Surveys.
  ✓ Implementation of Conventional Clustering on Input Survey Details for Cluster Generation and Recommender System.
  ✓ Implementation of Rough-Set Clustering on Input Survey Details for Cluster Generation and Recommender System.
- 3. Work Completed in this Semester
  ✓ Study of Genetic Algorithms and their Implementation Issues.
  ✓ Adaptation of JavaGALib for K-Means Clustering.
  ✓ Verification and Validation of Cluster Quality with all the following Processes:
    ➡ K-Means, Rough K-Means, GA Rough K-Means.
  ✓ Recommender System Design and Initial Prototype Evaluation based on the K-Means Algorithm.
  ✓ Verification and Validation of Recommendations, and Applying Heuristics on the Results of the Recommendations for Precision.
  ✓ Recommender System Design and Initial Prototype Evaluation based on the Rough K-Means Algorithm.
- 4. Introduction to Clustering
  • Organizing data into clusters such that there is
    • high intra-cluster similarity
    • low inter-cluster similarity
  • Informally, finding natural groupings among objects.
  • Applications of clustering span various fields:
    • Data Compression, Data Modeling, Expression Analysis and other fields of application.
- 5. Introduction to K-Means Algorithm
  • It was first proposed in 1956 by Hugo Steinhaus.
  • It finds a partition such that the squared error between the empirical mean of a cluster and the points in that cluster is minimized.
  • The squared error of a cluster $c_k$ with empirical mean $\mu_k$ is defined as: $J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
  • The goal of K-Means is to minimize the sum of the squared error over all K clusters: $J(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
  • Minimizing this objective function is known to be an NP-hard problem (even for K = 2).
- 6. K-Means Clustering Algorithm (flowchart)
  • Start
  • Input: K, the number of clusters to be formed
  • Centroid initialization
  • Find the distance of objects to centroids
  • Partition based on minimum distance
  • New additions in group? Yes: repeat. No: Stop.
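The loop in the flowchart can be sketched in Java roughly as follows. This is a minimal illustration and not the project's implementation: the class name `KMeansSketch`, the method names, and the choice to pass the initial centroids in from outside are all assumptions made for this example. It also computes the sum of squared errors that K-Means tries to minimize.

```java
import java.util.Arrays;

// Illustrative K-Means sketch (Lloyd-style iteration); all names are hypothetical.
class KMeansSketch {
    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index of every object; centroids are updated in place.
    static int[] cluster(double[][] objects, double[][] centroids, int maxIterations) {
        int[] assignment = new int[objects.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Partition step: assign each object to its nearest centroid.
            for (int i = 0; i < objects.length; i++) {
                int best = 0;
                for (int k = 1; k < centroids.length; k++) {
                    if (squaredDistance(objects[i], centroids[k])
                            < squaredDistance(objects[i], centroids[best])) {
                        best = k;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // no object changed group: stop
            // Update step: move each centroid to the mean of its members.
            for (int k = 0; k < centroids.length; k++) {
                double[] sum = new double[centroids[k].length];
                int count = 0;
                for (int i = 0; i < objects.length; i++) {
                    if (assignment[i] != k) continue;
                    count++;
                    for (int j = 0; j < sum.length; j++) sum[j] += objects[i][j];
                }
                if (count == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < sum.length; j++) centroids[k][j] = sum[j] / count;
            }
        }
        return assignment;
    }

    // Sum of squared errors over all clusters (the objective being minimized).
    static double sumOfSquaredErrors(double[][] objects, double[][] centroids, int[] assignment) {
        double total = 0;
        for (int i = 0; i < objects.length; i++) {
            total += squaredDistance(objects[i], centroids[assignment[i]]);
        }
        return total;
    }
}
```

Because the result depends on the initial centroids, a caller would typically run this from several initializations and keep the partition with the lowest sum of squared errors.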
- 7. Result of K-Means Algorithm: Graph of Clusters in Synthetic Data [figure: Fig. 2, synthetic data (Lingras)]
- 8. Visual Representation of Clusters Formed [figure: panels for k=2, k=6, k=4 and k=1; axis ticks from 10 to 50]
- 9. Demo: K-Means Algorithm
- 10. Introduction to Rough Sets
  • Rough sets were introduced by Zdzisław Pawlak in 1982 and developed further in his 1991 monograph.
  • A rough set is a formal approximation of a crisp set in terms of a pair of sets.
  • The pair gives the lower and upper approximation of the original set.
  • Rough sets are based on equivalence class partitioning.
  • The pair A = (U, R) is called an approximation space, where U is the universe of objects and R is an equivalence relation on U.
  • The lower bound is the union of all the elementary sets which are subsets of X.
  • The upper bound is the union of all elementary sets which have a non-empty intersection with X.
  • The pair $(\underline{A}(X), \overline{A}(X))$ is the formal rough-set representation of the regular set X.
  • It is not possible to differentiate the elements within the same equivalence class.
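The two approximations defined above can be computed directly from the equivalence classes. The following Java sketch is only an illustration under assumed names (`RoughApproximationSketch`, `lower`, `upper`), with objects represented as integer IDs; it is not taken from the project code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative lower/upper approximation of a set X; all names are hypothetical.
class RoughApproximationSketch {
    // Lower approximation: union of the equivalence classes that are subsets of x.
    static Set<Integer> lower(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> c : classes) {
            if (x.containsAll(c)) result.addAll(c);
        }
        return result;
    }

    // Upper approximation: union of the equivalence classes that intersect x.
    static Set<Integer> upper(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> c : classes) {
            for (Integer element : c) {
                if (x.contains(element)) {
                    result.addAll(c);
                    break;
                }
            }
        }
        return result;
    }
}
```

For example, with equivalence classes {1, 2}, {3, 4}, {5} and X = {1, 2, 3}, the lower approximation is {1, 2} and the upper approximation is {1, 2, 3, 4}; objects 3 and 4 cannot be told apart, so both end up in the boundary.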
- 11. Adaptation of Rough Sets into K-Means Clustering
  • We consider the upper and lower bounds for only a few subsets of U.
  • It is therefore not possible to verify all the properties of rough sets (Pawlak, '82, '91).
  • Lingras et al. identified these compulsory rules for rough set clustering:
    • An object v can be part of at most one lower bound.
    • An object v that is part of a lower bound is also part of the corresponding upper bound.
    • An object v is not part of any lower bound if and only if v belongs to two or more upper bounds.
- 12. Result of Rough Set Clustering: Graph of Clusters in Synthetic Data [figure: Fig. 3, rough clusters for the synthetic data, from "Evolutionary Rough K-means"]
- 13. Lingras's Absolute Distance Formula
  • Let the distance from X to its nearest centroid $m_i$ be given by: $d(X, m_i) = \min_{1 \le j \le K} d(X, m_j)$
  • Consider the set T: $T = \{\, j : d(X, m_j) - d(X, m_i) \le \text{threshold},\ j \ne i \,\}$
  • T ≠ Ø: the point X is associated with the upper bounds of 2 or more clusters.
  • T = Ø: X exists in the lower bound of only one cluster.
  [figure: Fig. 1. Lower approximation, upper approximation and boundary area; source header: G. Peters / Pattern Reco]
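The absolute-distance test can be sketched in Java as below. This is a hedged illustration: the class name `AbsoluteThresholdSketch`, the method `setT`, and the use of Euclidean distance are assumptions for the example, and the threshold is simply passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative absolute-distance membership test; all names are hypothetical.
class AbsoluteThresholdSketch {
    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) sum += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(sum);
    }

    // Index of the centroid nearest to x.
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (distance(x, centroids[j]) < distance(x, centroids[best])) best = j;
        }
        return best;
    }

    // T: the other centroids whose distance to x exceeds the minimum by at most threshold.
    static List<Integer> setT(double[] x, double[][] centroids, double threshold) {
        int i = nearest(x, centroids);
        double dMin = distance(x, centroids[i]);
        List<Integer> t = new ArrayList<>();
        for (int j = 0; j < centroids.length; j++) {
            if (j != i && distance(x, centroids[j]) - dMin <= threshold) t.add(j);
        }
        return t;
    }
}
```

If `setT` returns an empty list, x would go into the lower bound of its nearest cluster; otherwise it would go into the upper bounds of the nearest cluster and of every cluster listed in T.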
- 14. Peters's Refinements on Lingras's Absolute Distance Formula
  • Limitations of Lingras's method:
    • Outlier in an inline position: b = az.
    • Outlier in a rectangular position.
- 15. Modified Rough K-Means
  • Centroid calculation in Rough Clustering
  • Membership assignment on the basis of relative distance
  • Let $d(X, m_i) = \min_{1 \le j \le K} d(X, m_j)$; the ratios $d(X, m_j) / d(X, m_i)$ are used to determine the membership of X.
  • Let $T = \{\, j : d(X, m_j) / d(X, m_i) \le \text{threshold},\ j \ne i \,\}$.
  • T ≠ Ø: the point X is associated with the upper bounds of 2 or more clusters.
  • T = Ø: X exists in the lower bound of only one cluster.
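A ratio-based membership test and a rough centroid update might look like the following Java sketch. It rests on assumptions not shown on the slide: the names are hypothetical, distance is Euclidean, and the centroid is a weighted combination of the mean of the lower-bound objects and the mean of the boundary objects, in the style of Lingras's rough K-Means with weights `wLower` and `wUpper`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ratio-based rough assignment and centroid update; names are hypothetical.
class RelativeThresholdSketch {
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) sum += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(sum);
    }

    // T: other centroids j with d(x, m_j) / d(x, m_i) <= ratioThreshold, where m_i is nearest.
    // The test is written as a product so that d(x, m_i) = 0 needs no special case.
    static List<Integer> setT(double[] x, double[][] centroids, double ratioThreshold) {
        int i = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (distance(x, centroids[j]) < distance(x, centroids[i])) i = j;
        }
        double dMin = distance(x, centroids[i]);
        List<Integer> t = new ArrayList<>();
        for (int j = 0; j < centroids.length; j++) {
            if (j != i && distance(x, centroids[j]) <= ratioThreshold * dMin) t.add(j);
        }
        return t;
    }

    // Component-wise mean of a non-empty array of points.
    static double[] mean(double[][] points) {
        double[] m = new double[points[0].length];
        for (double[] p : points) {
            for (int j = 0; j < m.length; j++) m[j] += p[j] / points.length;
        }
        return m;
    }

    // Weighted centroid; at least one of lower and boundary must be non-empty.
    static double[] roughCentroid(double[][] lower, double[][] boundary,
                                  double wLower, double wUpper) {
        if (boundary.length == 0) return mean(lower);
        if (lower.length == 0) return mean(boundary);
        double[] l = mean(lower);
        double[] b = mean(boundary);
        double[] c = new double[l.length];
        for (int j = 0; j < c.length; j++) c[j] = wLower * l[j] + wUpper * b[j];
        return c;
    }
}
```

Using a ratio rather than a difference makes the test independent of the scale of the data, which is the motivation for replacing the absolute threshold.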
- 16. Working Algorithm of Rough K-Means Implementation
- 17. Visual Representation of Rough K-Means Forming 3 Rough Sets
- 18. Demo: Rough K-Means Algorithm
- 19. Genetic Algorithm based Rough Set Clustering
  • Genetic Algorithms: Introduction
    • A search process that follows the principles of evolution through natural selection.
    • Important terms: Genes, Genome, Chromosomes, Populations, Generations, Fitness, Selection, Crossover, Mutation.
    • This paradigm has the following steps:

          generate initial population, G(0);
          evaluate G(0);
          for (t = 1; solution is not found; t++)
              generate G(t) using G(t-1);
              evaluate G(t);
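The generate-and-evaluate loop can be made concrete with a small, self-contained Java sketch. It does not use JavaGALib, and every design choice in it is an assumption for illustration: a toy fitness function, tournament selection, one-point crossover, a fixed 10% mutation chance, elitism, and a fixed number of generations standing in for "solution is not found".

```java
import java.util.Random;

// Illustrative genetic algorithm loop on real-valued chromosomes; names are hypothetical.
class GeneticLoopSketch {
    // Toy fitness: higher is better; the optimum is the all-zero chromosome.
    static double fitness(double[] genes) {
        double sum = 0;
        for (double g : genes) sum += g * g;
        return -sum;
    }

    // Fittest chromosome of a population.
    static double[] best(double[][] population) {
        double[] top = population[0];
        for (double[] c : population) {
            if (fitness(c) > fitness(top)) top = c;
        }
        return top;
    }

    // Tournament selection: the fitter of two randomly drawn chromosomes.
    static double[] select(double[][] population, Random rng) {
        double[] a = population[rng.nextInt(population.length)];
        double[] b = population[rng.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static double[] evolve(int genes, int populationSize, int generations, long seed) {
        Random rng = new Random(seed);
        // Generate the initial population G(0) with genes in [-10, 10).
        double[][] population = new double[populationSize][genes];
        for (double[] c : population) {
            for (int j = 0; j < genes; j++) c[j] = rng.nextDouble() * 20 - 10;
        }
        for (int t = 1; t <= generations; t++) {
            // Generate G(t) using G(t-1).
            double[][] next = new double[populationSize][];
            next[0] = best(population).clone(); // elitism: keep the fittest unchanged
            for (int i = 1; i < populationSize; i++) {
                double[] p1 = select(population, rng);
                double[] p2 = select(population, rng);
                int cut = rng.nextInt(genes + 1); // one-point crossover position
                double[] child = new double[genes];
                for (int j = 0; j < genes; j++) child[j] = j < cut ? p1[j] : p2[j];
                if (rng.nextDouble() < 0.1) child[rng.nextInt(genes)] += rng.nextGaussian();
                next[i] = child;
            }
            population = next;
        }
        return best(population);
    }
}
```

For clustering, a chromosome would hold the concatenated centroid coordinates and the fitness would be a cluster-quality measure; the toy fitness here only keeps the example short.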
- 20. Genetic Algorithm based Rough Set Clustering
  • Genetic Algorithms for Rough Set Clustering
    • JavaGALib: a Java library built by Jeff Smith of SoftTechDesign to support GA operations.
    • Input fields:
      • p: the threshold
      • D(n, m): a dataset with n objects of m dimensions
      • k: the number of clusters
      • w_lower, w_upper
      • population: the number of chromosomes to be generated
      • generations: the number of successive populations to be generated
    • Output: a set of clusters. Each cluster is represented by the objects in its lower region and boundary region (upper bound).
  • Data structures used for Genetic Algorithms for Rough Set Clustering
    • Chromosome: Centroid1* Centroid2* Centroid3* ...
- 21. Genetic Algorithm based Rough Set Clustering
  • Constructor Description for Genetic Algorithm

        super(numOfClusters * numOfDimensions, // no. of genes in a chromosome
              100,                  // population of chromosomes
              0.7,                  // crossover probability
              6,                    // random selection chance
              50,                   // stop after this many generations
              10,                   // no. of preliminary runs to build good breeding stock for the final full run
              20,                   // max preliminary generations
              0.1,                  // chromosome mutation probability
              Crossover.ctTwoPoint, // crossover type
              2,                    // number of decimal points of precision
              false                 // considers only float numbers
        );
        } // end constructor

  • Evolve Function

        computeFitnessRankings();
        doGeneticMating();
        copyNextGenToThisGen();
- 22. Demo: Genetic Algorithm based Rough K-Means Algorithm
- 23. Rough Set Clustering based on Kohonen SOM Paradigm
  • The Kohonen network architecture is used as the Artificial Neural Network paradigm.
  • The single-level, one-dimensional case can be seen in Fig. 1.
  • The weight vector x for the group that is closest to the pattern v is modified using $x_{new} = (1 - \alpha)\, x_{old} + \alpha\, v$:

        void update(int winner, int objectID) {
            for (int j = 0; j < weights[winner].length; j++)
                weights[winner][j] = (1 - alpha) * weights[winner][j]
                                   + alpha * objects[objectID][j];
        }

  • The updates are carried over the previous weights.
  [figure: Fig. 1. Kohonen Neural Network, showing the input layer and output layer]
- 24. Rough Set Clustering based on Kohonen SOM Paradigm
  • The distance metric is calculated by the following code fragment:

        double dist(int objectID, int weightID) {
            double d = 0;
            for (int j = 0; j < weights[0].length; j++) {
                double o = objects[objectID][j];
                double c = weights[weightID][j];
                d += (c - o) * (c - o);
            }
            if (weights[0].length == 0) return 0;
            return Math.sqrt(d) / weights[0].length;
        }

  • The flow of the Kohonen K-Means implementation is as follows:

        Kohonen m = new Kohonen(numOfRows, numOfCols, numOfClusters, 0.01);
        m.readObjects(args[0]);
        m.makeClusters(numOfIterations);
        m.writeClusters();
        m.writeCentroids();
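The slides show `update` and `dist` but not the body of `makeClusters`. One plausible reading, sketched below in Java, is a winner-take-all training loop: for every object, find the closest weight vector and move it towards the object. The class `KohonenSketch`, its constructor, and the `winner` helper are assumptions for this example, and the sketch produces crisp clusters only, without lower and upper bounds.

```java
// Illustrative winner-take-all Kohonen training loop; names are hypothetical.
class KohonenSketch {
    final double[][] objects; // one row per input pattern
    final double[][] weights; // one row per cluster (output unit)
    final double alpha;       // learning rate

    KohonenSketch(double[][] objects, double[][] initialWeights, double alpha) {
        this.objects = objects;
        this.weights = initialWeights;
        this.alpha = alpha;
    }

    // Euclidean distance between an object and a weight vector.
    double dist(int objectID, int weightID) {
        double d = 0;
        for (int j = 0; j < weights[weightID].length; j++) {
            double diff = weights[weightID][j] - objects[objectID][j];
            d += diff * diff;
        }
        return Math.sqrt(d);
    }

    // Index of the weight vector closest to the given object.
    int winner(int objectID) {
        int best = 0;
        for (int k = 1; k < weights.length; k++) {
            if (dist(objectID, k) < dist(objectID, best)) best = k;
        }
        return best;
    }

    // Move the winning weight vector towards the object.
    void update(int winner, int objectID) {
        for (int j = 0; j < weights[winner].length; j++) {
            weights[winner][j] = (1 - alpha) * weights[winner][j]
                               + alpha * objects[objectID][j];
        }
    }

    // Present every object once per iteration and update its winner.
    void makeClusters(int numOfIterations) {
        for (int it = 0; it < numOfIterations; it++) {
            for (int i = 0; i < objects.length; i++) {
                update(winner(i), i);
            }
        }
    }
}
```

After training, `winner(i)` gives the cluster of object i and the rows of `weights` play the role of the centroids.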
- 25. Demo: Kohonen Self-Organizing Maps based K-Means Algorithm
- 26. Recommender System based on Clustering
  • A Recommender System is a system based on Information Filtering techniques.
  • It applies Knowledge Discovery techniques such as Clustering, Classification, and Filtering to find recommendations.
  • Exposing the most interesting items to the user saves time and energy.
  • Common techniques for giving recommendations include K-Nearest Neighbor and Collaborative Filtering.
  • Why Clustering?
    • The basic feature of a clustering algorithm is natural grouping.
    • Challenges in the above two techniques are overcome.
    • Each K-Means iteration runs in polynomial time and gives crisp clusters.
- 27. Recommender System based on Clustering
  • Recommendations for the K-Means Algorithm:
    • All the members of the cluster where the user lies are recommended.
  • Recommendations for the Rough K-Means Algorithm:
    • If the user lies in the lower bound of a cluster, all the members lying in the lower bound of that cluster are recommended.
    • If the user lies in the upper bound of two or more clusters, all the members in those upper bounds are recommended.
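These recommendation rules can be sketched in Java as follows. The sketch is illustrative only: users are integer IDs, the names `ClusterRecommenderSketch`, `recommendCrisp` and `recommendRough` are hypothetical, and for a boundary user it assumes that "all the members in the upper bound" means the union of the upper bounds of every cluster containing that user.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative cluster-based recommendation rules; all names are hypothetical.
class ClusterRecommenderSketch {
    // K-Means: recommend every other member of the user's cluster.
    static Set<Integer> recommendCrisp(int user, int[] assignment) {
        Set<Integer> out = new TreeSet<>();
        for (int i = 0; i < assignment.length; i++) {
            if (i != user && assignment[i] == assignment[user]) out.add(i);
        }
        return out;
    }

    // Rough K-Means: lower.get(c) and upper.get(c) are the bounds of cluster c.
    static Set<Integer> recommendRough(int user, List<Set<Integer>> lower,
                                       List<Set<Integer>> upper) {
        Set<Integer> out = new TreeSet<>();
        // A user in some lower bound gets the rest of that lower bound.
        for (Set<Integer> l : lower) {
            if (l.contains(user)) {
                out.addAll(l);
                out.remove(user);
                return out;
            }
        }
        // A boundary user gets the members of every upper bound it belongs to.
        for (Set<Integer> u : upper) {
            if (u.contains(user)) out.addAll(u);
        }
        out.remove(user);
        return out;
    }
}
```

With lower bounds {0, 1} and {3} and upper bounds {0, 1, 2} and {2, 3}, user 0 is recommended {1}, while the boundary user 2 is recommended {0, 1, 3}.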
- 28. Recommender System based on Clustering [figures: System Architecture; User Perspective]
- 29. Demo: Recommender System
- 30. References
  • Completed:
    ✓ K-Means Algorithm
      • "Data Clustering: 50 Years Beyond K-Means", Anil K. Jain, 2010.
    ✓ Rough Set based K-Means Algorithm
      • "Precision of Rough Set Clustering", Pawan Lingras, Min Chen, Duoqian Miao, 2008.
      • "Some Refinements of Rough K-means Clustering", Georg Peters, 2006.
      • "Interval Set Clustering of Web Users with Rough K-Means", Pawan Lingras, Chad West, 2003.
    ✓ Rough K-Means based on Genetic Algorithm and Kohonen Self-Organizing Maps Paradigm
      • "Applications of Rough Set Based K-Means, Kohonen SOM, GA Clustering", Pawan Lingras, 2006.
      • "Evolutionary Rough K-Means Clustering", Pawan Lingras, 2009.
- 31. References (Contd.)
  • Recommender System
    • "Enhanced K-means-Based Mobile Recommender System", Gamal Hussein, International Journal of Information Studies, April 2010.
    • "Clustering Social Networks", Nina Mishra, Robert Schreiber, Isabelle Stanton, and Robert E. Tarjan, 2006.
  • K-Means based on Genetic Algorithms
    • "Genetic K-Means Algorithm", K. Krishna and M. Narasimha Murty, IEEE Transactions on Systems, Man and Cybernetics, 1999.
    • "Initializing K-Means using Genetic Algorithms", Bashar Al-Shboul and Sung-Hyon Myaeng, World Academy of Science, Engineering and Technology, 2009.
  • Advanced Topics
    • "FGKA: A Fast Genetic K-means Clustering Algorithm", Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
    • "Incremental genetic K-means algorithm and its application in gene expression data analysis", Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
    • "A Genetic Algorithm for Clustering on Image Data", Qin Ding and Jim Gasvoda, International Journal of Computational Intelligence, 2004.
- 32. Thank You. Group 9. Have a nice day!
