K-Means, its Variants and its Applications
Group 9
-------------------------------
Varad Meru, Nikhil Ingole, Mansi Kulkarni, Vishal Bhavsar, Prasad Mohite
-------------------------------
Guided By: Mrs. V. S. Rupnar
-------------------------------
Department of Computer Science and Engineering
D. Y. Patil College of Engineering and Technology
Kolhapur
Monday, 29 July 13
Work Completed in the Previous Semester
✓ Selection of Topic and Preliminary Understanding of Clustering.
✓ Implementation of K-Means algorithm with Synthetic Data.
✓ Development of Graphical Representation of Clusters.
✓ Understanding and Implementation of Rough Set Clustering.
✓ Real World Data : Data Collection based on Surveys.
✓ Implementation of Conventional Clustering on Input Survey Details for Cluster Generation and Recommender System.
✓ Implementation of Rough-Set Clustering on Input Survey Details for Cluster Generation and Recommender System.
Work Completed in this Semester
✓ Study of Genetic Algorithms and their Implementation Issues.
✓ Adaptation of JavaGALib for K-Means Clustering.
✓ Verification and Validation of Cluster Quality with all the following Processes:
➡ K-Means, Rough K-Means, GA Rough K-Means.
✓ Recommender System Design and Initial Prototype Evaluation based on K-Means Algorithm.
✓ Verification and Validation of Recommendations and Applying Heuristics on the Results of the Recommendations for Precision.
✓ Recommender System Design and Initial Prototype Evaluation based on Rough K-Means Algorithm.
Introduction to Clustering
• Organizing data into clusters such that there is
• high intra-cluster similarity
• low inter-cluster similarity
• Informally, finding natural groupings among objects.
• Applications of clustering span various fields:
• Data Compression, Data Modeling, Expression Analysis and other fields.
Introduction to K-Means Algorithm
• It was proposed in 1956 by Hugo Steinhaus.
• It finds partitions such that the Squared Error between the Empirical Mean of a Cluster and the Points in that Cluster is minimized.
• For a cluster $c_k$ with empirical mean $\mu_k$, the Squared Error is defined as: $J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
• The Goal of K-Means is to minimize the sum of the Squared Error over all K clusters: $J(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
• Minimizing this Objective Function is known to be an NP-Hard Problem (even for K=2).
K-Means Clustering Algorithm
• Start
• Input: K, the number of clusters to be formed
• Centroid initialization
• Find the distance of objects to centroids
• Partition based on minimum distance
• New additions in a group?
• Yes: recompute the centroids and repeat from the distance step
• No: Stop
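The loop in the flowchart can be sketched in Java as follows. This is a minimal illustration, not the project's implementation: the class name `KMeansSketch`, the choice of seeding the centroids with the first K objects, and the iteration cap are all assumptions made for the example.

```java
import java.util.Arrays;

/** Minimal K-Means sketch; names and the seeding rule are illustrative assumptions. */
final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs K-Means and returns the cluster index assigned to every object. */
    static int[] cluster(double[][] objects, int k, int maxIterations) {
        int dims = objects[0].length;
        // Assumption: the first k objects serve as the initial centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = objects[c].clone();
        }
        int[] assignment = new int[objects.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Partition based on minimum distance.
            boolean changed = false;
            for (int i = 0; i < objects.length; i++) {
                int c = nearest(objects[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // no new additions in any group: stop
            }
            // Recompute every centroid as the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < objects.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dims; j++) {
                    sums[assignment[i]][j] += objects[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep the old centroid of an empty cluster
                for (int j = 0; j < dims; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1}, {9, 9}, {1, 2}, {8, 9}, {2, 1}, {9, 8}};
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```

Because K-Means only finds a local optimum, a different seeding rule can produce a different partition of the same data.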
Result of K-Means Algorithm: Graph of Clusters in Synthetic Data
[Figure: Fig. 2. Synthetic data (from Lingras)]
Visual Representation of Clusters Formed
[Figure: cluster plots for k=1, k=2, k=4 and k=6; both axes are marked from 10 to 50]
Demo
K-Means Algorithm
Introduction to Rough Sets
• Rough sets were introduced by Zdzisław I. Pawlak in 1982 and elaborated in his 1991 book.
• A rough set is a formal approximation of a crisp set in terms of a pair of sets.
• The pair gives the lower and upper approximation of the original set.
• Rough sets are based on equivalence class partitioning.
• The pair A=(U,R) is called the approximation space, where U is the universe of objects and R is an equivalence relation on U.
• The lower bound is the union of all the elementary sets which are subsets of X.
• The upper bound is the union of all elementary sets which have a non-empty intersection with X.
• The pair $(\underline{A}(X), \overline{A}(X))$ of lower and upper bounds is the formal rough-set representation of the regular set X.
• It is not possible to differentiate the elements within the same equivalence class.
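The two approximations can be computed directly from the equivalence classes. The sketch below assumes objects are identified by integers and that the partition into elementary sets is already given; the class name `RoughApproximation` and the sample data are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Lower and upper approximation of a set X over a given partition (illustrative sketch). */
final class RoughApproximation {

    /** Union of all elementary sets that are subsets of x. */
    static Set<Integer> lower(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> elementary : elementarySets) {
            if (x.containsAll(elementary)) {
                result.addAll(elementary);
            }
        }
        return result;
    }

    /** Union of all elementary sets that have a non-empty intersection with x. */
    static Set<Integer> upper(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> elementary : elementarySets) {
            if (!Collections.disjoint(elementary, x)) {
                result.addAll(elementary);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical partition of U = {1, 2, 3, 4, 5} into three equivalence classes.
        List<Set<Integer>> classes = List.of(Set.of(1, 2), Set.of(3, 4), Set.of(5));
        Set<Integer> x = Set.of(1, 2, 3);
        System.out.println(lower(classes, x)); // [1, 2]
        System.out.println(upper(classes, x)); // [1, 2, 3, 4]
    }
}
```

In this example object 3 cannot be told apart from object 4, so the class {3, 4} falls in the upper bound but not in the lower bound of X.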
Adaptation of Rough Sets into K-Means Clustering
• We consider the upper and lower bounds for only a few subsets of U.
• Hence it is not possible to verify all the properties of rough sets (Pawlak, 1982, 1991).
• Lingras et al. identified the following required properties for rough set clustering:
• An object v can be part of at most one lower bound.
• An object v that is part of a lower bound is also part of the corresponding upper bound.
• An object v is not part of any lower bound if and only if v belongs to two or more upper bounds.
Result of Rough Set Clustering: Graph of Clusters in Synthetic Data
[Figure: Fig. 3. Rough clusters for the synthetic data (from "Evolutionary Rough K-means")]
Lingras’s Absolute Distance Formula
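• Let the distance from X to its nearest centroid be given by: $d(X, c_i) = \min_{1 \le j \le k} d(X, c_j)$
• Consider the set T: $T = \{\, j : d(X, c_j) - d(X, c_i) \le \text{threshold},\ j \ne i \,\}$
• If T ≠ Ø, the point X is associated with the upper bounds of two or more clusters.
• If T = Ø, X lies in the lower bound of exactly one cluster.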
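A minimal sketch of this assignment rule follows. It assumes the absolute-difference form of the test described in the cited Lingras and West paper: cluster j joins T when its distance exceeds the nearest distance by no more than a threshold. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Absolute-difference membership test for rough K-Means (illustrative sketch). */
final class LingrasAssignment {

    /** Index of the smallest distance, i.e. the nearest cluster. */
    static int nearest(double[] distances) {
        int best = 0;
        for (int j = 1; j < distances.length; j++) {
            if (distances[j] < distances[best]) {
                best = j;
            }
        }
        return best;
    }

    /**
     * Builds the set T: every cluster j other than the nearest cluster i whose
     * distance exceeds the nearest distance by at most the threshold (assumed rule).
     */
    static List<Integer> closeClusters(double[] distances, double threshold) {
        int i = nearest(distances);
        List<Integer> t = new ArrayList<>();
        for (int j = 0; j < distances.length; j++) {
            if (j != i && distances[j] - distances[i] <= threshold) {
                t.add(j);
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Distances from one object X to three centroids (made-up numbers).
        double[] ambiguous = {1.0, 1.5, 5.0};
        double[] clear = {1.0, 4.0, 5.0};
        // T non-empty: X goes into the upper bounds of cluster 0 and every cluster in T.
        System.out.println(closeClusters(ambiguous, 1.0)); // [1]
        // T empty: X goes into the lower bound of cluster 0 only.
        System.out.println(closeClusters(clear, 1.0));     // []
    }
}
```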
[Figure: Fig. 1. Lower, upper approximation and boundary area (from G. Peters, p. 1482)]
Peters’s Refinements on Lingras’s Absolute Distance Formula
• Limitations of Lingras's method:
• Outlier in an in-line position: b = az.
• Outlier in a rectangular position.
Modified Rough K-Means
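• Centroid calculation in Rough Clustering
• Membership assignment on the basis of distance ratios
• Let $d(X, c_i) = \min_{1 \le j \le k} d(X, c_j)$; the ratios $d(X, c_j) / d(X, c_i)$ are used to determine the membership of X.
• Let $T = \{\, j : d(X, c_j) / d(X, c_i) \le \text{threshold} \,\}$ and $i \ne j$.
• If T ≠ Ø, the point X is associated with the upper bounds of two or more clusters.
• If T = Ø, X lies in the lower bound of exactly one cluster.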
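The ratio test and the rough centroid update can be sketched as below. Two things are assumed rather than taken from the slide, whose formulas did not survive: the ratio rule compares each distance with the nearest distance times a threshold, and the centroid is the weighted combination of the lower-bound mean and the boundary mean commonly given for rough K-Means, using the `w_lower` and `w_upper` weights. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Ratio-based membership and weighted centroid for rough K-Means (illustrative sketch). */
final class RoughKMeansStep {

    /** Set T under the ratio rule: d[j] / d[i] <= threshold for the nearest cluster i. */
    static List<Integer> ratioSet(double[] distances, double threshold) {
        int i = 0;
        for (int j = 1; j < distances.length; j++) {
            if (distances[j] < distances[i]) i = j;
        }
        List<Integer> t = new ArrayList<>();
        for (int j = 0; j < distances.length; j++) {
            // Written as a product so that a zero nearest distance causes no division by zero.
            if (j != i && distances[j] <= threshold * distances[i]) {
                t.add(j);
            }
        }
        return t;
    }

    /** Component-wise mean of a non-empty list of points. */
    static double[] mean(List<double[]> points, int dims) {
        double[] m = new double[dims];
        for (double[] p : points) {
            for (int j = 0; j < dims; j++) m[j] += p[j];
        }
        for (int j = 0; j < dims; j++) m[j] /= points.size();
        return m;
    }

    /**
     * Assumed centroid rule: plain mean when only one region has members,
     * otherwise wLower * mean(lower) + wUpper * mean(boundary).
     * At least one of the two lists must be non-empty.
     */
    static double[] centroid(List<double[]> lower, List<double[]> boundary,
                             double wLower, double wUpper, int dims) {
        if (boundary.isEmpty()) return mean(lower, dims);
        if (lower.isEmpty()) return mean(boundary, dims);
        double[] lo = mean(lower, dims);
        double[] bd = mean(boundary, dims);
        double[] c = new double[dims];
        for (int j = 0; j < dims; j++) {
            c[j] = wLower * lo[j] + wUpper * bd[j];
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(ratioSet(new double[]{2.0, 2.4, 6.0}, 1.5)); // [1]
        double[] c = centroid(List.of(new double[]{0, 0}, new double[]{2, 0}),
                              List.of(new double[]{4, 4}), 0.75, 0.25, 2);
        System.out.println(c[0] + ", " + c[1]); // 1.75, 1.0
    }
}
```

A ratio is scale-free, so the same threshold behaves consistently whether the object is close to or far from all centroids, which is the motivation usually given for preferring it over an absolute difference.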
Working Algorithm of Rough K-Means Implementation
Visual Representation of Rough K-Means Forming 3 Rough Sets
Demo
Rough K-Means Algorithm
Genetic Algorithm based Rough Set Clustering
• Genetic Algorithms - Introduction
• A search process that follows the principles of evolution through natural selection.
• Important terms : Genes, Genome, Chromosomes, Populations, Generations, Fitness, Selection, Crossover, Mutation.
• This paradigm has the following steps:
    generate initial population, G(0);
    evaluate G(0);
    for (t = 1; solution is not found; t++) {
        generate G(t) using G(t-1);
        evaluate G(t);
    }
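The generate-and-evaluate loop above can be made concrete with a toy problem. The sketch below is not JavaGALib and not the project's code: it evolves single real-valued genes toward the maximum of an assumed fitness function, uses tournament selection, averaging crossover, Gaussian mutation and elitism, and stops after a fixed number of generations instead of testing for a found solution.

```java
import java.util.Random;

/** Toy genetic algorithm following the G(0), G(1), ... loop (illustrative sketch). */
final class GeneticLoopSketch {

    /** Assumed toy fitness: higher is better, with its peak at x = 3. */
    static double fitness(double x) {
        return -(x - 3) * (x - 3);
    }

    /** Index of the fittest individual in the population. */
    static int best(double[] population) {
        int b = 0;
        for (int i = 1; i < population.length; i++) {
            if (fitness(population[i]) > fitness(population[b])) b = i;
        }
        return b;
    }

    /** Tournament selection: the fitter of two randomly drawn individuals. */
    static double select(double[] population, Random rng) {
        double a = population[rng.nextInt(population.length)];
        double b = population[rng.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** Returns the best fitness observed in each generation G(0)..G(generations). */
    static double[] run(int populationSize, int generations, long seed) {
        Random rng = new Random(seed);
        // generate initial population, G(0)
        double[] current = new double[populationSize];
        for (int i = 0; i < populationSize; i++) {
            current[i] = rng.nextDouble() * 20 - 10;
        }
        double[] history = new double[generations + 1];
        history[0] = fitness(current[best(current)]); // evaluate G(0)

        for (int t = 1; t <= generations; t++) {
            // generate G(t) using G(t-1)
            double[] next = new double[populationSize];
            next[0] = current[best(current)]; // elitism: the best individual survives unchanged
            for (int i = 1; i < populationSize; i++) {
                double child = (select(current, rng) + select(current, rng)) / 2; // crossover
                if (rng.nextDouble() < 0.1) {
                    child += rng.nextGaussian(); // mutation
                }
                next[i] = child;
            }
            current = next;
            history[t] = fitness(current[best(current)]); // evaluate G(t)
        }
        return history;
    }

    public static void main(String[] args) {
        double[] history = run(20, 30, 42L);
        System.out.println("best fitness in G(0):  " + history[0]);
        System.out.println("best fitness in G(30): " + history[history.length - 1]);
    }
}
```

Because the fittest individual is copied into every new generation, the best fitness can never get worse from one generation to the next.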
Genetic Algorithm based Rough Set Clustering
• Genetic Algorithms for Rough Set Clustering
• JavaGALib: a Java library built by Jeff Smith of SoftTechDesign to support GA operations
• Input fields:
• p - Threshold
• D(n,m) - A dataset with n objects of m dimensions
• k - The number of clusters
• w_lower, w_upper
• population - The number of chromosomes to be generated
• generations - The number of successive populations to be generated
• Output:
• A set of clusters. Each cluster is represented by the objects in the lower region and the boundary region (upper bound).
• Data structures used for Genetic Algorithms for Rough Set Clustering
• Chromosomes: Centroid1* Centroid2* Centroid3* ...
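One plausible reading of the chromosome layout is a flat array of genes holding the coordinates of the k centroids one after another, which matches a gene count of numOfClusters * numOfDimensions. The sketch below shows that encoding under this assumption; the class `ChromosomeCodec` and its methods are hypothetical and are not part of JavaGALib.

```java
import java.util.Arrays;

/** Flat chromosome <-> centroid matrix conversion (illustrative sketch). */
final class ChromosomeCodec {

    /** Concatenates the centroids into one gene array: Centroid1, Centroid2, ... */
    static double[] encode(double[][] centroids) {
        int dims = centroids[0].length;
        double[] genes = new double[centroids.length * dims];
        for (int c = 0; c < centroids.length; c++) {
            System.arraycopy(centroids[c], 0, genes, c * dims, dims);
        }
        return genes;
    }

    /** Splits a gene array back into numOfClusters centroids of numOfDimensions each. */
    static double[][] decode(double[] genes, int numOfClusters, int numOfDimensions) {
        if (genes.length != numOfClusters * numOfDimensions) {
            throw new IllegalArgumentException("gene count must equal clusters * dimensions");
        }
        double[][] centroids = new double[numOfClusters][];
        for (int c = 0; c < numOfClusters; c++) {
            centroids[c] = Arrays.copyOfRange(genes, c * numOfDimensions, (c + 1) * numOfDimensions);
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] genes = {1, 2, 3, 4, 5, 6};
        // Three clusters in two dimensions.
        System.out.println(Arrays.deepToString(decode(genes, 3, 2)));
    }
}
```

With this layout, crossover and mutation operate on plain numeric genes while the fitness function decodes the chromosome into centroids before evaluating cluster quality.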
Genetic Algorithm based Rough Set Clustering
• Constructor Description for Genetic Algorithm
		super(numOfClusters * numOfDimensions, // no. of genes in a chromosome
				100, // population of chromosomes
				0.7, // crossover probability
				6, // random selection chance
				50, // stop after this many generations
				10, // no. of preliminary runs to build good breeding stock for the final full run
				20, // max preliminary generations
				0.1, // chromosome mutation probability
				Crossover.ctTwoPoint, // crossover type
				2, // number of decimal points of precision
				false // considers only float numbers
				);
	} // end constructor
• Evolve Function
computeFitnessRankings();
	 doGeneticMating();
	 copyNextGenToThisGen();
Demo
Genetic Algorithm based Rough K-Means Algorithm
Rough Set Clustering based on Kohonen SOM Paradigm
• Kohonen network Architecture is used as an Artificial Neural Network Paradigm.
• The Single level, One-Dimensional case can be seen in fig. 1.
• The weight vector x for the group that is closest to the pattern v is modified using $x^{new} = (1 - \alpha)\, x^{old} + \alpha\, v$:
• void update(int winner, int objectID) {
		for (int j = 0; j < weights[winner].length; j++)
			weights[winner][j] = (1 - alpha) * weights[winner][j]
					+ alpha * objects[objectID][j];
	}
• Each update is applied on top of the previous weights.
[Figure: Fig. 1. Kohonen Neural Network, showing the Input Layer and the Output Layer]
Rough Set Clustering based on Kohonen SOM Paradigm
• The distance metric is calculated by the following code fragment:
• double dist(int objectID, int weightID) {
		if (weights[0].length == 0)
			return 0;
		double d = 0;
		for (int j = 0; j < weights[0].length; j++) {
			double o = objects[objectID][j];
			double c = weights[weightID][j];
			d += (c - o) * (c - o);
		}
		return Math.sqrt(d) / weights[0].length;
	}
• The Flow of the Kohonen K-Means Implementation is as follows
• Kohonen m = new Kohonen(numOfRows, numOfCols, numOfClusters, 0.01);
	 	 m.readObjects(args[0]);
	 	 m.makeClusters(numOfIterations);
	 	 m.writeClusters();
	 	 m.writeCentroids();
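The fragments above can be combined into one self-contained training step: find the winning weight vector for an object, then move it toward that object. The class `KohonenStep` below is a hypothetical stand-in for the `Kohonen` class used in the slides, whose full source is not shown; it uses squared Euclidean distance to pick the winner, which selects the same winner as the normalised distance in `dist`.

```java
import java.util.Arrays;

/** One winner-take-all training step of a one-dimensional Kohonen layer (illustrative sketch). */
final class KohonenStep {
    final double[][] weights; // one weight vector per cluster
    final double alpha;       // learning rate

    KohonenStep(double[][] initialWeights, double alpha) {
        this.weights = initialWeights;
        this.alpha = alpha;
    }

    /** Index of the weight vector closest to the object. */
    int winner(double[] object) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int w = 0; w < weights.length; w++) {
            double d = 0;
            for (int j = 0; j < object.length; j++) {
                double diff = weights[w][j] - object[j];
                d += diff * diff;
            }
            if (d < bestDistance) {
                bestDistance = d;
                best = w;
            }
        }
        return best;
    }

    /** Moves the winning weight vector toward the object: x = (1 - alpha) * x + alpha * v. */
    void update(int winner, double[] object) {
        for (int j = 0; j < weights[winner].length; j++) {
            weights[winner][j] = (1 - alpha) * weights[winner][j] + alpha * object[j];
        }
    }

    public static void main(String[] args) {
        KohonenStep som = new KohonenStep(new double[][]{{0, 0}, {10, 10}}, 0.5);
        double[] object = {2, 2};
        int w = som.winner(object);
        som.update(w, object);
        System.out.println(w + " -> " + Arrays.toString(som.weights[w]));
    }
}
```

A small learning rate such as the 0.01 passed to the constructor in the slides moves the winner only slightly per object, so the weights settle gradually over many iterations.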
Demo
Kohonen Self-Organized Maps based K-Means Algorithm
Recommender System based on Clustering
• A Recommender System is a system based on Information Filtering Techniques.
• It applies Knowledge Discovery Techniques such as Clustering, Classification, and Filtering to find Recommendations.
• Exposing the most interesting items to the user saves time and energy.
• Common techniques for giving Recommendations include K-Nearest Neighbor and Collaborative Filtering.
• Why Clustering?
• The basic feature of a clustering algorithm is natural grouping.
• The challenges of the above two techniques are overcome.
• Each K-Means iteration runs in polynomial time, and the result is a set of crisp Clusters.
Recommender System based on Clustering
• Recommendations for K-Means Algorithm:
• All the members of the cluster in which the user lies are recommended.
• Recommendations for Rough K-Means Algorithm:
• If the user lies in the lower bound of a cluster, all the members lying in the lower bound of that cluster are recommended.
• If the user lies in the upper bound of two or more clusters, all the members in those upper bounds are recommended.
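These recommendation rules can be sketched as two small lookups over precomputed clusters. The sketch assumes members are identified by strings and that the user is excluded from their own recommendations; that exclusion, the class name `ClusterRecommender`, and the sample data are assumptions rather than details from the slides.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Cluster-based recommendations for crisp and rough clusters (illustrative sketch). */
final class ClusterRecommender {

    /** K-Means rule: recommend every other member of the user's cluster. */
    static Set<String> recommendCrisp(String user, List<Set<String>> clusters) {
        Set<String> result = new TreeSet<>();
        for (Set<String> cluster : clusters) {
            if (cluster.contains(user)) {
                result.addAll(cluster);
            }
        }
        result.remove(user); // assumption: a user is not recommended to themselves
        return result;
    }

    /**
     * Rough K-Means rule: a user in a lower bound gets that lower bound;
     * otherwise the user gets the members of every upper bound they belong to.
     */
    static Set<String> recommendRough(String user, List<Set<String>> lowerBounds,
                                      List<Set<String>> upperBounds) {
        Set<String> result = new TreeSet<>();
        for (Set<String> lower : lowerBounds) {
            if (lower.contains(user)) {
                result.addAll(lower);
                result.remove(user);
                return result;
            }
        }
        for (Set<String> upper : upperBounds) {
            if (upper.contains(user)) {
                result.addAll(upper);
            }
        }
        result.remove(user);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rough clusters: "c" sits in the boundary of both clusters.
        List<Set<String>> lower = List.of(Set.of("a", "b"), Set.of("d"));
        List<Set<String>> upper = List.of(Set.of("a", "b", "c"), Set.of("c", "d"));
        System.out.println(recommendRough("a", lower, upper)); // [b]
        System.out.println(recommendRough("c", lower, upper)); // [a, b, d]
    }
}
```

A boundary user receives a broader set of recommendations than a lower-bound user, which reflects the uncertainty about which cluster they belong to.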
Recommender System based on Clustering
[Figures: System Architecture; User Perspective]
Demo
Recommender System
References
• Completed:
✓ K-Means Algorithm
• “Data Clustering: 50 Years Beyond K-Means”, Anil K. Jain, 2010.
✓ Rough Set based K-Means Algorithm
• “Precision of Rough Set Clustering”, Pawan Lingras, Min Chen, Duoqian Miao, 2008.
• “Some Refinements of Rough K-means Clustering”, Georg Peters, 2006.
• “Interval Set Clustering of Web Users with Rough K-Means”, Pawan Lingras, Chad West, 2003.
✓ Rough K-Means based on Genetic Algorithm and Kohonen Self-Organizing Maps Paradigm
• “Applications of Rough Set Based K-Means, Kohonen SOM, GA Clustering”, Pawan Lingras, 2006.
• “Evolutionary Rough K-Means Clustering”, Pawan Lingras, 2009.
References (Contd.)
• Recommender System
• “Enhanced K-means-Based Mobile Recommender System”, Gamal Hussein, International Journal of Information Studies, April 2010.
• “Clustering Social Networks”, Nina Mishra, Robert Schreiber, Isabelle Stanton, and Robert E. Tarjan, 2006.
• K-Means based on Genetic Algorithms
• “Genetic K-Means Algorithm”, K. Krishna and M. Narasimha Murty, IEEE Transactions on Systems, Man and Cybernetics, 1999.
• “Initializing K-Means using Genetic Algorithms”, Bashar Al-Shboul and Sung-Hyon Myaeng, World Academy of Science, Engineering and Technology, 2009.
• Advanced Topics
• “FGKA: A Fast Genetic K-means Clustering Algorithm”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
• “Incremental genetic K-means algorithm and its application in gene expression data analysis”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
• “A Genetic Algorithm for Clustering on Image Data”, Qin Ding and Jim Gasvoda, International Journal of Computational Intelligence, 2004.
Thank You
Group 9
Have a Nice Day !!!

Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

K-Means, its Variants and its Applications

  • 1. K-Means, its Variants and its Applications
      Group 9: Varad Meru, Nikhil Ingole, Mansi Kulkarni, Vishal Bhavsar, Prasad Mohite
      Guided by: Mrs. V. S. Rupnar
      Department of Computer Science and Engineering, D. Y. Patil College of Engineering and Technology, Kolhapur
      Monday, 29 July 13
  • 2. Work Completed in the Previous Semester
      ✓ Selection of Topic and Preliminary Understanding of Clustering.
      ✓ Implementation of the K-Means Algorithm with Synthetic Data.
      ✓ Development of a Graphical Representation of Clusters.
      ✓ Understanding and Implementation of Rough Set Clustering.
      ✓ Real-World Data: Data Collection based on Surveys.
      ✓ Implementation of Conventional Clustering on the Input Survey Details for Cluster Generation and the Recommender System.
      ✓ Implementation of Rough Set Clustering on the Input Survey Details for Cluster Generation and the Recommender System.
  • 3. Work Completed in this Semester
      ✓ Study of Genetic Algorithms and their Implementation Issues.
      ✓ Adaptation of JavaGALib for K-Means Clustering.
      ✓ Verification and Validation of Cluster Quality for all of the following Processes:
        ➡ K-Means, Rough K-Means, GA Rough K-Means.
      ✓ Recommender System Design and Initial Prototype Evaluation based on the K-Means Algorithm.
      ✓ Verification and Validation of Recommendations, and Application of Heuristics to the Recommendation Results for Precision.
      ✓ Recommender System Design and Initial Prototype Evaluation based on the Rough K-Means Algorithm.
  • 4. Introduction to Clustering
      • Organizing data into clusters such that there is
        • high intra-cluster similarity
        • low inter-cluster similarity
      • Informally, finding natural groupings among objects.
      • Applications of clustering span various fields:
        • Data Compression, Data Modeling, Expression Analysis and other fields of application.
  • 5. Introduction to K-Means Algorithm
      • It was proposed in the year 1956 by Hugo Steinhaus.
      • It finds a partition such that the Squared Error between the Empirical Mean of a Cluster and the Points in that Cluster is minimized.
      • Squared Error is defined as: $J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$, where $\mu_k$ is the empirical mean of cluster $c_k$.
      • The Goal of K-Means is to minimize the sum of the Squared Error over all K clusters: $J(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$.
      • Minimizing this Objective Function is known to be an NP-Hard Problem (even for K=2).
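The squared-error objective on this slide can be checked numerically with a few lines of Java. This is a minimal sketch, not the group's implementation: the class name `SquaredError` and its method names are hypothetical, and the Euclidean norm is assumed.

```java
// Sketch only: class and method names are illustrative assumptions.
final class SquaredError {
    /** Empirical mean (centroid) of a non-empty cluster of points. */
    static double[] mean(double[][] cluster) {
        double[] mu = new double[cluster[0].length];
        for (double[] x : cluster)
            for (int j = 0; j < mu.length; j++) mu[j] += x[j];
        for (int j = 0; j < mu.length; j++) mu[j] /= cluster.length;
        return mu;
    }

    /** Squared error of one cluster: sum of squared Euclidean distances to its mean. */
    static double clusterError(double[][] cluster) {
        double[] mu = mean(cluster);
        double sum = 0;
        for (double[] x : cluster)
            for (int j = 0; j < mu.length; j++) sum += (x[j] - mu[j]) * (x[j] - mu[j]);
        return sum;
    }

    /** K-Means objective: the squared error summed over all K clusters. */
    static double totalError(double[][][] clusters) {
        double total = 0;
        for (double[][] c : clusters) total += clusterError(c);
        return total;
    }
}
```

For the cluster {(0,0), (0,2)} the mean is (0,1), so each point contributes a squared distance of 1 and the cluster error is 2.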
  • 6. K-Means Clustering Algorithm (flowchart)
      • Start
      • Input: K, the number of Clusters to be formed
      • Centroid Initialization
      • Find the Distance of Objects to Centroids
      • Partition based on Minimum Distance
      • New Additions in Group?
        • Yes: recompute the centroids and repeat from the distance step
        • No: Stop
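The flowchart can be sketched as a short loop. The code below is an illustrative reading of the flowchart, not the project's code: `KMeansSketch` is a hypothetical name, initial centroids are supplied by the caller, squared Euclidean distance is assumed, and an empty cluster simply keeps its previous centroid.

```java
import java.util.Arrays;

// Sketch only: names and the empty-cluster policy are assumptions.
final class KMeansSketch {
    /** Returns the cluster index of every point; centroids are updated in place. */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Find the distance of each object to the centroids; partition on the minimum.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int k = 1; k < centroids.length; k++)
                    if (dist2(points[i], centroids[k]) < dist2(points[i], centroids[best])) best = k;
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // "New Additions in Group?" answered No: stop
            // Recompute each centroid as the mean of its members.
            for (int k = 0; k < centroids.length; k++) {
                double[] sum = new double[centroids[k].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != k) continue;
                    count++;
                    for (int j = 0; j < sum.length; j++) sum[j] += points[i][j];
                }
                if (count == 0) continue; // empty cluster keeps its old centroid
                for (int j = 0; j < sum.length; j++) centroids[k][j] = sum[j] / count;
            }
        }
        return assignment;
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double dist2(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return d;
    }
}
```

The result depends on the initial centroids, which is one motivation for the genetic-algorithm variants discussed later in the deck.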
  • 7. Result of K-Means Algorithm: Graph of Clusters in Synthetic Data (figure: Lingras, Fig. 2, Synthetic data)
  • 8. Visual Representation of Clusters Formed (figure panels: k=1, k=2, k=4, k=6; axis ticks run from 10 to 50)
  • 10. Introduction to Rough Sets
      • Rough sets were introduced by Zdzisław I. Pawlak in 1982 and developed further in his 1991 monograph.
      • A rough set is a formal approximation of a crisp set in terms of a pair of sets.
      • The pair gives the Lower and Upper Approximation of the original set.
      • Rough sets are based on equivalence class partitioning.
      • The pair A = (U, R) is called the Approximation Space, where U is the universe of objects and R is an equivalence relation on U.
      • The lower bound $\underline{A}(X)$ is the union of all the elementary sets which are subsets of X.
      • The upper bound $\overline{A}(X)$ is the union of all elementary sets which have a non-empty intersection with X.
      • The pair $(\underline{A}(X), \overline{A}(X))$ is the formal representation of the regular set X.
      • It is not possible to differentiate between elements within the same equivalence class.
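The two approximations follow directly from their definitions. The sketch below assumes objects are identified by integers and that the equivalence classes (elementary sets) are given as a list of sets; the class name `RoughApproximation` is hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: integer object ids and the class name are assumptions.
final class RoughApproximation {
    /** Union of all elementary sets that are subsets of x. */
    static Set<Integer> lower(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> e : elementarySets)
            if (x.containsAll(e)) result.addAll(e);
        return result;
    }

    /** Union of all elementary sets that have a non-empty intersection with x. */
    static Set<Integer> upper(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> e : elementarySets)
            if (!Collections.disjoint(e, x)) result.addAll(e);
        return result;
    }
}
```

With elementary sets {1,2}, {3,4}, {5} and X = {1,2,3}, the lower approximation is {1,2} and the upper approximation is {1,2,3,4}; objects 3 and 4 cannot be told apart, so X cannot be described exactly.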
  • 11. Adaptation of Rough Sets into K-Means Clustering
      • We consider the upper and lower bounds for only a few subsets of U.
      • Hence it is not possible to verify all the properties of rough sets (Pawlak, '82, '91).
      • Lingras et al. identified these compulsory rules for rough set clustering:
        • An object v can be part of at most one lower bound.
        • If an object v is part of the lower bound of a cluster, it is also part of the upper bound of that cluster.
        • An object v is not part of any lower bound if and only if v belongs to two or more upper bounds.
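The compulsory rules can be verified mechanically for any candidate rough clustering. The sketch below assumes a membership-matrix representation that is not in the slides: `lower[c][v]` and `upper[c][v]` state whether object v is in the lower or upper bound of cluster c. The class name `RoughClusterRules` is hypothetical.

```java
// Sketch only: the boolean-matrix representation and names are assumptions.
final class RoughClusterRules {
    /** True when every object satisfies the three rules for rough set clustering. */
    static boolean satisfiesRules(boolean[][] lower, boolean[][] upper) {
        int clusters = lower.length;
        int objects = lower[0].length;
        for (int v = 0; v < objects; v++) {
            int lowerCount = 0;
            int upperCount = 0;
            for (int c = 0; c < clusters; c++) {
                if (lower[c][v]) {
                    lowerCount++;
                    if (!upper[c][v]) return false; // lower bound must lie inside upper bound
                }
                if (upper[c][v]) upperCount++;
            }
            if (lowerCount > 1) return false; // at most one lower bound per object
            // In no lower bound exactly when in two or more upper bounds.
            if ((lowerCount == 0) != (upperCount >= 2)) return false;
        }
        return true;
    }
}
```

A check like this is useful as an assertion after every assignment step of a rough K-Means implementation.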
  • 12. Result of Rough Set Clustering: Graph of Clusters in Synthetic Data (figure: Fig. 3, Rough clusters for the synthetic data, from “Evolutionary Rough K-means”)
  • 13. Lingras’s Absolute Distance Formula
      • Let the distance from the point X to its nearest centroid be given by: $d(X, c_i) = \min_{1 \le j \le K} d(X, c_j)$, where $c_j$ is the centroid of cluster j.
      • Consider the set T: $T = \{\, j : d(X, c_j) - d(X, c_i) \le \text{threshold},\ j \ne i \,\}$
      • T ≠ Ø: the point X is associated with the upper bounds of two or more clusters.
      • T = Ø: X exists in the lower bound of only one cluster.
      Fig. 1. Lower approximation, upper approximation and boundary area (G. Peters).
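A small sketch of the absolute-distance test follows. It assumes the usual formulation in which T collects every centroid, other than the nearest one, whose distance to X exceeds the nearest distance by at most the threshold; Euclidean distance and the name `AbsoluteDistanceAssignment` are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the exact form of T and all names are assumptions.
final class AbsoluteDistanceAssignment {
    /** Indices of the non-nearest centroids that are within `threshold` of the nearest distance. */
    static List<Integer> setT(double[] x, double[][] centroids, double threshold) {
        int nearest = nearest(x, centroids);
        double dMin = distance(x, centroids[nearest]);
        List<Integer> t = new ArrayList<>();
        for (int j = 0; j < centroids.length; j++)
            if (j != nearest && distance(x, centroids[j]) - dMin <= threshold) t.add(j);
        return t;
    }

    /** Index of the centroid closest to x. */
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++)
            if (distance(x, centroids[j]) < distance(x, centroids[best])) best = j;
        return best;
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(d);
    }
}
```

If the returned list is empty, X goes into the lower bound of its nearest cluster; otherwise X goes into the upper bounds of the nearest cluster and of every cluster listed in T.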
  • 14. Peters’s Refinements on Lingras’s Absolute Distance Formula
      • Limitations of Lingras’s method:
        • Outlier in an inline position: b = az.
        • Outlier in a rectangular position.
  • 15. Modified Rough K-Means
      • Centroid calculation in Rough Clustering
      • Membership assignment on the basis of relative distance
      • Let $d(X, c_i) = \min_{1 \le j \le K} d(X, c_j)$; the ratios $d(X, c_j) / d(X, c_i)$ are used to determine the membership of X.
      • Let $T = \{\, j : d(X, c_j) / d(X, c_i) \le \text{threshold},\ j \ne i \,\}$.
      • T ≠ Ø: the point X is associated with the upper bounds of two or more clusters.
      • T = Ø: X exists in the lower bound of only one cluster.
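The centroid formula on this slide did not survive extraction, so the sketch below uses the commonly cited weighted form as an assumption: when a cluster has both a lower region and a boundary region, its centroid is `wLower` times the mean of the lower region plus `wUpper` times the mean of the boundary region, and otherwise it is the plain mean of whichever region is non-empty. The ratio test mirrors the relative-distance rule above. All names are hypothetical.

```java
// Sketch only: the weighted-mean form, the parameter names and the class name are assumptions.
final class RoughCentroid {
    /** Centroid of a rough cluster; at least one of the two regions must be non-empty. */
    static double[] centroid(double[][] lower, double[][] boundary, double wLower, double wUpper) {
        if (boundary.length == 0) return mean(lower);
        if (lower.length == 0) return mean(boundary);
        double[] ml = mean(lower);
        double[] mb = mean(boundary);
        double[] c = new double[ml.length];
        for (int j = 0; j < c.length; j++) c[j] = wLower * ml[j] + wUpper * mb[j];
        return c;
    }

    /** Relative-distance test: is another centroid close enough to share the point? */
    static boolean relativelyClose(double dNearest, double dOther, double threshold) {
        if (dNearest == 0) return dOther == 0; // point sits exactly on its nearest centroid
        return dOther / dNearest <= threshold;
    }

    /** Component-wise mean of a non-empty set of points. */
    static double[] mean(double[][] points) {
        double[] mu = new double[points[0].length];
        for (double[] p : points)
            for (int j = 0; j < mu.length; j++) mu[j] += p[j];
        for (int j = 0; j < mu.length; j++) mu[j] /= points.length;
        return mu;
    }
}
```

With a lower region {(0,0), (2,0)}, a boundary region {(5,0)} and weights 0.75 and 0.25, the centroid is 0.75·(1,0) + 0.25·(5,0) = (2,0).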
  • 16. Working Algorithm of Rough K-Means Implementation
  • 17. Visual Representation of Rough K-Means Forming 3 Rough Sets
  • 19. Genetic Algorithm based Rough Set Clustering
      • Genetic Algorithms: Introduction
        • A search process that follows the principles of evolution through natural selection.
        • Important terms: Genes, Genome, Chromosomes, Populations, Generations, Fitness, Selection, Crossover, Mutation.
      • This paradigm has the following steps:
          generate initial population, G(0);
          evaluate G(0);
          for (t = 1; solution is not found; t++) {
              generate G(t) using G(t-1);
              evaluate G(t);
          }
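The generate/evaluate loop can be made concrete with a tiny real-valued genetic algorithm. This is a generic illustration rather than JavaGALib's implementation: it assumes a fixed number of generations instead of a "solution found" test, genes drawn from [0, 1), binary tournament selection, one-point crossover, and elitism. Every name in it is hypothetical.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Sketch only: operators, stopping rule and names are assumptions, not JavaGALib's behaviour.
final class TinyGeneticAlgorithm {
    /** Evolves chromosomes of `genes` values in [0, 1) and returns the fittest one found. */
    static double[] evolve(ToDoubleFunction<double[]> fitness, int genes, int populationSize,
                           int generations, double mutationProb, long seed) {
        Random rnd = new Random(seed);
        // Generate the initial population G(0).
        double[][] population = new double[populationSize][genes];
        for (double[] c : population)
            for (int j = 0; j < genes; j++) c[j] = rnd.nextDouble();
        double[] best = fittest(population, fitness).clone(); // evaluate G(0)
        for (int t = 1; t <= generations; t++) {
            // Generate G(t) using G(t-1).
            double[][] next = new double[populationSize][];
            next[0] = best.clone(); // elitism: the best chromosome always survives
            for (int i = 1; i < populationSize; i++) {
                double[] a = tournament(population, fitness, rnd);
                double[] b = tournament(population, fitness, rnd);
                int cut = rnd.nextInt(genes + 1); // one-point crossover position
                double[] child = new double[genes];
                for (int j = 0; j < genes; j++) child[j] = j < cut ? a[j] : b[j];
                if (rnd.nextDouble() < mutationProb) child[rnd.nextInt(genes)] = rnd.nextDouble();
                next[i] = child;
            }
            population = next;
            best = fittest(population, fitness).clone(); // evaluate G(t)
        }
        return best;
    }

    /** Chromosome with the highest fitness in the population. */
    static double[] fittest(double[][] population, ToDoubleFunction<double[]> fitness) {
        double[] best = population[0];
        for (double[] c : population)
            if (fitness.applyAsDouble(c) > fitness.applyAsDouble(best)) best = c;
        return best;
    }

    /** Binary tournament: the fitter of two randomly drawn chromosomes. */
    static double[] tournament(double[][] population, ToDoubleFunction<double[]> fitness, Random rnd) {
        double[] a = population[rnd.nextInt(population.length)];
        double[] b = population[rnd.nextInt(population.length)];
        return fitness.applyAsDouble(a) >= fitness.applyAsDouble(b) ? a : b;
    }
}
```

Because of elitism, the best fitness never decreases from one generation to the next. For clustering, the fitness function would score a set of centroids, for example by the negated squared error.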
  • 20. Genetic Algorithm based Rough Set Clustering
      • Genetic Algorithms for Rough Set Clustering
        • JavaGALib: a Java library built by Jeff Smith of SoftTechDesign to support GA operations.
      • Input Fields:
        • p: Threshold
        • D(n,m): a Dataset with n objects of m dimensions
        • k: the number of Clusters
        • w_lower, w_upper: weights of the lower and upper (boundary) regions
        • population: the number of chromosomes to be generated
        • generations: the number of successive populations to be generated
      • Output:
        • A set of clusters. Each cluster is represented by the objects in its lower region and its boundary region (upper bound).
      • Data Structures used for Genetic Algorithms for Rough Set Clustering
        • Chromosome: Centroid1, Centroid2, Centroid3, ...
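The chromosome layout (centroids stored one after another) can be illustrated with a pair of helper functions. This sketch assumes a flat `double[]` of `numOfClusters * numOfDimensions` genes with centroid k occupying a contiguous block; `ChromosomeCodec` and its method names are hypothetical.

```java
// Sketch only: the flat row-by-row layout and all names are assumptions.
final class ChromosomeCodec {
    /** Splits a flat gene array into one centroid (row) per cluster. */
    static double[][] decode(double[] genes, int numOfClusters, int numOfDimensions) {
        if (genes.length != numOfClusters * numOfDimensions)
            throw new IllegalArgumentException("expected " + numOfClusters * numOfDimensions + " genes");
        double[][] centroids = new double[numOfClusters][numOfDimensions];
        for (int k = 0; k < numOfClusters; k++)
            System.arraycopy(genes, k * numOfDimensions, centroids[k], 0, numOfDimensions);
        return centroids;
    }

    /** Concatenates the centroids back into a flat gene array. */
    static double[] encode(double[][] centroids) {
        int dims = centroids[0].length;
        double[] genes = new double[centroids.length * dims];
        for (int k = 0; k < centroids.length; k++)
            System.arraycopy(centroids[k], 0, genes, k * dims, dims);
        return genes;
    }
}
```

A fitness function for the genetic algorithm would typically decode the chromosome first and then assign objects to the decoded centroids.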
  • 21. Genetic Algorithm based Rough Set Clustering
      • Constructor Description for Genetic Algorithm
          super(numOfClusters * numOfDimensions, // no. of genes in a chromosome
                100,                  // population of chromosomes
                0.7,                  // crossover probability
                6,                    // random selection chance
                50,                   // stop after this many generations
                10,                   // no. of preliminary runs to build good breeding stock for the final (full) run
                20,                   // max preliminary generations
                0.1,                  // chromosome mutation probability
                Crossover.ctTwoPoint, // crossover type
                2,                    // number of decimal points of precision
                false                 // considers only float numbers
          );
          } // end constructor
      • Evolve Function
          computeFitnessRankings();
          doGeneticMating();
          copyNextGenToThisGen();
  • 22. Demo: Genetic Algorithm based Rough K-Means Algorithm
  • 23. Rough Set Clustering based on Kohonen SOM Paradigm
      • The Kohonen network architecture is used as the Artificial Neural Network paradigm.
      • The single-level, one-dimensional case can be seen in Fig. 1.
      • The weight vector x for the group that is closest to the pattern v is modified using $x_{new} = (1 - \alpha)\, x_{old} + \alpha\, v$:
          void update(int winner, int objectID) {
              // Move the winning weight vector towards the object by a fraction alpha.
              for (int j = 0; j < weights[winner].length; j++)
                  weights[winner][j] = (1 - alpha) * weights[winner][j] + alpha * objects[objectID][j];
          }
      • The updates are carried over the previous weights.
      Fig. 1. Kohonen Neural Network (input layer and output layer).
  • 24. Rough Set Clustering based on Kohonen SOM Paradigm
      • The distance metric is calculated by the following code fragment:
          double dist(int objectID, int weightID) {
              if (weights[0].length == 0) return 0; // no dimensions, nothing to compare
              double d = 0;
              for (int j = 0; j < weights[0].length; j++) {
                  double o = objects[objectID][j];
                  double c = weights[weightID][j];
                  d += (c - o) * (c - o);
              }
              // Euclidean distance, normalized by the number of dimensions.
              return Math.sqrt(d) / weights[0].length;
          }
      • The flow of the Kohonen K-Means implementation is as follows:
          Kohonen m = new Kohonen(numOfRows, numOfCols, numOfClusters, 0.01);
          m.readObjects(args[0]);
          m.makeClusters(numOfIterations);
          m.writeClusters();
          m.writeCentroids();
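The fragments on these two slides depend on fields (`weights`, `objects`, `alpha`) that are not shown. A self-contained sketch of the same idea, winner selection followed by the weight update, might look like the following. The class `KohonenSketch` and its methods are hypothetical and are not the `Kohonen` class used in the slides; squared Euclidean distance is used for winner selection, which picks the same winner as the normalized distance above.

```java
// Sketch only: a stand-in for the slides' Kohonen class, with assumed names and structure.
final class KohonenSketch {
    final double[][] weights; // one weight vector per output unit (cluster)
    final double alpha;       // learning rate

    KohonenSketch(double[][] initialWeights, double alpha) {
        this.weights = initialWeights;
        this.alpha = alpha;
    }

    /** Index of the weight vector closest to the object. */
    int winner(double[] object) {
        int best = 0;
        for (int k = 1; k < weights.length; k++)
            if (dist2(object, weights[k]) < dist2(object, weights[best])) best = k;
        return best;
    }

    /** Moves the winning weight vector towards the object by a fraction alpha. */
    void update(int winner, double[] object) {
        for (int j = 0; j < weights[winner].length; j++)
            weights[winner][j] = (1 - alpha) * weights[winner][j] + alpha * object[j];
    }

    /** Presents every object once per iteration, updating only the winner each time. */
    void train(double[][] objects, int iterations) {
        for (int it = 0; it < iterations; it++)
            for (double[] o : objects) update(winner(o), o);
    }

    static double dist2(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return d;
    }
}
```

With weights (0,0) and (10,10) and alpha = 0.5, presenting the object (2,0) selects the first unit and moves its weight vector to (1,0).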
  • 25. Demo: Kohonen Self-Organized Maps based K-Means Algorithm
  • 26. Recommender System based on Clustering
      • A Recommender System is a system based on information filtering techniques.
      • It applies Knowledge Discovery Techniques such as Clustering, Classification, and Filtering to find Recommendations.
      • Exposing the most interesting items to the user saves time and energy.
      • Techniques used to give Recommendations include K-Nearest Neighbor and Collaborative Filtering.
      • Why Clustering?
        • The basic feature of a clustering algorithm is natural grouping.
        • Challenges in the above two techniques are overcome.
        • Each iteration of the K-Means heuristic runs in polynomial time, and the result is a set of crisp Clusters.
  • 27. Recommender System based on Clustering
      • Recommendations for the K-Means Algorithm:
        • All the members of the cluster in which the user lies are recommended.
      • Recommendations for the Rough K-Means Algorithm:
        • If the user lies in the lower bound of a cluster, all the members lying in the lower bound of that cluster are recommended.
        • If the user lies in the upper bound of two or more clusters, all the members in those upper bounds are recommended.
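The rough K-Means recommendation rule can be sketched as follows. The representation is an assumption: each cluster is given by a set of member names for its lower bound and another for its upper bound, and the user is excluded from their own recommendations. `RoughRecommender` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: the set-based representation and names are assumptions.
final class RoughRecommender {
    /** Members recommended to `user`, following the lower-bound and upper-bound rules. */
    static Set<String> recommend(String user, List<Set<String>> lower, List<Set<String>> upper) {
        Set<String> result = new TreeSet<>();
        // Rule 1: the user lies in the lower bound of one cluster.
        for (Set<String> l : lower) {
            if (l.contains(user)) {
                result.addAll(l);
                result.remove(user);
                return result;
            }
        }
        // Rule 2: the user lies only in upper bounds; pool every upper bound containing the user.
        for (Set<String> u : upper)
            if (u.contains(user)) result.addAll(u);
        result.remove(user);
        return result;
    }
}
```

For crisp K-Means the same function applies with each upper bound equal to the corresponding lower bound, so only the first rule ever fires.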
  • 28. Recommender System based on Clustering (figures: System Architecture; User Perspective)
  • 30. References
      • Completed:
      ✓ K-Means Algorithm
        • “Data Clustering: 50 Years Beyond K-Means”, Anil K. Jain, 2010.
      ✓ Rough Set based K-Means Algorithm
        • “Precision of Rough Set Clustering”, Pawan Lingras, Min Chen, Duoqian Miao, 2008.
        • “Some Refinements of Rough K-means Clustering”, George Peters, 2006.
        • “Interval Set Clustering of Web Users with Rough K-Means”, Pawan Lingras, Chad West, 2003.
      ✓ Rough K-Means based on Genetic Algorithm and Kohonen Self-Organizing Maps Paradigm
        • “Applications of Rough Set Based K-Means, Kohonen SOM, GA Clustering”, Pawan Lingras, 2006.
        • “Evolutionary Rough K-Means Clustering”, Pawan Lingras, 2009.
  • 31. References (Contd.)
      • Recommender System
        • “Enhanced K-means-Based Mobile Recommender System”, Gamal Hussein, International Journal of Information Studies, April 2010.
        • “Clustering Social Networks”, Nina Mishra, Robert Schreiber, Isabelle Stanton, and Robert E. Tarjan, 2006.
      • K-Means based on Genetic Algorithms
        • “Genetic K-Means Algorithm”, K. Krishna and M. Narasimha Murty, IEEE Transactions on Systems, Man and Cybernetics, 1999.
        • “Initializing K-Means using Genetic Algorithms”, Bashar Al-Shboul and Sung-Hyon Myaeng, World Academy of Science, Engineering and Technology, 2009.
      • Advanced Topics
        • “FGKA: A Fast Genetic K-means Clustering Algorithm”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
        • “Incremental genetic K-means algorithm and its application in gene expression data analysis”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
        • “A Genetic Algorithm for Clustering on Image Data”, Qin Ding and Jim Gasvoda, International Journal of Computational Intelligence, 2004.
  • 32. Thank You. Group 9. Have a Nice Day!