SlideShare a Scribd company logo
1 of 16
k-means Clustering
Papers We Love
Bucharest Chapter
February 22, 2016
I am Adrian Florea
Architect Lead @ IBM Bucharest Software Lab
Hello!
What is clustering?
◉branch of Machine Learning (unsupervised learning i.e. find
underlying patterns in the data)
◉method for identifying data items that closely resemble one
another, assembling them into clusters [ODS]
Clustering applications
◉Customer segmentation
Clustering applications
◉Google News
Definitions and notations
◉data to be clustered (N data points)
◉iteratively refined clustering solution
◉cluster membership vector
◉closeness cost function
◉Minkowski distance
p=1 (Manhattan), p=2 (Euclidian)
Voronoi diagrams
◉Euclidian distance ◉Manhattan distance
K-means algorithm
◉input: dataset D, number of clusters k
◉output: cluster solution C, cluster membership m
◉initialize C by randomly choose k data points sets from D
◉repeat
reasign points in D to closest cluster mean
update m
update C such that is mean of points in cluster,
◉until convergence of KM(D, C)
K-means iterations example
N=12, k=2
randomly
choose
2 clusters
calculate
cluster
means
reassign points to
closest cluster mean
update
cluster
means
loop if needed
Poor initialization/Poor clustering
Initial (I) Final (I) Initial (II) Final (II)
Initial (III) Final (III)
Pros
◉simple
◉common in practice
◉easy to adapt
◉relatively fast
◉sensitive to initial partitions
◉optimal k is difficult to choose
◉restricted to data which has
the notion of a center
◉cannot handle well non-
convex clusters
◉does not identify outliers
(because mean is not a “robust”
statistic function)
Cons
Tips & tricks
◉run the algorithm multiple times with different initial centroids
and select the best result
◉if possible, use the knowledge about the dataset in choosing k
◉trying several k and choosing the one based on closeness cost
function is naive
◉use a more appropriate distance measure for the dataset
◉use k-means together with another algorithm
◉Exploit the triangular inequality to avoid to compare each data
point with all centroids
Tips & tricks - continued
◉recursively split the least compact cluster in 2
clusters by using 2-means
◉remove outliers in preprocessing (anomaly
detection)
◉eliminate small clusters or merge close clusters
at postprocessing
◉in case of empty clusters reinitialize their
centroids or steal points from the largest cluster
Tools & frameworks
◉Frontline Systems XLMiner (Excel Add-in)
◉SciPy K-means Clustering and Vector
Quantization Modules (scipy.cluster.vq.kmeans)
◉stats package in R
◉Azure Machine Learning
Bibliography
◉ H.-H. Bock, "Origins and extensions of the k-means algorithm in cluster analysis", JEHPS, Vol.
4, No. 2, December 2008
◉ D. Chappell, “Understanding Machine Learning”, PluralSight.com, 4 February 2016
◉ J. Ghosh, A. Liu, "K-Means", pp. 21-35 in X. Wu, V. Kumar (Eds.), "The Top Ten Algorithms in
Data Mining", Chapman & Hall/CRC, 2009
◉ G.J. Hamerly, "Learning structure and concepts in data through data clustering", PhD Thesis,
University of California, San Diego, 2003
◉ J. Han, M. Kamber, J. Pei, "Chapter 10. Cluster Analysis: Basic Concepts and Methods" in "Data
Mining: Concepts and Techniques", Morgan Kaufmann, 2011
◉ S.P. Lloyd, "Least Squares Quantization in PCM", IEEE Transactions on Information Theory, Vol.
28, No. 2, 1982
◉ J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations”, 5th
Berkeley Symposium, 1967
◉ J. McCaffrey, "Machine Learning Using C# Succinctly", Syncfusion, 2014
◉ [ODS] G. Upton, I. Cook, “A Dictionary of Statistics”, 3rd Ed., Oxford University Press, 2014
◉ ***, “k-means clustering”, Wikipedia
◉ ***, “Voronoi diagram”, Wikipedia
Any questions ?
You can find me at
Thanks!

More Related Content

What's hot

Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetAlaaZ
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringAnna Fensel
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...butest
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering AlgorithmLino Possamai
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering108kaushik
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering TypesAshwin Shenoy M
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methodsKrish_ver2
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningHouw Liong The
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithmVinit Dantkale
 

What's hot (20)

Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Kmeans
KmeansKmeans
Kmeans
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
K means clustering
K means clusteringK means clustering
K means clustering
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Data clustering
Data clustering Data clustering
Data clustering
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithm
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Clique
Clique Clique
Clique
 

Viewers also liked

Melanoma Image Segmentation using Self Organized Features Maps
Melanoma Image Segmentation using Self Organized Features MapsMelanoma Image Segmentation using Self Organized Features Maps
Melanoma Image Segmentation using Self Organized Features MapsMunnangi Anirudh
 
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...CSCJournals
 
Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela
 
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela
 
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...Daksh Raj Chopra
 
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IINeural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IIEdgar Carrillo
 
Neural Networks: Self-Organizing Maps (SOM)
Neural Networks:  Self-Organizing Maps (SOM)Neural Networks:  Self-Organizing Maps (SOM)
Neural Networks: Self-Organizing Maps (SOM)Mostafa G. M. Mostafa
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthbiinoida
 
A Simple Tutorial on Conjoint and Cluster Analysis
A Simple Tutorial on Conjoint and Cluster AnalysisA Simple Tutorial on Conjoint and Cluster Analysis
A Simple Tutorial on Conjoint and Cluster AnalysisIterative Path
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsRyan B Harvey, CSDP, CSM
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail MarketingJonathan Sedar
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...wl820609
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin Rshanelynn
 

Viewers also liked (20)

Melanoma Image Segmentation using Self Organized Features Maps
Melanoma Image Segmentation using Self Organized Features MapsMelanoma Image Segmentation using Self Organized Features Maps
Melanoma Image Segmentation using Self Organized Features Maps
 
Systolic array
Systolic arraySystolic array
Systolic array
 
Lec2 final
Lec2 finalLec2 final
Lec2 final
 
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...
A Dual Tree Complex Wavelet Transform Construction and Its Application to Ima...
 
Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...
 
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
 
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
 
Self Organizing Maps
Self Organizing MapsSelf Organizing Maps
Self Organizing Maps
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
 
Ppt
PptPpt
Ppt
 
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IINeural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
 
Neural Networks: Self-Organizing Maps (SOM)
Neural Networks:  Self-Organizing Maps (SOM)Neural Networks:  Self-Organizing Maps (SOM)
Neural Networks: Self-Organizing Maps (SOM)
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 month
 
A Simple Tutorial on Conjoint and Cluster Analysis
A Simple Tutorial on Conjoint and Cluster AnalysisA Simple Tutorial on Conjoint and Cluster Analysis
A Simple Tutorial on Conjoint and Cluster Analysis
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
 
XL-MINER: Data Exploration
XL-MINER: Data ExplorationXL-MINER: Data Exploration
XL-MINER: Data Exploration
 
Introduction To XL-Miner
Introduction To XL-MinerIntroduction To XL-Miner
Introduction To XL-Miner
 
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin RSelf-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
Self-Organising Maps for Customer Segmentation using R - Shane Lynn - Dublin R
 

Similar to "k-means-clustering" presentation @ Papers We Love Bucharest

DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleHakka Labs
 
CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptHathiramN1
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningssuserb02eff
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptxGandhiMathy6
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Dataidescitation
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...Madan Golla
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfVIKASGUPTA127897
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel basedIJITCA Journal
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxAnupama Kate
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 

Similar to "k-means-clustering" presentation @ Papers We Love Bucharest (20)

DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.ppt
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learning
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptx
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel based
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
Introduction to data mining and machine learning
Introduction to data mining and machine learningIntroduction to data mining and machine learning
Introduction to data mining and machine learning
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 

Recently uploaded

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 

Recently uploaded (20)

Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 

"k-means-clustering" presentation @ Papers We Love Bucharest

  • 1. k-means Clustering Papers We Love Bucharest Chapter February 22, 2016
  • 2. I am Adrian Florea Architect Lead @ IBM Bucharest Software Lab Hello!
  • 3. What is clustering? ◉branch of Machine Learning (unsupervised learning i.e. find underlying patterns in the data) ◉method for identifying data items that closely resemble one another, assembling them into clusters [ODS]
  • 6. Definitions and notations ◉data to be clustered (N data points) ◉iteratively refined clustering solution ◉cluster membership vector ◉closeness cost function ◉Minkowski distance p=1 (Manhattan), p=2 (Euclidian)
  • 8. K-means algorithm ◉input: dataset D, number of clusters k ◉output: cluster solution C, cluster membership m ◉initialize C by randomly choose k data points sets from D ◉repeat reasign points in D to closest cluster mean update m update C such that is mean of points in cluster, ◉until convergence of KM(D, C)
  • 9. K-means iterations example N=12, k=2 randomly choose 2 clusters calculate cluster means reassign points to closest cluster mean update cluster means loop if needed
  • 10. Poor initialization/Poor clustering Initial (I) Final (I) Initial (II) Final (II) Initial (III) Final (III)
  • 11. Pros ◉simple ◉common in practice ◉easy to adapt ◉relatively fast ◉sensitive to initial partitions ◉optimal k is difficult to choose ◉restricted to data which has the notion of a center ◉cannot handle well non- convex clusters ◉does not identify outliers (because mean is not a “robust” statistic function) Cons
  • 12. Tips & tricks ◉run the algorithm multiple times with different initial centroids and select the best result ◉if possible, use the knowledge about the dataset in choosing k ◉trying several k and choosing the one based on closeness cost function is naive ◉use a more appropriate distance measure for the dataset ◉use k-means together with another algorithm ◉Exploit the triangular inequality to avoid to compare each data point with all centroids
  • 13. Tips & tricks - continued ◉recursively split the least compact cluster in 2 clusters by using 2-means ◉remove outliers in preprocessing (anomaly detection) ◉eliminate small clusters or merge close clusters at postprocessing ◉in case of empty clusters reinitialize their centroids or steal points from the largest cluster
  • 14. Tools & frameworks ◉Frontline Systems XLMiner (Excel Add-in) ◉SciPy K-means Clustering and Vector Quantization Modules (scipy.cluster.vq.kmeans) ◉stats package in R ◉Azure Machine Learning
  • 15. Bibliography ◉ H.-H. Bock, "Origins and extensions of the k-means algorithm in cluster analysis", JEHPS, Vol. 4, No. 2, December 2008 ◉ D. Chappell, “Understanding Machine Learning”, PluralSight.com, 4 February 2016 ◉ J. Ghosh, A. Liu, "K-Means", pp. 21-35 in X. Wu, V. Kumar (Eds.), "The Top Ten Algorithms in Data Mining", Chapman & Hall/CRC, 2009 ◉ G.J. Hamerly, "Learning structure and concepts in data through data clustering", PhD Thesis, University of California, San Diego, 2003 ◉ J. Han, M. Kamber, J. Pei, "Chapter 10. Cluster Analysis: Basic Concepts and Methods" in "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2011 ◉ S.P. Lloyd, "Least Squares Quantization in PCM", IEEE Transactions on Information Theory, Vol. 28, No. 2, 1982 ◉ J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations”, 5th Berkeley Symposium, 1967 ◉ J. McCaffrey, "Machine Learning Using C# Succinctly", Syncfusion, 2014 ◉ [ODS] G. Upton, I. Cook, “A Dictionary of Statistics”, 3rd Ed., Oxford University Press, 2014 ◉ ***, “k-means clustering”, Wikipedia ◉ ***, “Voronoi diagram”, Wikipedia
  • 16. Any questions ? You can find me at Thanks!

Editor's Notes

  1. - partition the customer database into several groups, where customers in each group have similar purchasing history and demographics