Cluster Analysis Advanced Quantitative Data Analysis
Supervised and Unsupervised Learning

Logistic regression and Fisher’s LDA and QDA are examples of supervised learning. This means that there is a ‘training set’ containing known classifications into groups, which can be used to derive a classification rule. The rule can then be evaluated on a separate ‘test set’, or evaluated repeatedly using cross-validation.

Unsupervised Learning

Unsupervised learning means (in this instance) that we are trying to discover a division of objects into classes without any training set of known classes, without knowing in advance what the classes are, or even how many classes there are. ‘Cluster analysis’, or simply ‘clustering’, is a collection of methods for unsupervised class discovery.

Distance Measures

The most crucial decision in choosing a clustering method is defining what it means for two vectors to be close or far apart. There are other components to the choice, but these are all secondary. Often the distance measure is implicit in the choice of method, but a wise decision maker knows what they are choosing.

A true distance, or metric, is a function $D$ defined on pairs of objects that satisfies the following properties:

$D(x,y) = D(y,x)$ (symmetry)
$D(x,y) \ge 0$ (non-negativity)
$D(x,y) = 0 \iff x = y$
$D(x,y) + D(y,z) \ge D(x,z)$ (triangle inequality)

The classic example of a metric is Euclidean distance. If $x = (x_1, x_2, \ldots, x_p)$ and $y = (y_1, y_2, \ldots, y_p)$ are vectors, the Euclidean distance is

$$D(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$$

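As a small illustration of the Euclidean distance formula and the triangle inequality, the following R sketch compares a hand computation with the built-in `dist()` function. The vectors `x`, `y`, `z` and the helper `euclid` are made up for this example and do not come from the slides.

```r
# Two illustrative vectors (hypothetical values)
x <- c(1, 2, 3)
y <- c(4, 6, 3)

# Direct formula: square root of the sum of squared coordinate differences
d_manual <- sqrt(sum((x - y)^2))

# Same quantity from dist(), whose default method is "euclidean"
d_builtin <- as.numeric(dist(rbind(x, y)))

# A third point, used to check D(x,y) + D(y,z) >= D(x,z)
z <- c(0, 0, 0)
euclid <- function(a, b) sqrt(sum((a - b)^2))
triangle_ok <- euclid(x, y) + euclid(y, z) >= euclid(x, z)

print(c(manual = d_manual, builtin = d_builtin))
print(triangle_ok)
```

Both computations should agree, since `dist()` applies the same formula to each pair of rows.
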
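Euclidean Distance

[Figure: the points x = (x1, x2) and y = (y1, y2) in the plane, joined by a segment of length D(x,y); the horizontal and vertical legs of the right triangle have lengths |x1 - y1| and |x2 - y2|.]
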
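Mahalanobis Distance

Mahalanobis distance is a kind of weighted Euclidean distance, in which the weights come from the covariance matrix of the data. It produces distance contours of the same shape as the data distribution. It is often more appropriate than Euclidean distance, for example when the variables are correlated or measured on different scales.
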
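A minimal sketch of Mahalanobis distance in R, assuming the `iris` measurements as example data and the sample mean and covariance as the reference distribution (these choices are illustrative, not from the slides). Note that the base R function `mahalanobis()` returns squared distances.

```r
X <- as.matrix(iris[, 1:4])
center <- colMeans(X)   # sample mean vector
S <- cov(X)             # sample covariance matrix

# mahalanobis() returns SQUARED distances, so take the square root
d_mahal <- sqrt(mahalanobis(X, center, S))

# With the identity matrix in place of the covariance,
# the result reduces to ordinary Euclidean distance from the center
d_ident <- sqrt(mahalanobis(X, center, diag(4)))
d_euclid <- sqrt(rowSums(sweep(X, 2, center)^2))

print(head(cbind(mahalanobis = d_mahal, euclidean = d_euclid)))
```

The comparison with the identity matrix shows in what sense the distance is a "weighted" Euclidean distance: the covariance matrix supplies the weights.
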
Agglomerative Hierarchical Clustering

We start with every data item as an individual. In step 1, we join the two closest individuals. In each subsequent step, we join the two closest individuals or clusters. This requires defining the distance between two groups as a number that can be compared with the distance between individuals. In R, we can use the command hclust.

Group Distances

Complete link clustering defines the distance between two groups as the maximum distance between any element of one group and any element of the other.
Single link clustering defines the distance between two groups as the minimum distance between any element of one group and any element of the other.
Average link clustering defines the distance between two groups as the mean distance between elements of one group and elements of the other.

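The three group distances can be computed by hand for a toy example. In this sketch the two groups `A` and `B` are hypothetical points on a line, chosen so the arithmetic is easy to follow.

```r
# Two small groups of 2-D points (hypothetical values)
A <- rbind(c(0, 0), c(1, 0))
B <- rbind(c(4, 0), c(6, 0))

# Euclidean distances between every element of A (rows)
# and every element of B (columns)
cross <- as.matrix(dist(rbind(A, B)))[1:2, 3:4]

d_complete <- max(cross)    # complete link: largest cross-group distance
d_single   <- min(cross)    # single link: smallest cross-group distance
d_average  <- mean(cross)   # average link: mean cross-group distance

print(c(complete = d_complete, single = d_single, average = d_average))
```

In `hclust`, the corresponding choices are selected with the `method` argument: `"complete"` (the default), `"single"`, or `"average"`.
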
A graphical example

[Figure: three panels illustrating single linkage, complete linkage, and average linkage.]

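iris.d <- dist(iris[, 1:4])   # Euclidean distances between the 150 flowers
iris.hc <- hclust(iris.d)     # agglomerative clustering (complete link by default)
plot(iris.hc)                 # draw the dendrogram
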
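A dendrogram does not by itself assign items to clusters. One way to obtain assignments is to cut the tree with `cutree()`. The sketch below repeats the clustering so that it runs on its own; the choice `k = 3` is an assumption made here because `iris` contains three species, not something the slides specify.

```r
iris.d <- dist(iris[, 1:4])
iris.hc <- hclust(iris.d)            # default method = "complete"

# Cut the dendrogram into 3 clusters (k = 3 is an illustrative choice)
groups <- cutree(iris.hc, k = 3)

# Cross-tabulate cluster labels (rows) against the known species (columns)
tab <- table(groups, iris$Species)
print(tab)
```
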
Divisive Clustering

Divisive clustering begins with the whole data set as a single cluster, and considers dividing it into k clusters. Usually this is done to optimize some criterion, such as the ratio of the within-cluster variation to the between-cluster variation. The choice of k is important.

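K-means is a widely used algorithm of this kind (R command kmeans), although strictly speaking it is a partitioning method rather than a hierarchical divisive one. Its major weakness is that it uses Euclidean distance.
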
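iris.km <- kmeans(iris[, 1:4], 3)                     # k-means with k = 3 on the four measurements
plot(prcomp(iris[, 1:4])$x, col = iris.km$cluster)    # first two principal components, coloured by cluster
table(iris.km$cluster, iris[, 5])                     # clusters (rows) against species (columns)
# kmeans starts from random centers, so the cluster numbering can differ between runs

    setosa versicolor virginica
  1      0         48        14
  2      0          2        36
  3     50          0         0
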
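Since the choice of k is important, one common heuristic is to run k-means for several values of k and look at how the total within-cluster sum of squares falls. The sketch below is only one such approach; the range `1:6`, the seed, and `nstart = 20` are arbitrary choices made for illustration.

```r
set.seed(1)          # arbitrary seed, only to make the sketch repeatable
X <- iris[, 1:4]
ks <- 1:6            # candidate numbers of clusters (illustrative range)

# Total within-cluster sum of squares for each k;
# nstart = 20 reruns k-means from 20 random starts and keeps the best
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20)$tot.withinss)

print(data.frame(k = ks, wss = wss))
# plot(ks, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")
```

A value of k after which the curve flattens noticeably is often taken as a reasonable candidate, though this is a rule of thumb rather than a formal test.
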
heatmap(as.matrix(swiss))
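Multidimensional Scaling

library(MASS)                                   # isoMDS() is provided by the MASS package
mds <- isoMDS(dist(USArrests))                  # non-metric MDS in two dimensions
plot(mds$points, type = "n")                    # set up empty axes
text(mds$points, labels = as.character(rownames(USArrests)))   # label each state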
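
For comparison, base R also provides classical (metric) multidimensional scaling through `cmdscale()`. This is a different technique from the non-metric `isoMDS()`, which by default uses the `cmdscale()` solution as its starting configuration. The sketch below is an illustrative alternative, not part of the original slides.

```r
d <- dist(USArrests)          # Euclidean distances between the 50 states

# Classical MDS: coordinates in k = 2 dimensions
coords <- cmdscale(d, k = 2)

print(head(coords))
# plot(coords, type = "n"); text(coords, labels = rownames(USArrests))
```
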
