Data Science
CSE-4075
Unsupervised Learning
(Hierarchical Clustering)
Unsupervised Learning
• Clustering
– Unsupervised classification, that is, without the
class attribute
– Want to discover the classes
• Association Rule Discovery
– Discover correlations among attributes (which values tend to occur together)
Technique Characteristics
• Hard vs Fuzzy
– Hard clustering assigns each instance to exactly one cluster
– Fuzzy clustering assigns each instance a degree of membership in each cluster
• Monothetic vs Polythetic
– Polythetic: all attributes are used simultaneously, e.g., to calculate
distance (most algorithms)
– Monothetic: attributes are considered one at a time
• Incremental vs Non-Incremental
– With large data sets it may be necessary to consider only part of the
data at a time (data mining)
– Incremental works instance by instance
Hierarchical Clustering
• Agglomerative vs Divisive
– Agglomerative (bottom-up): each instance starts as its own cluster and
the algorithm repeatedly merges clusters
– Divisive (top-down): begins with all instances in one cluster and
repeatedly divides it up
A hierarchical clustering is a set of nested clusters that are organized as a tree.
Dendrogram
• A tree that shows how clusters are
merged/split hierarchically
• Each node on the tree is a cluster; each leaf
node is a singleton cluster
• A clustering of the data objects is obtained by
cutting the dendrogram at the desired level;
each connected component then forms a
cluster
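The cut described above can be sketched in a few lines. This is a minimal illustration, not a library API: the function name `cut_dendrogram` and the merge-history format (triples of two representative points and the height at which their clusters were merged) are assumptions made for this example, and the heights are invented.

```python
def cut_dendrogram(n_points, merges, height):
    """Return one cluster label per point after cutting the tree at `height`.

    `merges` is a hypothetical merge history: (i, j, dist) means the clusters
    containing points i and j were merged at height dist.
    """
    parent = list(range(n_points))

    def find(i):
        # Follow parent links to the representative of i's component.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only merges at or below the cut level survive the cut.
    for i, j, dist in merges:
        if dist <= height:
            parent[find(i)] = find(j)

    # Each connected component becomes one cluster, labelled 0, 1, 2, ...
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n_points)]


# Invented example: two tight groups that are joined only at height 4.8.
merges = [(0, 1, 0.1), (1, 2, 0.1), (3, 4, 0.1), (4, 5, 0.1), (0, 3, 4.8)]
labels_mid = cut_dendrogram(6, merges, height=1.0)
labels_top = cut_dendrogram(6, merges, height=5.0)
```

Cutting low yields many small clusters, and cutting above the last merge yields a single cluster, so the cut level plays the role of the "number of clusters" choice.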
Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Compute the distance matrix
2. Let each data point be a cluster
3. Repeat
4.   Merge the two closest clusters
5.   Update the distance matrix
6. Until only a single cluster remains
• Key operation is the computation of the distance between
two clusters
• Different approaches to defining the distance between clusters
distinguish the different algorithms
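The basic algorithm can be sketched as follows. This is a deliberately naive version under stated assumptions: instead of storing and updating a distance matrix, it recomputes inter-cluster distances on every pass, which is simpler to read but slower. The names `agglomerative` and `single_link`, and the sample points, are made up for this illustration.

```python
import math


def single_link(points, c1, c2):
    """One possible inter-cluster distance: the closest pair of points."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points, cluster_distance):
    """Naive agglomerative clustering; returns the merge history."""
    # Let each data point be a cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Repeat until only a single cluster remains.
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the two closest clusters and record the merge height.
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return merges


# Invented sample data: two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
history = agglomerative(points, single_link)
```

Passing a different `cluster_distance` function is all it takes to switch between the linkage methods, which is why the choice of inter-cluster distance distinguishes the algorithms.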
Starting Situation
Start with clusters of individual points and a distance matrix
Intermediate Situation
• After some merging steps, we have some clusters
• Choose the two clusters that have the smallest distance (largest
similarity) to merge
• We want to merge the two closest clusters (C2 and C5) and update
the distance matrix.
After Merging
The question is “How do we update the distance matrix?”
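One way to picture the update, as a sketch only: for the single link and complete link methods, the distance from the merged cluster to every other cluster is the minimum (or maximum) of the two old distances, so the new row can be built from the two old rows. The function name `merge_in_matrix` and all numbers below are hypothetical; only the cluster names C1 to C5 come from the slides.

```python
def merge_in_matrix(labels, dist, i, j, combine=min):
    """Merge clusters i and j; return the updated labels and distance matrix.

    combine=min gives the single link update, combine=max the complete
    link update. Average link and centroid updates also need cluster sizes
    and are not covered by this sketch.
    """
    keep = [k for k in range(len(labels)) if k not in (i, j)]
    new_labels = [labels[k] for k in keep] + [labels[i] + "+" + labels[j]]
    # Distances among untouched clusters are copied; a new last column
    # holds the distance from each of them to the merged cluster.
    new_dist = [
        [dist[r][c] for c in keep] + [combine(dist[r][i], dist[r][j])]
        for r in keep
    ]
    # The merged cluster's row mirrors its column; self-distance is 0.
    new_dist.append([row[-1] for row in new_dist] + [0.0])
    return new_labels, new_dist


labels = ["C1", "C2", "C3", "C4", "C5"]
dist = [  # invented symmetric distances; C2 and C5 are the closest pair
    [0, 4, 6, 8, 5],
    [4, 0, 7, 9, 1],
    [6, 7, 0, 3, 6],
    [8, 9, 3, 0, 10],
    [5, 1, 6, 10, 0],
]
single_labels, single_dist = merge_in_matrix(labels, dist, 1, 4, combine=min)
complete_labels, complete_dist = merge_in_matrix(labels, dist, 1, 4, combine=max)
```

Only the row and column of the merged cluster change; every other entry of the matrix is carried over as-is.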
How to Define Inter-Cluster Distance
• Single link method (Min)
• Complete link method (Max)
• Average link (Group Average)
• Centroid method (Distance between centroids)
Single link method (Min)
• The distance between two clusters is represented
by the distance of the closest pair of data objects
belonging to different clusters.
• Determined by one pair of points, i.e., by one link
in the proximity graph
• Can handle non-elliptical shapes
• Sensitive to noise and outliers
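A small sketch of the single link distance, assuming Euclidean distance between points (the slides do not fix a metric). The function name and the coordinates are invented for illustration; the second call shows why a single stray point can distort the result.

```python
import math


def single_link_distance(cluster_a, cluster_b):
    """Distance of the closest pair of points from different clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical 2-D points.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (5.0, 0.0)]
gap = single_link_distance(left, right)

# One noisy point lying between the groups halves the distance.
gap_with_noise = single_link_distance(left + [(2.5, 0.0)], right)
```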
Complete link method (Max)
• The distance between two clusters is represented by
the distance of the farthest pair of data objects
belonging to different clusters
• Less susceptible to noise and outliers
• Tends to break large clusters
• Biased towards globular clusters
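The complete link distance can be sketched the same way, again assuming Euclidean distance and using invented points. The second example hints at why large clusters tend to be broken up: a long cluster looks far from a neighbour even when their nearest points are close.

```python
import math


def complete_link_distance(cluster_a, cluster_b):
    """Distance of the farthest pair of points from different clusters."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical 2-D points.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (5.0, 0.0)]
spread = complete_link_distance(left, right)

# The nearest points are only 1.0 apart, but the farthest pair decides.
long_cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
neighbour = [(4.0, 0.0)]
far = complete_link_distance(long_cluster, neighbour)
```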
Average link (Group Average)
• The distance between two clusters is represented by the average
distance of all pairs of data objects belonging to different clusters
• Determined by all pairs of points in the two clusters
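As a sketch under the same assumptions (Euclidean distance, invented points, a hypothetical function name), the average link distance sums every cross-cluster pair and divides by the number of pairs:

```python
import math


def average_link_distance(cluster_a, cluster_b):
    """Mean distance over all pairs of points from different clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical 2-D points: the four pairwise distances are 4, 5, 3 and 4.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (5.0, 0.0)]
average = average_link_distance(left, right)
```

For these points the value falls between the single link result (3.0) and the complete link result (5.0), which is the usual motivation for the method.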
Centroid method (Distance between centroids)
• The distance between two clusters is represented by the
distance between the centers of the clusters
• Determined by cluster centroids
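A minimal sketch of the centroid method, assuming the centroid is the coordinate-wise mean and the distance is Euclidean. The helper names and the points are made up for this example.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def centroid_distance(cluster_a, cluster_b):
    """Distance between the centers of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical 2-D points with centroids (0.5, 0.0) and (4.5, 0.0).
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (5.0, 0.0)]
between_centers = centroid_distance(left, right)
```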
Comparison
