This document introduces hierarchical clustering, an unsupervised learning technique. It contrasts the two main types, agglomerative and divisive, and explains dendrograms, the trees that show how clusters are merged or split hierarchically. It then focuses on the agglomerative clustering algorithm and the different ways of defining the distance between clusters being merged: the single link, complete link, average link, and centroid methods.
2. Unsupervised Learning
• Clustering
– Unsupervised classification, that is, without the
class attribute
– Want to discover the classes
• Association Rule Discovery
– Discover correlations in the data
3. Technique Characteristics
• Hard vs Fuzzy
– Hard clustering assigns each instance to exactly one cluster
– Fuzzy clustering assigns each instance a degree of membership in each cluster
• Monothetic vs Polythetic
– Polythetic: all attributes are used simultaneously, e.g., to calculate distance (most algorithms)
– Monothetic: attributes are considered one at a time
• Incremental vs Non-Incremental
– With large data sets it may be necessary to consider only part of the data at a time (data mining)
– Incremental clustering works instance by instance
4. Hierarchical Clustering
• Agglomerative vs Divisive
– Agglomerative: each instance is its own cluster and the
algorithm merges clusters
– Divisive: begins with all instances in one cluster and divides
it up
A hierarchical clustering is a set of nested clusters that are organized as a tree.
5. Dendrogram
• A tree that shows how clusters are
merged/split hierarchically
• Each node on the tree is a cluster; each leaf
node is a singleton cluster
6. Dendrogram
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster
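The cutting step can be sketched in plain Python. The merge-record encoding used here, a list of `(height, i, j)` tuples saying that the clusters containing points `i` and `j` were merged at that height, is an illustrative assumption, not a representation fixed by the slides:

```python
def cut_dendrogram(n, merges, height):
    """Clusters obtained by cutting a dendrogram at `height`.
    `merges` is a list of (h, i, j) tuples: at height h, the clusters
    containing points i and j were merged. Applying only the merges at
    or below the cut leaves each connected component as one cluster."""
    parent = list(range(n))          # union-find over the n data points
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for h, i, j in merges:
        if h <= height:              # keep only merges below the cut
            parent[find(i)] = find(j)
    groups = {}
    for p in range(n):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())

# 4 points; merges happened at heights 1, 2, and 5
merges = [(1, 0, 1), (2, 2, 3), (5, 0, 2)]
print(cut_dendrogram(4, merges, height=3))  # -> [[0, 1], [2, 3]]
```

Cutting above the highest merge (e.g., `height=6`) yields a single cluster containing all four points.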
7. Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique
• The basic algorithm is straightforward:
1. Compute the distance matrix
2. Let each data point be its own cluster
3. Repeat
4. Merge the two closest clusters
5. Update the distance matrix
6. Until only a single cluster remains
• The key operation is the computation of the distance between two clusters
• The different ways of defining the inter-cluster distance distinguish the different algorithms
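The steps above can be sketched in plain Python. The `agglomerate` function below, and the single-link distance it is paired with, are illustrative choices and not code from the slides; for simplicity it recomputes distances rather than updating a distance matrix:

```python
from itertools import combinations

def agglomerate(points, dist, k=1):
    """Basic agglomerative clustering: start with singleton clusters and
    repeatedly merge the two closest, stopping when k clusters remain.
    `dist(c1, c2)` defines the inter-cluster distance."""
    clusters = [[p] for p in points]              # each point is its own cluster
    while len(clusters) > k:
        # find the pair of clusters with the smallest distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

# Single-link inter-cluster distance on 1-D points (an illustrative choice)
def single_link(c1, c2):
    return min(abs(a - b) for a in c1 for b in c2)

print(agglomerate([1, 2, 10, 11, 25], single_link, k=2))
# -> [[1, 2, 10, 11], [25]]
```

Stopping at `k=1` instead would record the full merge sequence of the dendrogram.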
9. Intermediate Situation
• After some merging steps, we have a set of clusters
• Choose the two clusters that have the smallest distance (largest similarity) and merge them
10. Intermediate Situation
• We want to merge the two closest clusters (in the slide's example, C2 and C5) and update the distance matrix
12. How to Define Inter-Cluster Distance
• Single link method (Min)
• Complete link method (Max)
• Average link method (Group Average)
• Centroid method (distance between centroids)
13. Single link method (Min)
• The distance between two clusters is represented
by the distance of the closest pair of data objects
belonging to different clusters.
• Determined by one pair of points, i.e., by one link
in the proximity graph
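A minimal sketch of the single link distance, assuming a pairwise point distance `d` is given (the 1-D absolute difference below is only for illustration):

```python
def single_link(c1, c2, d):
    # Distance of the closest pair of points across the two clusters
    return min(d(a, b) for a in c1 for b in c2)

d = lambda a, b: abs(a - b)
print(single_link([1, 2], [5, 9], d))  # closest pair is (2, 5) -> 3
```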
17. Complete link method (Max)
• The distance between two clusters is represented by
the distance of the farthest pair of data objects
belonging to different clusters
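The complete link distance is the same computation with the minimum replaced by a maximum; a sketch under the same assumed pairwise distance `d`:

```python
def complete_link(c1, c2, d):
    # Distance of the farthest pair of points across the two clusters
    return max(d(a, b) for a in c1 for b in c2)

d = lambda a, b: abs(a - b)
print(complete_link([1, 2], [5, 9], d))  # farthest pair is (1, 9) -> 8
```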
22. Average link (Group Average)
• The distance between two clusters is represented by the average
distance of all pairs of data objects belonging to different clusters
• Determined by all pairs of points in the two clusters
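A sketch of the average link distance, again assuming a pairwise distance `d`; every cross-cluster pair contributes to the mean:

```python
def average_link(c1, c2, d):
    # Mean distance over all cross-cluster pairs of points
    return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

d = lambda a, b: abs(a - b)
print(average_link([1, 2], [5, 9], d))  # (4 + 8 + 3 + 7) / 4 = 5.5
```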
24. Centroid method (Distance between centroids)
• The distance between two clusters is represented by the
distance between the centers of the clusters
• Determined by cluster centroids
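A sketch of the centroid method for points given as coordinate tuples; the centroid is the component-wise mean, and the Euclidean distance between the two centroids is an illustrative choice of metric:

```python
def centroid(cluster):
    # Component-wise mean of the cluster's points
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def centroid_distance(c1, c2):
    # Euclidean distance between the two cluster centroids
    a, b = centroid(c1), centroid(c2)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# centroids are (1, 0) and (5, 0), so the distance is 4.0
print(centroid_distance([(0, 0), (2, 0)], [(4, 0), (6, 0)]))
```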