MACHINE LEARNING
JASON TSENG
INFORMATION ENGINEERING, KSU
UNSUPERVISED LEARNING-CLUSTERING
ALGORITHMS
• These methods find similarities, shared features, and relationship patterns among data samples, and then cluster those samples into groups of similar items based on those features, i.e. they answer the question "how are the data grouped?"
• Clustering is important because it uncovers the intrinsic grouping in unlabeled data.
• Each algorithm makes its own assumptions about what makes data points similar, and different assumptions can produce different but equally valid clusterings (see the sketch below).
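To make the last point concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available) that runs two algorithms with different similarity assumptions, k-means and agglomerative clustering, on the same unlabeled samples; each produces a valid grouping, and the groupings need not agree.

```python
# Two clustering algorithms with different assumptions, applied to the same unlabeled data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)  # unlabeled samples

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print("k-means cluster sizes:      ", np.bincount(kmeans_labels))
print("agglomerative cluster sizes:", np.bincount(agglo_labels))
```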
TRADITIONAL CLUSTERING
ALGORITHMS
Xu, D. & Tian, Y. (2015). A Comprehensive Survey of Clustering Algorithms. Annals of Data Science, 2, 165.
CLUSTERING ALGORITHMS-HIERARCHY
BASED
• The clusters form a tree-like structure based on a hierarchy, in which new clusters are formed from previously formed ones.
• Hierarchy-based clustering falls into two categories:
Agglomerative (bottom-up approach)
Divisive (top-down approach)
• Examples include CURE (Clustering Using REpresentatives) and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies); a BIRCH sketch follows the figure note below.
(Example figure: a taxonomy tree of vertebrates, i.e. animals with a spinal cord surrounded by cartilage or bone, illustrating a natural hierarchy.)
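BIRCH, named above, has a readily available implementation in scikit-learn (sklearn.cluster.Birch). A minimal sketch on synthetic data follows; the parameter values are illustrative, not tuned.

```python
# Minimal BIRCH sketch on synthetic blobs (parameter values are illustrative).
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# threshold controls the radius of the CF-tree subclusters; n_clusters is the
# final number of clusters produced from the tree's leaves.
labels = Birch(threshold=0.5, n_clusters=3).fit_predict(X)
print(labels[:10])
```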
CATEGORIES OF HIERARCHICAL
ALGORITHMS
• Agglomerative hierarchical algorithms − each data point is first treated as a single cluster, and the closest pairs of clusters are then successively merged, or agglomerated (bottom-up approach). The resulting hierarchy of clusters is represented as a dendrogram or tree structure.
• Divisive hierarchical algorithms − all data points start as one big cluster, and clustering proceeds by successively dividing (top-down approach) that big cluster into smaller clusters.
STEPS TO PERFORM AGGLOMERATIVE
HIERARCHICAL CLUSTERING
• The steps are as follows −
• Step 1 − Treat each data point as a single cluster, so at the start we have K clusters, where K is the number of data points.
• Step 2
• Step 2.1 − Form a bigger cluster by joining the two closest data points. This leaves a total of K-1 clusters.
• Step 2.2 − To form more clusters, join the two closest clusters. This leaves a total of K-2 clusters.
• Step 2.3 − Repeat the two steps above until only one big cluster remains, i.e. there are no more clusters left to join.
• Step 3 − Once the single big cluster is formed, dendrograms are used to divide it back into multiple clusters, depending on the problem. (A minimal code sketch of these steps follows.)
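Below is a minimal sketch of these steps in plain NumPy, assuming single-linkage distances and a tiny one-dimensional dataset for readability; the variable names are illustrative and not from any library.

```python
import numpy as np

points = np.array([1.0, 2.0, 9.0, 10.0, 25.0])

# Step 1: every point starts as its own cluster (K clusters of one point each).
clusters = [[i] for i in range(len(points))]

def cluster_distance(a, b):
    # Single linkage: the distance between the two closest members of the clusters.
    return min(abs(points[i] - points[j]) for i in a for j in b)

# Step 2: repeatedly find and merge the two closest clusters (K-1, K-2, ...)
# until one big cluster remains.
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    print("merged ->", clusters)
```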
(Figure: a step-by-step merge sequence — at each step, identify the two clusters that are closest together (by Euclidean distance), then merge those two most similar clusters.)
The main output of hierarchical clustering is a dendrogram (see the SciPy sketch below).
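A minimal SciPy sketch of producing that dendrogram (the functions shown, scipy.cluster.hierarchy.linkage and dendrogram, are SciPy's; the data is synthetic):

```python
# Build the merge hierarchy and plot its dendrogram for two synthetic groups.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(6, 1, (10, 2))])

Z = linkage(X, method="ward")   # encodes the full sequence of merges
dendrogram(Z)                   # the main output of hierarchical clustering
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()
```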
METRICS BETWEEN CLUSTERS
• Measure of distance (similarity): how the distance between two clusters is computed.
• It is most often computed as the length of the straight line drawn from one cluster to the other, commonly referred to as the Euclidean distance, although many other distance metrics have been developed.
• Linkage criterion: determines between which points of the two clusters the distance is measured.
• single linkage: the distance between the two most similar (closest) members of the two clusters
• complete linkage: the distance between the two least similar (farthest) members of the two clusters
• mean or average linkage: the distance between the centers of the clusters
• or some other criterion.
• Where there is no clear theoretical justification for the choice of linkage criterion, Ward's method is a sensible default. It decides which clusters to merge by minimizing the increase in the sum of squared distances of each observation from the average observation of its cluster. (A comparison of these linkage options is sketched below.)
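A minimal sketch (assuming scikit-learn) that runs the linkage criteria above on the same synthetic data; "ward" is also the default linkage in sklearn.cluster.AgglomerativeClustering.

```python
# Compare single, complete, average, and Ward linkage on the same data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=200, centers=3, random_state=1)

for linkage in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    print(f"{linkage:>8} linkage, cluster sizes: {np.bincount(labels)}")
```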
AGGLOMERATIVE VERSUS DIVISIVE
ALGORITHMS
• Hierarchical clustering typically works by sequentially merging similar clusters, as shown above. This is known as agglomerative hierarchical clustering (bottom-up).
• The alternative starts by grouping all observations into one cluster and then successively splits that cluster (top-down). This is known as divisive hierarchical clustering. Divisive clustering is rarely used in practice.
WHAT ARE THE STRENGTHS AND
WEAKNESSES OF HIERARCHICAL
CLUSTERING?
• The strengths of hierarchical clustering are that it is easy to understand and
easy to do. There are four types of clustering algorithms in widespread
use: hierarchical clustering, k-means cluster analysis, latent class
analysis, and self-organizing maps. The math of hierarchical clustering is the
easiest to understand.
• The weaknesses are that it rarely provides the best solution, it involves lots of
arbitrary decisions, it does not work with missing data, it works poorly with
mixed data types, it does not work well on very large data sets, and its main
output, the dendrogram, is commonly misinterpreted.
ROLE OF DENDROGRAMS IN
AGGLOMERATIVE HIERARCHICAL
CLUSTERING
(Figure: left, the original data point distribution; right, the dendrogram of these data points.)
ROLE OF DENDROGRAMS IN
AGGLOMERATIVE HIERARCHICAL
CLUSTERING
• Once the single big cluster is formed, the longest vertical distance in the dendrogram that is not crossed by any horizontal line is selected, and a horizontal line is drawn through it.
• Because this horizontal line crosses the vertical (blue) lines at two points, the number of clusters is two.
(Figure: the diagram above shows the two clusters obtained from our data points.)
DISCUSSION
• The horizontal line is essentially a threshold: it defines the minimum distance required for a group to count as a separate cluster. If we draw the line further down, the distance threshold for forming a new cluster decreases and more clusters are formed, as seen in the image on the right.
• In that plot, the horizontal line passes through four vertical lines, resulting in four clusters: a cluster of points 6, 7, 8, 10; a cluster of points 3, 2, 4, 1; and points 9 and 5, each treated as a single-point cluster. (A SciPy sketch of this threshold cut follows.)
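A minimal SciPy sketch of the threshold idea (scipy.cluster.hierarchy.fcluster cuts the hierarchy at a chosen distance; the data here is synthetic, so the exact cluster counts are illustrative):

```python
# Lowering the distance threshold (moving the horizontal line down) yields more clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (10, 2)) for c in (0, 4, 8, 12)])

Z = linkage(X, method="ward")
for threshold in (20.0, 5.0, 2.0):
    labels = fcluster(Z, t=threshold, criterion="distance")
    print(f"threshold {threshold:>5}: {labels.max()} clusters")
```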
EX. CLUSTERS OF THE DATA POINTS IN THE PIMA INDIAN DIABETES DATASET
(Figure: left, the Pima Indian Diabetes dataset; right, the prediction by a hierarchy-based algorithm. A hedged code sketch follows.)
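A hedged sketch of how such a clustering could be produced; the CSV file name and the column layout (eight feature columns followed by the outcome) are assumptions about a local copy of the dataset, not part of the original slides.

```python
# Cluster the Pima features with agglomerative (Ward) clustering into two groups.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

df = pd.read_csv("pima-indians-diabetes.csv")         # assumed local file
X = StandardScaler().fit_transform(df.iloc[:, :-1])   # assumed: outcome is the last column

df["cluster"] = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print(df["cluster"].value_counts())
```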
EX. CLUSTERS OF THE DATA POINT IN
SHOPPING TRENDS DATASET
If we draw a horizontal line through the longest vertical distance that is not crossed by any horizontal line, we get 5 clusters.
(Figure: left, the original data set; right, the dendrogram with the cut. A hedged code sketch follows.)
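A hedged sketch of the shopping-trends example; the file name and the salary / spending column names are assumptions (modelled on the commonly used mall-customers data), and the dendrogram cut described above is expressed by asking for 5 clusters directly.

```python
# Hierarchically cluster customers by salary and spending into 5 groups.
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

customers = pd.read_csv("shopping_trends.csv")                 # assumed local file
X = customers[["salary_index", "spending_index"]].to_numpy()   # assumed column names

Z = linkage(X, method="ward")
customers["cluster"] = fcluster(Z, t=5, criterion="maxclust")  # cut so that 5 clusters remain
print(customers.groupby("cluster")[["salary_index", "spending_index"]].mean())
```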
DISCUSSION
• The data points at the bottom right belong to customers with high salaries but low spending; these are customers who spend their money carefully.
• The customers at the top right (green data points) have high salaries and high spending; these are the type of customers that companies target.
• The customers in the middle (blue data points) have average income and average spending. The largest number of customers belongs to this category.
(Figure axes: x = salary index, y = spending index.)