SlideShare a Scribd company logo
1 of 14
Nadar Saraswathi college of arts
and science
Theni
Department of CS & IT
Subject:Data Mining and Ware Housing
Topic: Hierarchical methods distance, based agglomerative and divisible clustering
Presented by: S.Swetha
M.sc(IT)
Introduction
• Distance or proximity measures are used to determine the similarity or
“closeness” between similar objects in the dataset. The goal of proximity
measures is to find similar objects and to group them in the same cluster.
• The goal of proximity measures is to find similar objects and to group them
in the same cluster.
• Some common examples of distance measures that can be used to compute
the proximity matrix in hierarchical clustering, including the following:
Continue…..
• Euclidean Distance
• Mahalanobis Distance
• Minkowski Distance
• I would avoid the mathematical formulas involved in explaining the distances
but rather summarise the uses and functions of these distance measures. A
simple google search can provide tons of examples and research papers if
you are interested in learning the underlying principles which support these
calculations.
Agglomerative hierarchical clustering:
• Agglomerative clustering is one of the most common types of hierarchical
clustering used to group similar objects in clusters. Agglomerative clustering
is also known as AGNES (Agglomerative Nesting). In agglomerative
clustering, each data point act as an individual cluster and at each step, data
objects are grouped in a bottom-up method. Initially, each data object is in its
cluster. At each iteration, the clusters are combined with different clusters
until one cluster is formed.
• With the help of given demonstration, we can understand that how the
actual algorithm work. Here no calculation has been done below all the
proximity among the clusters are assumed.
• Let’s suppose we have six different data points P, Q, R, S, T, V.
Diagram:
Steps
• Consider each alphabet (P, Q, R, S, T, V) as an individual cluster and find the
distance between the individual cluster from all other clusters.
• Now, merge the comparable clusters in a single cluster. Let’s say cluster Q
and Cluster R are similar to each other so that we can merge them in the
second step. Finally, we get the clusters [ (P), (QR), (ST), (V)].
• Here, we recalculate the proximity as per the algorithm and combine the two
closest clusters [(ST), (V)] together to form new clusters as [(P), (QR),
(STV)].
Continue…..
• Repeat the same process. The clusters STV and PQ are comparable and
combined together to form a new cluster. Now we have [(P), (QQRSTV)].
• Finally, the remaining two clusters are merged together to form a single
cluster [(PQRSTV)].
Divisive Hierarchical Clustering:
• Divisive hierarchical clustering is exactly the opposite of Agglomerative
Hierarchical clustering. In Divisive Hierarchical clustering, all the data points
are considered an individual cluster, and in every iteration, the data points
that are not similar are separated from the cluster. The separated data points
are treated as an individual cluster. Finally, we are left with N clusters.
Advantages of Hierarchical clustering:
• It is simple to implement and gives the best output in some cases.
• It is easy and results in a hierarchy, a structure that contains more
information.
• It does not need us to pre-specify the number of clusters.
Disadvantages of hierarchical clustering:
• It breaks the large clusters.
• It is Difficult to handle different sized clusters and convex shapes.
• It is sensitive to noise and outliers.
• The algorithm can never be changed or deleted once it was done previously.
Data mining and warehousing

More Related Content

Similar to Data mining and warehousing

Similar to Data mining and warehousing (20)

K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objects
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Clustering
ClusteringClustering
Clustering
 
Clustering on DSS
Clustering on DSSClustering on DSS
Clustering on DSS
 
kmean clustering
kmean clusteringkmean clustering
kmean clustering
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptx
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Hierarchical methods navdeep kaur newww.pptx
Hierarchical methods navdeep kaur newww.pptxHierarchical methods navdeep kaur newww.pptx
Hierarchical methods navdeep kaur newww.pptx
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
k-mean-clustering.pdf
k-mean-clustering.pdfk-mean-clustering.pdf
k-mean-clustering.pdf
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 

Recently uploaded

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 

Recently uploaded (20)

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

Data mining and warehousing

  • 1. Nadar Saraswathi college of arts and science Theni Department of CS & IT Subject:Data Mining and Ware Housing Topic: Hierarchical methods distance, based agglomerative and divisible clustering Presented by: S.Swetha M.sc(IT)
  • 2. Introduction • Distance or proximity measures are used to determine the similarity or “closeness” between similar objects in the dataset. The goal of proximity measures is to find similar objects and to group them in the same cluster. • The goal of proximity measures is to find similar objects and to group them in the same cluster. • Some common examples of distance measures that can be used to compute the proximity matrix in hierarchical clustering, including the following:
  • 3. Continue….. • Euclidean Distance • Mahalanobis Distance • Minkowski Distance • I would avoid the mathematical formulas involved in explaining the distances but rather summarise the uses and functions of these distance measures. A simple google search can provide tons of examples and research papers if you are interested in learning the underlying principles which support these calculations.
  • 4. Agglomerative hierarchical clustering: • Agglomerative clustering is one of the most common types of hierarchical clustering used to group similar objects in clusters. Agglomerative clustering is also known as AGNES (Agglomerative Nesting). In agglomerative clustering, each data point act as an individual cluster and at each step, data objects are grouped in a bottom-up method. Initially, each data object is in its cluster. At each iteration, the clusters are combined with different clusters until one cluster is formed.
  • 5. • With the help of given demonstration, we can understand that how the actual algorithm work. Here no calculation has been done below all the proximity among the clusters are assumed. • Let’s suppose we have six different data points P, Q, R, S, T, V.
  • 6.
  • 8. Steps • Consider each alphabet (P, Q, R, S, T, V) as an individual cluster and find the distance between the individual cluster from all other clusters. • Now, merge the comparable clusters in a single cluster. Let’s say cluster Q and Cluster R are similar to each other so that we can merge them in the second step. Finally, we get the clusters [ (P), (QR), (ST), (V)]. • Here, we recalculate the proximity as per the algorithm and combine the two closest clusters [(ST), (V)] together to form new clusters as [(P), (QR), (STV)].
  • 9. Continue….. • Repeat the same process. The clusters STV and PQ are comparable and combined together to form a new cluster. Now we have [(P), (QQRSTV)]. • Finally, the remaining two clusters are merged together to form a single cluster [(PQRSTV)].
  • 10. Divisive Hierarchical Clustering: • Divisive hierarchical clustering is exactly the opposite of Agglomerative Hierarchical clustering. In Divisive Hierarchical clustering, all the data points are considered an individual cluster, and in every iteration, the data points that are not similar are separated from the cluster. The separated data points are treated as an individual cluster. Finally, we are left with N clusters.
  • 11.
  • 12. Advantages of Hierarchical clustering: • It is simple to implement and gives the best output in some cases. • It is easy and results in a hierarchy, a structure that contains more information. • It does not need us to pre-specify the number of clusters.
  • 13. Disadvantages of hierarchical clustering: • It breaks the large clusters. • It is Difficult to handle different sized clusters and convex shapes. • It is sensitive to noise and outliers. • The algorithm can never be changed or deleted once it was done previously.