SlideShare a Scribd company logo
1 of 16
Unsupervised learning:
Clustering
Unsupervised Learning
• Unsupervised Learning: The data has no target
attribute.
– We want to explore the data to find some
intrinsic structures (Patterns) in them.
• Clustering is a technique for finding similarity
groups in data, called Clusters i.e.,
– data instances that are similar to (near) each
other are in the same cluster
– data instances that are very different (far away)
from each other fall in different clusters.
A few clustering applications
• In marketing, segment customers according to
their similarities
• Given a collection of text documents, organize
them according to their content similarities
DISTANCE METRICS
Distance for numeric attributes
• Denote distance with: dist(xi, xj), where xi and
xj are data points
• h= 2 is Euclidean and h=1 is Manhattan distance
• When attributes have similar scale: (1,2), (2,1)
- Manhattan = Abs(1-2) + Abs(2-1)
- Euclidean = Squareroot((1-2)^2+(2-1)^2)
Choosing the distance metric:
• When attributes have different ranges (10,
100), (50, 500)
– Manhattan = 440
– Euclidean= 401.99
• Manhattan is more stable than Euclidean
• Chebychev Distance :
HOW DO WE EMPLOY
DISTANCE IN A CLUSTER
Distance between Clusters
• Single link: smallest distance between an element in
one cluster and an element in the other cluster
• Complete link: largest distance between an element in
one cluster and an element in the other cluster
• Average: average distance between an element in one
cluster and an element in the other cluster
• Centroid: distance between the centroids of two
clusters
• Medoid: distance between the medoids of two clusters
– Medoid: one chosen, centrally located object in the
cluster
Representation and Naming Clusters
• Use the centroid of each cluster to represent
the cluster
• Using frequent values to represent cluster
• Using classification representation
TYPES OF CLUSTERING
Algorithms
• Hierarchical approach:
– Create a hierarchical decomposition of the
set of data (or objects) using some criterion
• Partitioning approach:
– Construct various partitions and then
evaluate them by some criterion, e.g.,
minimizing the sum of square errors
Agglomerative Clustering (Hierarchical)
• Assign each item to its own cluster, so that if you
have N items, you now have N clusters, each
containing just one item.
• Merge most similar clusters into a single cluster, so
that now you have one less cluster.
• Compute distances (similarities) between the new
cluster and each of the old clusters.
• Repeat steps 2 and 3 until all items are clustered into
a single cluster of size N.
K-means Clustering (Partitional)
• K-means is a partitional clustering algorithm
as it partitions the given data into k-clusters.
– Each cluster has a cluster center, called
centroid.
– “k” is specified by the user
K-means Algorithm
• Given k, the k-means algorithm works as follows:
1. Randomly choose k data points (seeds) to be the initial
centroids, cluster centers
2. Assign each data point to the closest centroid
3. Re-compute the centroids using the current cluster
memberships.
4. If a convergence criterion is not met, go to 2.
• Stopping/convergence criterion
- no (or minimum) re-assignments of data points to
different clusters,
- no (or minimum) change of centroids
• Strengths of k-means
– Simple and efficient: Time complexity: O(tkn),
– n is the number of data points,
– k is the number of clusters, and
– t is the number of iterations.
– Since both k and t are small. k-means is considered
a linear algorithm.
• Problem with K-Means?
– The k-means algorithm is sensitive to outliers !!
– K-Medoids: Instead of taking the mean value of the
object in a cluster as a reference point, medoids can be
used, which is the most centrally located object in a
cluster.
• Problem with Medoids?
– More robust than k-means in the presence of noise and
outliers because a medoid is less influenced by outliers
or other extreme values than a mean
– Works efficiently for small data sets but does not scale
well for large data sets.

More Related Content

Similar to Unsupervised learning Modi.pptx

Similar to Unsupervised learning Modi.pptx (20)

CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 
Cluster Analysis.pptx
Cluster Analysis.pptxCluster Analysis.pptx
Cluster Analysis.pptx
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Clustering
ClusteringClustering
Clustering
 
clustering-151017180103-lva1-app6892 (1).pdf
clustering-151017180103-lva1-app6892 (1).pdfclustering-151017180103-lva1-app6892 (1).pdf
clustering-151017180103-lva1-app6892 (1).pdf
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.ppt
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Hierarchical clustering.pptx
Hierarchical clustering.pptxHierarchical clustering.pptx
Hierarchical clustering.pptx
 
Clustering
ClusteringClustering
Clustering
 
K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objects
 

Recently uploaded

VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Recently uploaded (20)

VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 

Unsupervised learning Modi.pptx

  • 2. Unsupervised Learning • Unsupervised Learning: The data has no target attribute. – We want to explore the data to find some intrinsic structures (Patterns) in them. • Clustering is a technique for finding similarity groups in data, called Clusters i.e., – data instances that are similar to (near) each other are in the same cluster – data instances that are very different (far away) from each other fall in different clusters.
  • 3. A few clustering applications • In marketing, segment customers according to their similarities • Given a collection of text documents, organize them according to their content similarities
  • 5. Distance for numeric attributes • Denote distance with: dist(xi, xj), where xi and xj are data points • h= 2 is Euclidean and h=1 is Manhattan distance • When attributes have similar scale: (1,2), (2,1) - Manhattan = Abs(1-2) + Abs(2-1) - Euclidean = Squareroot((1-2)^2+(2-1)^2)
  • 6. Choosing the distance metric: • When attributes have different ranges (10, 100), (50, 500) – Manhattan = 440 – Euclidean= 401.99 • Manhattan is more stable than Euclidean • Chebychev Distance :
  • 7. HOW DO WE EMPLOY DISTANCE IN A CLUSTER
  • 8. Distance between Clusters • Single link: smallest distance between an element in one cluster and an element in the other cluster • Complete link: largest distance between an element in one cluster and an element in the other cluster • Average: average distance between an element in one cluster and an element in the other cluster • Centroid: distance between the centroids of two clusters • Medoid: distance between the medoids of two clusters – Medoid: one chosen, centrally located object in the cluster
  • 9. Representation and Naming Clusters • Use the centroid of each cluster to represent the cluster • Using frequent values to represent cluster • Using classification representation
  • 11. Algorithms • Hierarchical approach: – Create a hierarchical decomposition of the set of data (or objects) using some criterion • Partitioning approach: – Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of square errors
  • 12. Agglomerative Clustering (Hierarchical) • Assign each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. • Merge most similar clusters into a single cluster, so that now you have one less cluster. • Compute distances (similarities) between the new cluster and each of the old clusters. • Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
  • 13. K-means Clustering (Partitional) • K-means is a partitional clustering algorithm as it partitions the given data into k-clusters. – Each cluster has a cluster center, called centroid. – “k” is specified by the user
  • 14. K-means Algorithm • Given k, the k-means algorithm works as follows: 1. Randomly choose k data points (seeds) to be the initial centroids, cluster centers 2. Assign each data point to the closest centroid 3. Re-compute the centroids using the current cluster memberships. 4. If a convergence criterion is not met, go to 2. • Stopping/convergence criterion - no (or minimum) re-assignments of data points to different clusters, - no (or minimum) change of centroids
  • 15. • Strengths of k-means – Simple and efficient: Time complexity: O(tkn), – n is the number of data points, – k is the number of clusters, and – t is the number of iterations. – Since both k and t are small. k-means is considered a linear algorithm.
  • 16. • Problem with K-Means? – The k-means algorithm is sensitive to outliers !! – K-Medoids: Instead of taking the mean value of the object in a cluster as a reference point, medoids can be used, which is the most centrally located object in a cluster. • Problem with Medoids? – More robust than k-means in the presence of noise and outliers because a medoid is less influenced by outliers or other extreme values than a mean – Works efficiently for small data sets but does not scale well for large data sets.