
Lecture17

  1. Introduction to Machine Learning, Lecture 17: Clustering. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  2. Recap of Lectures 5-16. Slide 2, Artificial Intelligence – Machine Learning
  3. Recap of Lectures 5-16. Data classification: labeled data; build a model that covers the whole space. Association rule analysis: unlabeled data; get the most frequent/important associations.
  4. Today’s Agenda: What’s clustering? What’s a good clustering solution? Components of a clustering task. Types of clustering. Hierarchical clustering.
  5. What’s Clustering? The goal of clustering is to separate a finite, unlabeled data set into a finite and discrete set of “natural,” hidden data structures. As a data mining task, data clustering aims at the identification of clusters, or densely populated regions, according to some measurement or similarity function. Clustering has been studied and applied in many fields: statistics, spatial databases, machine learning (unsupervised learning), and data mining.
  6. What’s a Good Clustering Solution? In cluster analysis, a group of objects is split up into a number of more or less homogeneous subgroups on the basis of an often subjectively chosen measure of similarity (i.e., chosen subjectively based on its ability to create “interesting” clusters), such that the similarity between objects within a subgroup is larger than the similarity between objects belonging to different subgroups.
  7. What’s a Good Clustering Solution? Do you think this is good?
  8. What’s a Good Clustering Solution? Do you think this is better?
  9. What’s a Good Clustering Solution? Do you think this is better?
  10. Good Clustering Solutions. We get the point visually, but can we state more formally when a clustering solution is good? Two principles help: homogeneity and separation. Homogeneity: elements within a cluster are close to each other. Separation: elements in different clusters are further apart from each other. …clustering is not an easy task!
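One way to make the homogeneity and separation principles measurable is to average distances. The sketch below is an illustration under stated assumptions, not a definition from the lecture: it assumes points are coordinate tuples, uses Euclidean distance, and takes the mean distance between same-cluster pairs as a homogeneity score (lower is better) and the mean distance between different-cluster pairs as a separation score (higher is better). The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def homogeneity_and_separation(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Illustrative measures assumed for this sketch: a lower first value means
    more homogeneous clusters, a higher second value means better separation.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        if labels[i] == labels[j]:
            intra.append(d)   # same cluster: counts toward homogeneity
        else:
            inter.append(d)   # different clusters: counts toward separation
    return mean(intra), mean(inter)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(homogeneity_and_separation(pts, [0, 0, 1, 1]))  # nearby points grouped
    print(homogeneity_and_separation(pts, [0, 1, 0, 1]))  # far points grouped
```

On this toy data the first labeling yields a much smaller intra-cluster mean and a larger inter-cluster mean than the second, which is what the two principles predict for the visually better solution.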
  11. Components of a Clustering Task
  12. Types of Clustering. Hard partitional clustering: organize elements into disjoint groups. Hierarchical clustering: organize elements into a tree, where leaves represent genes (more generally, the elements being clustered) and the length of the path between two leaves represents the distance between them; similar genes lie within the same subtrees. Hierarchical methods are also classified as agglomerative (start with every element in its own cluster and iteratively join clusters together) or divisive (start with one cluster and iteratively divide it into smaller clusters).
  13. Types of Clustering: Hierarchical Clustering
  14. Example of Hierarchical Clustering
  15. Example of Hierarchical Clustering
  16. Example of Hierarchical Clustering
  17. Example of Hierarchical Clustering
  18. Example of Hierarchical Clustering
  19. Example of Hierarchical Clustering. Hierarchical clustering is sometimes used to reveal evolutionary history. It provides very informative descriptions and visualizations of the potential clustering structures in the data, especially when real hierarchical relations exist in the data.
  20. Pseudocode
      HierarchicalClustering(d, n)
       1. Form n clusters, each with one element
       2. Construct a graph T by assigning one vertex to each cluster
       3. while there is more than one cluster
       4.     Find the two closest clusters C1 and C2
       5.     Merge C1 and C2 into a new cluster C with |C1| + |C2| elements
       6.     Compute the distance from C to all other clusters
       7.     Add a new vertex C to T and connect it to vertices C1 and C2
       8.     Remove the rows and columns of d corresponding to C1 and C2
       9.     Add a row and column to d corresponding to the new cluster C
      10. return T
      The algorithm takes as input an n × n matrix d of pairwise distances between points.
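The HierarchicalClustering pseudocode can be turned into a short runnable sketch. This is an illustrative translation, not the lecture's own code: the names `hierarchical_clustering` and `cluster_distance`, the dictionary-based distance table, and the `linkage` parameter (choosing between the smallest-pair and average-pair cluster distances of slide 22) are assumptions made for this example. The tree T is stored as a mapping from each new internal vertex to its two children; leaves are the original point indices 0 to n − 1.

```python
from itertools import combinations


def cluster_distance(d, a, b, linkage):
    """Distance between clusters a and b (lists of point indices) under matrix d."""
    pair_distances = [d[x][y] for x in a for y in b]
    if linkage == "min":
        return min(pair_distances)  # smallest distance between any pair
    if linkage == "avg":
        return sum(pair_distances) / len(pair_distances)  # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage!r}")


def hierarchical_clustering(d, linkage="avg"):
    """Agglomerative clustering following the slide's pseudocode (sketch).

    d is an n x n symmetric distance matrix (list of lists), with n >= 1.
    Returns (root, tree, merges): tree maps each new vertex of T to its two
    children, and merges lists (C1, C2, distance) in merge order.
    """
    n = len(d)
    # Steps 1-2: n singleton clusters; vertices 0..n-1 of T are the leaves.
    members = {i: [i] for i in range(n)}
    dist = {i: {j: d[i][j] for j in range(n) if j != i} for i in range(n)}
    tree, merges = {}, []
    next_id = n
    while len(members) > 1:
        # Find the two closest clusters C1 and C2.
        c1, c2 = min(combinations(members, 2), key=lambda p: dist[p[0]][p[1]])
        merges.append((c1, c2, dist[c1][c2]))
        # Merge C1 and C2 into a new cluster C with |C1| + |C2| elements.
        c, next_id = next_id, next_id + 1
        members[c] = members.pop(c1) + members.pop(c2)
        # Add vertex C to T, connected to C1 and C2.
        tree[c] = (c1, c2)
        # Remove the rows and columns of C1 and C2 from the distance table.
        del dist[c1], dist[c2]
        for row in dist.values():
            del row[c1], row[c2]
        # Add a row and column for C with its distance to every other cluster.
        dist[c] = {}
        for other in list(dist):
            if other != c:
                dist[c][other] = dist[other][c] = cluster_distance(
                    d, members[c], members[other], linkage)
    (root,) = members
    return root, tree, merges


if __name__ == "__main__":
    d = [[0, 2, 6, 10],
         [2, 0, 5, 9],
         [6, 5, 0, 4],
         [10, 9, 4, 0]]
    root, tree, merges = hierarchical_clustering(d, linkage="min")
    print(root, tree, merges)
```

For clarity the sketch recomputes each new cluster distance from the original matrix d; that keeps the code close to the pseudocode but is not the fastest possible update.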
  21. Are They Similar?
  22. Distance Functions. How close? $d_{min}(C, C^*) = \min_{x \in C,\, y \in C^*} d(x, y)$: the distance between two clusters is the smallest distance between any pair of their elements. $d_{avg}(C, C^*) = \frac{1}{|C^*|\,|C|} \sum_{x \in C,\, y \in C^*} d(x, y)$: the distance between two clusters is the average distance between all pairs of their elements.
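Both cluster distances can be computed directly from their definitions. This sketch assumes points are coordinate tuples and uses Euclidean distance (`math.dist`, Python 3.8+) as the underlying d(x, y); any other pairwise distance can be passed in through the `d` parameter. The example clusters are made up to show that one nearby pair is enough to make d_min small, while a distant member pulls d_avg up.

```python
import math


def d_min(C, C_star, d=math.dist):
    """Smallest distance between any x in C and any y in C_star."""
    return min(d(x, y) for x in C for y in C_star)


def d_avg(C, C_star, d=math.dist):
    """Average distance over all |C| * |C_star| pairs (x, y)."""
    return sum(d(x, y) for x in C for y in C_star) / (len(C) * len(C_star))


if __name__ == "__main__":
    C = [(0.0, 0.0), (1.0, 0.0)]
    C_star = [(2.0, 0.0), (10.0, 0.0)]
    print(d_min(C, C_star))  # → 1.0
    print(d_avg(C, C_star))  # → 5.5
```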
  23. Distance Functions
  24. Some Remarks. The common criticism: hierarchical clustering (HC) algorithms lack robustness, since they are sensitive to noise and outliers; once an object is assigned to a cluster, it is never reconsidered; and their computational complexity is at least O(N²). Recently, new improvements have been proposed to deal with large data sets, e.g., CURE, ROCK, Chameleon, and BIRCH.
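A rough operation count shows where the quadratic bound comes from. It assumes the straightforward implementation of the HierarchicalClustering pseudocode, which keeps a full distance matrix and scans it for the closest pair at every step (optimized variants can do better). With n points there are n(n − 1)/2 distances to compute and store, and when k clusters remain, finding the closest pair examines k(k − 1)/2 entries, for k = n, n − 1, …, 2:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2)
\qquad\text{and}\qquad
\sum_{k=2}^{n} \binom{k}{2} = \binom{n+1}{3} = \frac{(n+1)\,n\,(n-1)}{6} \in O(n^3)
```

So the distance matrix alone already costs quadratic time and space, and the naive closest-pair search makes the whole procedure cubic in time.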
  25. Next Class: More topics in clustering, K-means
