- 1. Prof. Pier Luca Lanzi: Density-Based Clustering. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
- 4. What is density-based clustering?
  - Clustering based on density (a local cluster criterion), such as density-connected points
  - Major features:
    - Discovers clusters of arbitrary shape
    - Handles noise
    - One scan
    - Needs density parameters as a termination condition
  - Several interesting studies:
    - DBSCAN: Ester et al. (KDD’96)
    - OPTICS: Ankerst et al. (SIGMOD’99)
    - DENCLUE: Hinneburg & Keim (KDD’98)
    - CLIQUE: Agrawal et al. (SIGMOD’98) (more grid-based)
- 5. DBSCAN: Basic Concepts
  - The neighborhood within a radius ε of a given object is called the ε-neighborhood of the object
  - If the ε-neighborhood of an object contains at least MinPts objects, then the object is a core object
  - An object p is directly density-reachable from an object q if p is within the ε-neighborhood of q and q is a core object
  - An object p is density-reachable from an object q if there is a chain of objects p1, …, pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi
  - An object p is density-connected to an object q with respect to ε and MinPts if there is an object o such that both p and q are density-reachable from o
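
The first three definitions on this slide can be sketched in a few lines of plain Python. This is only an illustration, not code from the course: the function names and the toy points are made up here, Euclidean distance is assumed, and a point is assumed to count as a member of its own ε-neighborhood.

```python
from math import dist

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A point is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """True if point p is directly density-reachable from point q."""
    return is_core(points, q, eps, min_pts) and p in eps_neighborhood(points, q, eps)

# Hypothetical toy data: four points packed together and one a bit farther away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.3, 0.1)]
eps, min_pts = 0.25, 4

forward = directly_reachable(points, 4, 3, eps, min_pts)   # is 4 reachable from 3?
backward = directly_reachable(points, 3, 4, eps, min_pts)  # is 3 reachable from 4?
```

With these toy values, point 3 is a core object and point 4 is not, so `forward` is true while `backward` is false: direct density-reachability is not symmetric.
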
- 6. DBSCAN: Basic Concepts
  - Density = number of points within a specified radius (Eps)
  - A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point
  - A noise point is any point that is neither a core point nor a border point
  - A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability
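
As a rough illustration of the three point types, the sketch below labels every point as core, border, or noise. The function name `classify_points` and the toy data are assumptions made for this example, Euclidean distance is assumed, and each point is counted as part of its own neighborhood.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    # Neighborhood of every point: indices within distance eps (the point itself included).
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a tight group, one point on its edge, one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.3, 0.1), (2.0, 2.0)]
labels = classify_points(points, eps=0.25, min_pts=4)
```
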
- 7. Density-Reachable & Density-Connected
  - Directly density-reachable
  - Density-reachable
  - Density-connected
  - [Figure: points p, q, p1, and o illustrating the three relations, with MinPts = 5 and Eps = 1 cm]
- 8. DBSCAN: Core, Border, and Noise Points
- 9. DBSCAN: Density-Based Spatial Clustering of Applications with Noise
  - Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
  - Discovers clusters of arbitrary shape in spatial databases with noise
  - The algorithm:
    - Arbitrarily select a point p
    - Retrieve all points density-reachable from p given Eps and MinPts
    - If p is a core point, a cluster is formed
    - If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
    - Continue the process until all of the points have been processed
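
The steps on this slide can be sketched as a small, unoptimized Python function. This is a minimal illustration under stated assumptions (Euclidean distance, brute-force neighbor search, a point counts toward its own neighborhood); the names `dbscan`, `region_query`, and `NOISE` are chosen for this example and do not come from the slides or from any library.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster id per point (0, 1, ...) or NOISE."""
    def region_query(i):
        # Brute-force eps-neighborhood of point i (i itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)        # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may be relabeled as a border point later
            continue
        cluster_id += 1                  # i is a core point: a new cluster is formed
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point reached from a core point
            if labels[j] is not None:
                continue                 # already assigned, nothing to expand
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core too: keep expanding
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.3, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.0, 2.0),
]
labels = dbscan(points, eps=0.25, min_pts=4)
```

The brute-force neighbor search makes this quadratic in the number of points, which is fine for a classroom example; practical implementations typically rely on a spatial index.
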
- 10. Core, Border and Noise Points (Eps = 10, MinPts = 4)
  - [Figure panels: original points; point types (core, border and noise)]
- 11. When DBSCAN Works Well
  - Resistant to noise
  - Can handle clusters of different shapes and sizes
  - [Figure panels: original points; clusters]
- 12. When DBSCAN May Fail
  - Varying densities
  - High-dimensional data
  - [Figure panels: original points; clustering with MinPts=4, Eps=9.75; clustering with MinPts=4, Eps=9.92]
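
The varying-density problem can be reproduced on a deliberately constructed toy data set. The sketch below is an assumption-laden illustration, not the data behind the slide's figure: a dense group and a sparse group are placed on a line so that the gap between them equals the spacing inside the sparse group. The helper `dbscan_summary` is a hypothetical name; it assumes Euclidean distance and reports only the number of clusters and noise points.

```python
from math import dist

def dbscan_summary(points, eps, min_pts):
    """Return (number of clusters, number of noise points) for one parameter setting."""
    nbrs = [[j for j, q in enumerate(points) if dist(p, q) <= eps] for p in points]
    core = [len(nb) >= min_pts for nb in nbrs]
    label = [None] * len(points)
    clusters = 0
    for start in range(len(points)):
        if not core[start] or label[start] is not None:
            continue
        label[start] = clusters
        stack = [start]                  # only core points are ever expanded
        while stack:
            i = stack.pop()
            for j in nbrs[i]:
                if label[j] is None:
                    label[j] = clusters
                    if core[j]:
                        stack.append(j)
        clusters += 1
    noise = sum(1 for l in label if l is None)
    return clusters, noise

# Dense group: spacing 0.1. Sparse group: spacing 1.0, starting 1.0 after the dense group.
dense = [(0.1 * k, 0.0) for k in range(10)]
sparse = [(1.9 + k, 0.0) for k in range(5)]
points = dense + sparse

results = {eps: dbscan_summary(points, eps, min_pts=3) for eps in (0.15, 0.5, 1.05)}
```

On this data, an Eps small enough to respect the dense group leaves every sparse point as noise, while an Eps large enough to connect the sparse group also bridges the gap and merges everything into one cluster. None of the three settings tried here recovers two clusters.
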
- 13. Run the Python notebook on density-based clustering
- 14. Examples using R
- 15. Density-Based Clustering in R

  ```r
  library(fpc)
  set.seed(665544)
  n <- 600
  # 10 random cluster centers, recycled across the n points, plus Gaussian noise
  x <- cbind(runif(10, 0, 10) + rnorm(n, sd=0.2),
             runif(10, 0, 10) + rnorm(n, sd=0.2))
  par(bg="grey40")
  # eps = 0.2; MinPts is left at its default
  ds <- dbscan(x, 0.2, showplot=1)
  ```
- 16. Density-Based Clustering in R

  ```r
  library(fpc)
  set.seed(665544)
  x <- seq(0, 6.28, 0.1)
  y <- sin(x)
  # x and y (63 values each) are recycled 10 times to give 630 noisy points
  xd <- x + rnorm(630, sd=0.2)
  yd <- y + rnorm(630, sd=0.2)
  plot(xd, yd)
  par(bg="grey40")
  d <- cbind(xd, yd)
  # this works nicely since epsilon is the same size as
  # the standard deviation (0.2) used to generate the data
  ds <- dbscan(d, 0.2, showplot=1)
  # this does not work so nicely
  ds <- dbscan(d, 0.1, showplot=1)
  ```
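
To see why the smaller epsilon behaves worse on the noisy sine data, it can help to count how many points still qualify as core points. The Python sketch below rebuilds a similar data set with the standard library only. It is an approximation under stated assumptions: Python's random stream differs from R's even with the same seed value, `count_core_points` is a name invented here, and `min_pts=5` is an assumed setting rather than something stated on the slide.

```python
import math
import random

random.seed(665544)  # same seed value as the slide; the random numbers still differ from R's

# 63 x values (0, 0.1, ..., 6.2) repeated 10 times, mirroring R's vector recycling.
xs = [0.1 * k for k in range(63)] * 10
data = [(x + random.gauss(0, 0.2), math.sin(x) + random.gauss(0, 0.2)) for x in xs]

def count_core_points(points, eps, min_pts=5):
    """Number of points with at least min_pts points (itself included) within eps."""
    return sum(
        1
        for p in points
        if sum(1 for q in points if math.dist(p, q) <= eps) >= min_pts
    )

core_wide = count_core_points(data, eps=0.2)    # epsilon equal to the noise sd
core_narrow = count_core_points(data, eps=0.1)  # epsilon at half the noise sd
```

Shrinking epsilon can only remove points from a neighborhood, so `core_narrow` can never exceed `core_wide`. Fewer core points means more of the curve is left as border or noise points and the sine shape is more likely to break into fragments.
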
- 17. Clustering Comparisons on Sin Data
  - [Figure panels: hierarchical clustering; k-means clustering]
- 18. Clustering Comparisons on Sin Data (k-means with 10 clusters)
- 19. Software Packages: http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Density-Based_Clustering