SlideShare a Scribd company logo
1 of 16
Download to read offline
Clustering
Lecture 4: Density-based Methods
Jing Gao
SUNY Buffalo
1
Outline
• Basics
– Motivation, definition, evaluation
• Methods
– Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics
– Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering,
etc.
2
Density-based Clustering
• Basic idea
– Clusters are dense regions in the data space,
separated by regions of lower object density
– A cluster is defined as a maximal set of density-
connected points
– Discovers clusters of arbitrary shape
• Method
– DBSCAN
3
Density Definition
• -Neighborhood – Objects within a radius of  from
an object.
• “High density” - ε-Neighborhood of an object contains
at least MinPts of objects.
q p
ε
ε
ε-Neighborhood of p
ε-Neighborhood of q
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)
}
)
,
(
|
{
:
)
( 
 
q
p
d
q
p
N
4
Core, Border & Outlier
Given  and MinPts,
categorize the objects into
three exclusive groups.
 = 1unit, MinPts = 5
Core
Border
Outlier
A point is a core point if it has more than a
specified number of points (MinPts) within
Eps—These are points that are at the
interior of a cluster.
A border point has fewer than MinPts
within Eps, but is in the neighborhood
of a core point.
A noise point is any point that is not a
core point nor a border point.
5
Example
Original Points Point types: core,
border and outliers
 = 10, MinPts = 4
6
Density-reachability
• Directly density-reachable
• An object q is directly density-reachable from object p
if p is a core object and q is in p’s -neighborhood.
q p
ε
ε
• q is directly density-reachable from p
• p is not directly density-reachable from
q
• Density-reachability is asymmetric
MinPts = 4
7
Density-reachability
• Density-Reachable (directly and indirectly):
– A point p is directly density-reachable from p2
– p2 is directly density-reachable from p1
– p1 is directly density-reachable from q
– p  p2  p1 q form a chain
p
q
p2
• p is (indirectly) density-reachable
from q
• q is not density-reachable from p
p1
MinPts = 7
8
DBSCAN Algorithm: Example
• Parameter
•  = 2 cm
• MinPts = 3
for each o  D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
9
DBSCAN Algorithm: Example
• Parameter
•  = 2 cm
• MinPts = 3
for each o  D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
10
DBSCAN Algorithm: Example
• Parameter
•  = 2 cm
• MinPts = 3
for each o  D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
11
DBSCAN: Sensitive to Parameters
12
DBSCAN: Determining EPS and MinPts
• Idea is that for points in a cluster, their kth nearest
neighbors are at roughly the same distance
• Noise points have the kth nearest neighbor at farther
distance
• So, plot sorted distance of every point to its kth nearest
neighbor
13
When DBSCAN Works Well
Original Points Clusters
• Resistant to Noise
• Can handle clusters of different shapes and sizes
14
When DBSCAN Does NOT Work Well
Original Points
(MinPts=4, Eps=9.92).
(MinPts=4, Eps=9.75)
• Cannot handle varying densities
• sensitive to parameters—hard to
determine the correct set of
parameters
15
Take-away Message
• The basic idea of density-based clustering
• The two important parameters and the definitions of
neighborhood and density in DBSCAN
• Core, border and outlier points
• DBSCAN algorithm
• DBSCAN’s pros and cons
16

More Related Content

Similar to clustering density technidques in machine learning

26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptxssusere1fd42
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptxJK970901
 
clustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdfclustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdfp_manimozhi
 
Graph and Density Based Clustering
Graph and Density Based ClusteringGraph and Density Based Clustering
Graph and Density Based ClusteringAyushAnand105
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentationRishavSharma112
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3Nandhini S
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Zachary Thomas
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methodsKrish_ver2
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringPier Luca Lanzi
 
150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural NetworksJunho Cho
 
Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageMLconf
 

Similar to clustering density technidques in machine learning (20)

26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptx
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
KNN
KNNKNN
KNN
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
clustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdfclustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdf
 
Graph and Density Based Clustering
Graph and Density Based ClusteringGraph and Density Based Clustering
Graph and Density Based Clustering
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based Clustering
 
DBSCAN (1) (4).pptx
DBSCAN (1) (4).pptxDBSCAN (1) (4).pptx
DBSCAN (1) (4).pptx
 
150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks150424 Scalable Object Detection using Deep Neural Networks
150424 Scalable Object Detection using Deep Neural Networks
 
Density based clustering
Density based clusteringDensity based clustering
Density based clustering
 
Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25
 
Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better Mortgage
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

clustering density technidques in machine learning

  • 1. Clustering Lecture 4: Density-based Methods Jing Gao SUNY Buffalo 1
  • 2. Outline • Basics – Motivation, definition, evaluation • Methods – Partitional – Hierarchical – Density-based – Mixture model – Spectral methods • Advanced topics – Clustering ensemble – Clustering in MapReduce – Semi-supervised clustering, subspace clustering, co-clustering, etc. 2
  • 3. Density-based Clustering • Basic idea – Clusters are dense regions in the data space, separated by regions of lower object density – A cluster is defined as a maximal set of density- connected points – Discovers clusters of arbitrary shape • Method – DBSCAN 3
  • 4. Density Definition • -Neighborhood – Objects within a radius of  from an object. • “High density” - ε-Neighborhood of an object contains at least MinPts of objects. q p ε ε ε-Neighborhood of p ε-Neighborhood of q Density of p is “high” (MinPts = 4) Density of q is “low” (MinPts = 4) } ) , ( | { : ) (    q p d q p N 4
  • 5. Core, Border & Outlier Given  and MinPts, categorize the objects into three exclusive groups.  = 1unit, MinPts = 5 Core Border Outlier A point is a core point if it has more than a specified number of points (MinPts) within Eps—These are points that are at the interior of a cluster. A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point. A noise point is any point that is not a core point nor a border point. 5
  • 6. Example Original Points Point types: core, border and outliers  = 10, MinPts = 4 6
  • 7. Density-reachability • Directly density-reachable • An object q is directly density-reachable from object p if p is a core object and q is in p’s -neighborhood. q p ε ε • q is directly density-reachable from p • p is not directly density-reachable from q • Density-reachability is asymmetric MinPts = 4 7
  • 8. Density-reachability • Density-Reachable (directly and indirectly): – A point p is directly density-reachable from p2 – p2 is directly density-reachable from p1 – p1 is directly density-reachable from q – p  p2  p1 q form a chain p q p2 • p is (indirectly) density-reachable from q • q is not density-reachable from p p1 MinPts = 7 8
  • 9. DBSCAN Algorithm: Example • Parameter •  = 2 cm • MinPts = 3 for each o  D do if o is not yet classified then if o is a core-object then collect all objects density-reachable from o and assign them to a new cluster. else assign o to NOISE 9
  • 10. DBSCAN Algorithm: Example • Parameter •  = 2 cm • MinPts = 3 for each o  D do if o is not yet classified then if o is a core-object then collect all objects density-reachable from o and assign them to a new cluster. else assign o to NOISE 10
  • 11. DBSCAN Algorithm: Example • Parameter •  = 2 cm • MinPts = 3 for each o  D do if o is not yet classified then if o is a core-object then collect all objects density-reachable from o and assign them to a new cluster. else assign o to NOISE 11
  • 12. DBSCAN: Sensitive to Parameters 12
  • 13. DBSCAN: Determining EPS and MinPts • Idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance • Noise points have the kth nearest neighbor at farther distance • So, plot sorted distance of every point to its kth nearest neighbor 13
  • 14. When DBSCAN Works Well Original Points Clusters • Resistant to Noise • Can handle clusters of different shapes and sizes 14
  • 15. When DBSCAN Does NOT Work Well Original Points (MinPts=4, Eps=9.92). (MinPts=4, Eps=9.75) • Cannot handle varying densities • sensitive to parameters—hard to determine the correct set of parameters 15
  • 16. Take-away Message • The basic idea of density-based clustering • The two important parameters and the definitions of neighborhood and density in DBSCAN • Core, border and outlier points • DBSCAN algorithm • DBSCAN’s pros and cons 16