By – AYUSH
Netaji Subhash Engineering College, Kolkata
Introduction
 The method of identifying similar groups of data in a
dataset is called clustering.
 It is one of the most popular techniques in data science.
 Entities in each group are comparatively more similar to
entities of that group than those of the other groups.
 In this presentation, I will be taking you through the
types of clustering, different clustering algorithms and a
brief view of two of the most commonly used clustering
methods i.e.,
Graph Based Clustering and Density Based Clustering.
Graph Based Clustering
Graph Theory :
 Graph Theory can be used for getting thorough information
about the inside structure of the data set in terms of :
- cliques (subgraphs in which every pair of vertices is connected by an edge)
- clusters (highly connected groups of nodes)
- centrality (a measure of the importance of a node in the network)
- outliers (unimportant, weakly connected nodes)
 Applications :
- Social Graphs (drawing edges between people and the people and things
they interact with)
- Path Optimization Algorithms (Minimal Spanning Tree, Kruskal’s, Prim’s)
- GPS Navigation Systems (shortest path APIs)
GRAPH BASED CLUSTERING
 Graph-based clustering is a method for identifying
groups of similar cells or samples.
 It makes no prior assumptions about the clusters in the
data.
 This means the number, size, density, and shape of
clusters do not need to be known or assumed prior to
clustering.
 Consequently, graph-based clustering is useful for
identifying clusters in complex data sets such as
scRNA-seq.
IDEA :
• Graph-Based clustering uses the proximity graph
– Start with the proximity matrix
– Consider each point as a node in a graph
– Each edge between two nodes has a weight which is the
proximity between the two points
– Initially the proximity graph is fully connected
– MIN (single-link) and MAX (complete-link) can be viewed as
starting with this graph
• In the simplest case, clusters are connected components in the graph.
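The simplest case above can be sketched in a few lines. This is only an illustration under stated assumptions: proximity is taken to be Euclidean distance, edges longer than a hypothetical threshold `max_dist` are dropped from the fully connected proximity graph, and the function name `threshold_clusters` is made up for this example.

```python
from collections import deque
from math import dist


def threshold_clusters(points, max_dist):
    """Label each point with the connected component it belongs to."""
    n = len(points)
    # Keep only the edges whose weight (distance) is at most max_dist.
    neighbours = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= max_dist]
        for i in range(n)
    ]
    labels = [-1] * n  # -1 means "not visited yet"
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search collects one connected component.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = threshold_clusters(points, max_dist=1.5)
print(labels)  # [0, 0, 0, 1, 1]
```

With a very large threshold every point ends up in one component, and with a very small one every point is its own cluster, so the threshold plays the role of the analyst's choice here.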
GRAPH CLUSTERING IDEA :
HIERARCHICAL METHOD :
1) Determining a minimal spanning tree (MST)
2) Delete branches iteratively
Each new connected component is a cluster.
MINIMAL SPANNING TREE :
A minimal spanning tree of a connected graph G = (V,E) is a
connected subgraph with minimal weight that contains all nodes of
G and has no cycles.
Minimal Spanning Trees can be calculated with :-
 Prim’s Algorithm
- Prim's (also known as Jarník's) algorithm is a greedy algorithm that finds a
minimum spanning tree for a weighted undirected graph.
- This means it finds a subset of the edges that forms a tree that includes
every vertex, where the total weight of all the edges in the tree is
minimized.
 Kruskal’s Algorithm
- Kruskal's algorithm is a minimum-spanning-tree algorithm which repeatedly adds
the edge of the least possible weight that connects any two trees in the forest.
- It is a greedy algorithm in graph theory: it builds a minimum spanning tree
for a connected weighted graph by adding arcs in order of increasing cost.
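A minimal sketch of Kruskal's algorithm, assuming the graph is given as a list of `(weight, u, v)` tuples over nodes numbered `0..n-1`. The function name `kruskal_mst` and the small example graph are assumptions made for illustration; a union-find structure keeps track of which tree in the forest each node belongs to.

```python
def kruskal_mst(n, edges):
    """Return the MST edges of a connected graph with nodes 0..n-1."""
    parent = list(range(n))  # union-find forest: every node starts as its own tree

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    # Greedy step: look at the edges in order of increasing weight.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different trees, so no cycle
            parent[root_u] = root_v
            mst.append((weight, u, v))
    return mst


edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
mst = kruskal_mst(4, edges)
print(mst)  # [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```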
Branch Deletion
Delete Branches – Different Strategies :-
I. Delete the branch with maximum weight.
II. Delete inconsistent branches.
III. Delete by analysis of weights.
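Strategy I can be sketched as follows, assuming the MST is already available as a list of `(weight, u, v)` tuples and that the analyst chooses the number of clusters `k`. The function name `cut_heaviest_branches` is hypothetical, and strategies II and III are not covered by this sketch.

```python
def cut_heaviest_branches(n, mst_edges, k):
    """Split an MST over nodes 0..n-1 into k clusters by deleting its k-1 heaviest branches."""
    keep = max(0, len(mst_edges) - (k - 1))
    kept = sorted(mst_edges)[:keep]  # the lightest branches survive

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Nodes joined by a surviving branch end up in the same component.
    for _, u, v in kept:
        parent[find(u)] = find(v)

    # Number the components in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


mst_edges = [(1, 0, 1), (1, 1, 2), (8, 2, 3), (2, 3, 4)]
labels = cut_heaviest_branches(5, mst_edges, k=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Deleting the branch of weight 8 separates nodes 0, 1, 2 from nodes 3, 4; each new connected component is reported as one cluster.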
SUMMARY :-
In graph based clustering objects are represented as
nodes in a complete or connected graph.
The distance between two objects is given by the weight
of the corresponding branch.
Hierarchical Method :
(1) Determine a minimal spanning tree(MST).
(2) Delete branches iteratively.
Visualization of information in large datasets.
DENSITY BASED CLUSTERING
DBSCAN :
 Density-Based Spatial Clustering of Applications with Noise.
 It is one of the most cited clustering algorithms in the literature.
Features : -
• Spatial data
(geomarketing, tomography, satellite images)
• Discovery of clusters with arbitrary shape
(spherical, drawn out, linear, elongated)
• Good efficiency on large databases
(parallel programming)
• Only two parameters required.
• No prior knowledge of the number of clusters is required.
IDEA :
Clusters have a high density of points.
In the area of noise the density is lower than in any of the
clusters.
Goal :
Formalize the notions of clusters and
noise.
Density based cluster : definition
 Relies on a density-based notion of cluster: A cluster is defined as
a maximal set of density-connected points.
 A cluster C is a non-empty subset of the database D satisfying
- For all p, q: if p is in C and q is density-reachable from p, then
q is also in C (maximality)
- For all p, q in C: p is density-connected to q (connectivity)
DENSITY BASED CLUSTERING: DATA
● Two Parameters:
- Eps : Maximum radius of the neighbourhood
- MinPts : Minimum number of points required in the Eps-neighbourhood of a point
● Neps(p) = {q ∈ D | dist(p,q) <= Eps}, the Eps-neighbourhood of a point p
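The Eps-neighbourhood can be computed directly from its definition. The sketch below assumes Euclidean distance for `dist` and represents the database D as a plain list of coordinate tuples; the function name `eps_neighbourhood` is an assumption for this example. Note that p always belongs to its own neighbourhood, since dist(p,p) = 0.

```python
from math import dist


def eps_neighbourhood(points, p, eps):
    """Indices q with dist(points[p], points[q]) <= eps (p itself included)."""
    return [q for q in range(len(points)) if dist(points[p], points[q]) <= eps]


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.8), (3.0, 3.0)]
hood = eps_neighbourhood(points, 0, eps=1.0)
print(hood)  # [0, 1, 2]
```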
Problem :
 In each cluster there are two kinds of points :
- points inside the cluster (core points)
- points on the border (border points)
 An Eps-neighbourhood of a border point contains significantly fewer
points than an Eps-neighbourhood of a core point.
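The difference between the two kinds of points can be made concrete with a small sketch. It assumes the usual DBSCAN convention: a point is a core point if its Eps-neighbourhood (itself included) holds at least MinPts points, a border point if it is not core but lies within Eps of a core point, and noise otherwise. The function name `classify_points` and the sample coordinates are made up for illustration.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Return 'core', 'border' or 'noise' for every point."""
    n = len(points)
    # Eps-neighbourhood of every point, as lists of indices.
    hoods = [
        [q for q in range(n) if dist(points[p], points[q]) <= eps]
        for p in range(n)
    ]
    core = {p for p in range(n) if len(hoods[p]) >= min_pts}
    kinds = []
    for p in range(n):
        if p in core:
            kinds.append("core")
        elif any(q in core for q in hoods[p]):
            kinds.append("border")  # sparse neighbourhood, but next to a core point
        else:
            kinds.append("noise")
    return kinds


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (5, 5)]
kinds = classify_points(points, eps=1.0, min_pts=3)
print(kinds)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 1) has only two points in its Eps-neighbourhood, fewer than any core point, yet it still belongs to the cluster because it sits next to the core point (1, 1).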
IDEA :
For every point p in a cluster C there is a point q ∈
C, so that
1) p is inside the Eps-neighbourhood of q and
2) Neps(q) contains at least MinPts points.
● Directly density-reachable: A point p is directly
density-reachable from a point q with regard to Eps and MinPts, if
1) p ∈ Neps(q) (reachability)
2) |Neps(q)| >= MinPts (core point condition)
DEFINITION :
Density-reachable:
 A point p is density-reachable from a point q wrt. Eps, MinPts
if there is a chain of points p1, ..., pn with p1 = q, pn = p
such that pi+1 is directly density-reachable from pi.
Density-connected:
 A point p is density-connected to a point q wrt. Eps, MinPts
if there is a point o such that both p and q are
density-reachable from o wrt. Eps and MinPts.
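Both definitions can be checked with a breadth-first search that only continues through core points, because every link of the chain must start at a point satisfying the core point condition. This is a brute-force sketch for small data sets; it assumes Euclidean distance, and the function names `density_reachable_from` and `density_connected` are hypothetical.

```python
from collections import deque
from math import dist


def density_reachable_from(points, q, eps, min_pts):
    """Set of indices density-reachable from q (empty if q is not a core point)."""
    n = len(points)

    def hood(p):
        return [r for r in range(n) if dist(points[p], points[r]) <= eps]

    if len(hood(q)) < min_pts:
        return set()  # nothing is directly density-reachable from a non-core point
    reached = {q}
    queue = deque([q])
    while queue:
        p = queue.popleft()
        neighbours = hood(p)
        if len(neighbours) < min_pts:
            continue  # border point: it is reached, but the chain stops here
        for r in neighbours:
            if r not in reached:
                reached.add(r)
                queue.append(r)
    return reached


def density_connected(points, p, q, eps, min_pts):
    """True if some point o exists from which both p and q are density-reachable."""
    return any(
        {p, q} <= density_reachable_from(points, o, eps, min_pts)
        for o in range(len(points))
    )


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (5, 5)]
print(density_reachable_from(points, 0, eps=1.0, min_pts=3))  # {0, 1, 2, 3, 4}
print(density_reachable_from(points, 4, eps=1.0, min_pts=3))  # set()
print(density_connected(points, 4, 0, eps=1.0, min_pts=3))    # True
```

The example shows that density-reachability is not symmetric: the border point with index 4 is reachable from point 0, but nothing is reachable from it. Density-connectivity is symmetric, which is why the cluster definition is built on it.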
DBSCAN (algorithm) :
Start with an arbitrary point p from the database and
retrieve all points density-reachable from p with regard to
Eps and MinPts.
If p is a core point, the procedure yields a cluster with
regard to Eps and MinPts and the point is classified.
If p is a border point, no points are density-reachable
from p and DBSCAN visits the next unclassified point in
the database.
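The procedure above can be sketched as a short, brute-force implementation. It is an illustration rather than a reference implementation: it assumes Euclidean distance, computes every Eps-neighbourhood by scanning all points (quadratic time, with no spatial index), and uses the hypothetical names `dbscan` and `NOISE`. Points first marked as noise are relabelled if they later turn out to be border points of a cluster.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    n = len(points)
    labels = [None] * n  # None means "unclassified"

    def hood(p):
        return [q for q in range(n) if dist(points[p], points[q]) <= eps]

    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = hood(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # not a core point; may become a border point later
            continue
        # p is a core point: collect everything density-reachable from it.
        labels[p] = cluster
        stack = list(seeds)
        while stack:
            q = stack.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            neighbours = hood(q)
            if len(neighbours) >= min_pts:  # q is a core point, so keep expanding
                stack.extend(neighbours)
        cluster += 1
    return labels


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (5, 5), (8, 8), (8, 9), (9, 8)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, -1, 1, 1, 1]
```

The number of clusters is not passed in; it follows from Eps and MinPts alone, which matches the "only two parameters" feature listed earlier.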
Density based clustering – application
CONCLUSION
Clustering is a descriptive technique.
The solution is not unique and it strongly depends
upon the analyst’s choices.
We described how it is possible to combine different
results in order to obtain stable clusters, not
depending too much on the criteria selected to
analyze data.
Clustering always provides groups, even if there is no
group structure.
REFERENCES :
 A big help from Eric Kropat.
 Wikipedia , Google Searches
Graph and Density Based Clustering
