Department of Electrical Engineering
University of Arkansas
Density-Based Spatial Clustering
Md Abul Hayat
mahayat@uark.edu
OUTLINE
• Introduction
• k-means clustering
• DBSCAN
– Definitions
– Advantages
– Limitations
• OPTICS
– Definitions
– Advantages
– Limitations
K-means Clustering
– An unsupervised approach for partitioning a data set into K distinct, non-overlapping clusters. [Lloyd, 1982]
– We must first specify the desired number of clusters, K.
– The K-means algorithm then assigns each observation to exactly one of the K clusters.
– K-means clustering is defined by an optimization problem: choose the clusters so that the total within-cluster variation is as small as possible.
– Solving this problem exactly is NP-hard.
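One standard way to write the K-means objective is sketched below. The notation is an assumption made here, not taken from the slide: C_k denotes the k-th cluster and μ_k its mean.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```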
K-means : Algorithm
• Lloyd’s Algorithm
– Mathematically, each assignment step partitions the observations according to the Voronoi diagram generated by the current means.
How Lloyd’s Algorithm Works
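A minimal sketch of Lloyd's iteration in plain Python, assuming points are tuples of floats and the initial means are sampled from the data. The function names, the seed and the toy data are illustrative assumptions, not part of the slides.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and update steps until the labels stop changing."""
    rng = random.Random(seed)
    means = [tuple(p) for p in rng.sample(points, k)]  # assumed initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest mean (its Voronoi cell).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each mean moves to the centroid of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, means = lloyd_kmeans(points, k=2)
```

On this toy data the two well-separated groups end up in different clusters. In general the result depends on the initialization, and only a local optimum is guaranteed.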
Problems with K-means
– K-means partitions the space into Voronoi cells, which are convex.
– As a result, k-means performs poorly on non-convex clusters.
– The number of clusters must be provided beforehand.
– Sometimes we want to discover the intrinsic number of clusters in the dataset.
– There is no way to handle noise separately.
Problems with K-means
• Non-convex clusters.
• An unknown number of clusters.
• Density-based clustering was introduced to address these issues.
DBSCAN
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• Inventors:
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.
• Paper : “A Density-Based Algorithm for Discovering Clusters in Large Spatial
Databases with Noise”
• Presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in 1996. SIGKDD is a special interest group (SIG) of ACM.
• Citations: 13,293 (as of 11/04/2018)
• The 2014 SIGKDD Test of Time Award recognized DBSCAN as an influential contribution that has withstood the test of time.
• This is an unsupervised algorithm.
Definitions
– The shape of a neighborhood is determined by the choice of a distance measure between two points p and q, denoted by d(p,q).
– For instance, with the Manhattan distance in 2D space, the neighborhood is a square rotated by 45° (a diamond).
– For ease of visualization, all examples are in 2D space using the Euclidean distance.
Distance Measures
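– If we have two points p = (p_1, …, p_n) and q = (q_1, …, q_n),
– Minkowski Distance: d(p,q) = (Σ_i |p_i − q_i|^m)^(1/m), which gives the Manhattan distance for m = 1 and the Euclidean distance for m = 2.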
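The Minkowski distance of order m can be sketched in a few lines of Python. The function name and the parameter name m are assumptions chosen for illustration.

```python
def minkowski_distance(p, q, m=2.0):
    """Minkowski distance of order m between two equal-length points.

    m = 1 gives the Manhattan distance, m = 2 the Euclidean distance.
    """
    if m < 1:
        raise ValueError("m must be at least 1 for a valid metric")
    return sum(abs(a - b) ** m for a, b in zip(p, q)) ** (1.0 / m)


manhattan = minkowski_distance((0, 0), (3, 4), m=1)
euclidean = minkowski_distance((0, 0), (3, 4), m=2)
```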
Definitions
• [ Link: Funny Visualization ]
DBSCAN: Algorithm
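A minimal, quadratic-time sketch of the DBSCAN procedure in plain Python, assuming Euclidean distance and a neighborhood that includes the point itself. The names dbscan, region_query and NOISE are illustrative assumptions, and practical implementations typically use a spatial index for the neighborhood queries.

```python
import math

NOISE = -1  # assumed label for noise points


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

On this toy data the two squares form clusters 0 and 1, and the isolated point is labeled as noise.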
DBSCAN : Examples
• Resistant to noise (unlike k-means)
• Can handle clusters of different shapes and sizes
[Figure panels: Original Data; After DBSCAN]
DBSCAN Limitations
• Cannot handle varying densities.
• Sensitive to parameter selection.
[Figure panels: Original Data; (MinPts=4, Eps=9.92); (MinPts=4, Eps=9.75)]
Heuristics for Choosing DBSCAN Parameters
– Let d be the distance of a point p to its k-th nearest neighbor; then the d-neighborhood of p contains exactly k+1 points for almost all points p.
– For a given k, we define a function k-dist (= d) from the database D to the real numbers, mapping each point to the distance to its k-th nearest neighbor.
– When the points of the database are sorted in descending order of their k-dist values, the graph of this function gives some hints about the density distribution in the database.
– If we choose an arbitrary point p, set the parameter Eps to k-dist(p), and set the parameter MinPts to k, all points with an equal or smaller k-dist value will be core points.
– All points with a higher k-dist value (left of the threshold) are considered noise; all other points (right of the threshold) are assigned to some cluster.
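The sorted k-dist graph described above can be computed with a brute-force sketch like the following. The helper names k_dist and sorted_k_dist are assumptions, and choosing the threshold from the resulting curve remains a manual step.

```python
import math


def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def sorted_k_dist(points, k):
    """k-dist values in descending order, ready to be plotted."""
    return sorted(k_dist(points, k), reverse=True)


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
curve = sorted_k_dist(points, k=3)
```

On this toy data the isolated point appears at the far left of the curve with a large 3-dist value, while the eight clustered points share a small one, so a threshold between the two levels separates noise from cluster points.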
DBSCAN : Parameter Selection
– minPts is the easier of the two DBSCAN parameters to set.
– Sander et al. suggest setting it to twice the dataset dimensionality, i.e., minPts = 2 · dim.
– Ester et al. provide a heuristic for choosing the ε parameter based on the distance to the fourth nearest neighbor (for two-dimensional data).
– In Generalized DBSCAN, Sander et al. suggested using the (2 · dim − 1) nearest neighbors and minPts = 2 · dim.
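The dimension-based rule of thumb can be written as a tiny helper. The function name suggest_parameters is hypothetical, and the returned k is only the neighbor rank to use for a k-dist graph, not a value for ε.

```python
def suggest_parameters(dim):
    """Rule-of-thumb settings attributed to Sander et al. on the slide.

    Returns (min_pts, k): minPts = 2 * dim and the neighbor rank
    k = 2 * dim - 1 to use when building the k-dist graph.
    """
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    min_pts = 2 * dim
    k = 2 * dim - 1
    return min_pts, k


min_pts_2d, k_2d = suggest_parameters(2)
```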
OPTICS
• Ordering Points To Identify the Clustering Structure (OPTICS)
– Inventors (1999): Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander
– Paper: “OPTICS: Ordering Points To Identify the Clustering Structure”
– OPTICS takes the same ε and minPts parameters as DBSCAN; however, the ε parameter is theoretically unnecessary and serves only the practical purpose of reducing the runtime complexity of the algorithm.
– While DBSCAN may be thought of as a clustering algorithm that searches for natural groups in data, OPTICS is an augmented ordering algorithm.
– OPTICS requires two more definitions.
– Here, we fix only the minPts parameter and gain insight into the underlying clusters from a plot called the ‘Reachability Plot’.
OPTICS : Definitions
ε = Generating Distance
ε’ = Core Distance
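The two additional OPTICS definitions, core distance and reachability distance, can be sketched as follows. This assumes Euclidean distance and that a point counts as a member of its own ε-neighborhood. The function names are illustrative, and None stands for "undefined".

```python
import math


def core_distance(points, i, eps, min_pts):
    """Smallest radius at which point i becomes a core point, or None.

    Undefined (None) if fewer than min_pts points, counting i itself,
    lie within the generating distance eps.
    """
    dists = sorted(
        d for d in (math.dist(points[i], q) for q in points) if d <= eps
    )
    if len(dists) < min_pts:
        return None
    return dists[min_pts - 1]


def reachability_distance(points, o, p, eps, min_pts):
    """Reachability distance of point o with respect to point p, or None.

    Undefined (None) if p is not a core point; otherwise the larger of
    p's core distance and the actual distance between p and o.
    """
    core = core_distance(points, p, eps, min_pts)
    if core is None:
        return None
    return max(core, math.dist(points[p], points[o]))


points = [(0, 0), (1, 0), (2, 0), (10, 0)]
core_of_first = core_distance(points, 0, eps=3, min_pts=3)
core_of_outlier = core_distance(points, 3, eps=3, min_pts=3)
reach_0_from_1 = reachability_distance(points, 0, 1, eps=3, min_pts=3)
```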
Reachability Graph
Reachability Graph : Toy Example
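A simplified, quadratic-time sketch of how OPTICS produces the ordering and the values shown in a reachability plot. It is not the toy example from the slides: the data, the function name optics_order and the use of math.inf for "undefined" are assumptions made for illustration.

```python
import math


def optics_order(points, min_pts, eps=math.inf):
    """Return (order, reachability values in that order)."""
    n = len(points)
    reach = [math.inf] * n  # math.inf stands for "undefined"
    processed = [False] * n
    order = []

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return None  # i is not a core point
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i, nbrs, core, seeds):
        # Lower the reachability of unprocessed neighbors as seen from i.
        for j in nbrs:
            if not processed[j]:
                new_reach = max(core, math.dist(points[i], points[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    seeds.add(j)

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        nbrs = neighbors(start)
        core = core_dist(start, nbrs)
        if core is None:
            continue
        seeds = set()
        update(start, nbrs, core, seeds)
        while seeds:
            # Always continue with the seed that is currently easiest to reach.
            j = min(seeds, key=lambda s: (reach[s], s))
            seeds.remove(j)
            processed[j] = True
            order.append(j)
            j_nbrs = neighbors(j)
            j_core = core_dist(j, j_nbrs)
            if j_core is not None:
                update(j, j_nbrs, j_core, seeds)
    return order, [reach[i] for i in order]


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
order, plot_values = optics_order(points, min_pts=2)
```

Plotting plot_values as bars in the returned order gives the reachability plot. On this data the single tall bar at the fourth position marks the jump from the first group of points to the second, so each valley corresponds to one cluster.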
R Package & Examples
• dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms
– Published: May 19, 2018
– Two ways of grouping points into clusters from the order discovered by OPTICS were discussed.
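One way to turn an OPTICS ordering into clusters is to cut the reachability plot at a threshold, in the spirit of the DBSCAN-style extraction described in the OPTICS paper. The sketch below is written in Python rather than R and does not use the dbscan package. The function name, the argument layout and the sample values are assumptions for illustration.

```python
import math

NOISE = -1  # assumed label for noise points


def extract_dbscan(order, reachability, core_distances, eps_prime):
    """Assign cluster ids by cutting the reachability plot at eps_prime.

    order, reachability and core_distances are parallel sequences in
    OPTICS output order; math.inf or None marks undefined values.
    """
    labels = {}
    cluster_id = -1
    for idx, reach, core in zip(order, reachability, core_distances):
        if reach > eps_prime:
            # Not reachable from the previous points at this threshold.
            if core is not None and core <= eps_prime:
                cluster_id += 1  # a core point starts a new cluster
                labels[idx] = cluster_id
            else:
                labels[idx] = NOISE
        else:
            labels[idx] = cluster_id
    return labels


# Hypothetical OPTICS output for six points forming two groups.
order = [0, 1, 2, 3, 4, 5]
reachability = [math.inf, 1.0, 1.0, 8.0, 1.0, 1.0]
core_distances = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
labels = extract_dbscan(order, reachability, core_distances, eps_prime=2.0)
```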
ξ-Cluster
Conclusion
• Reachability plots help determine the number of clusters.
• They can be applied to find clusters in high-dimensional data (such as images).
• Both DBSCAN and OPTICS are unsupervised techniques.
Questions?
