Department of Electrical Engineering
University of Arkansas
Density-Based Spatial Clustering
Md Abul Hayat
mahayat@uark.edu
OUTLINE
• Introduction
• k-means clustering
• DBSCAN
– Definitions
– Advantages
– Limitations
• OPTICS
– Definitions
– Advantages
– Limitations
K-means Clustering
– An unsupervised approach for partitioning a data set into K distinct, non-overlapping
clusters. [Lloyd, 1982]
– We must first specify the desired number of clusters ‘K’.
– Then the K-means algorithm will assign each observation to exactly one of the
K clusters.
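– The optimization problem that defines K-means clustering is
$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$, where $\mu_k$ is the mean of the observations in cluster $C_k$.
– Solving this problem exactly is computationally NP-hard.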
K-means : Algorithm
• Lloyd’s Algorithm
– Mathematically, this is partitioning the observations according to
the Voronoi diagram generated by the means.
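Lloyd's algorithm alternates two steps: assign every point to its nearest mean (its Voronoi cell), then move each mean to the centroid of the points assigned to it. The following is a minimal plain-Python sketch under stated assumptions: `lloyd_kmeans` and its helpers are hypothetical names, the initial means are k randomly chosen data points, and iteration stops once the assignments no longer change. It also reports the within-cluster sum of squares, i.e. the objective above.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Hypothetical sketch of Lloyd's algorithm; returns (labels, centers, wcss)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize with k data points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the Voronoi cell of its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    # Within-cluster sum of squares: the quantity K-means tries to minimize.
    wcss = sum(squared_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss
```

Because each step can only lower the objective, the loop always terminates, but only at a local optimum; in practice the algorithm is rerun from several initializations.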
How Lloyd’s Algorithm Works
Problems with K-means
– K-means partitions the space into
Voronoi cells, which are convex.
– Thus k-means does not perform
well when the clusters are
non-convex.
– We have to provide the number of
clusters beforehand.
– Sometimes we want to find the
intrinsic number of clusters
in the dataset.
– There is no way of handling noise
separately: every point, outliers
included, is assigned to a cluster.
Problems with K-means
• Non-convex clusters.
• An unknown number of clusters.
• Density-based clustering was introduced to address these issues.
DBSCAN
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• Inventors:
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.
• Paper : “A Density-Based Algorithm for Discovering Clusters in Large Spatial
Databases with Noise”
• Presented at the International Conference on Knowledge Discovery and Data
Mining (KDD) in 1996. SIGKDD is the ACM Special Interest Group on Knowledge Discovery and Data Mining.
• Citations: 13,293 (as of 11/04/2018)
• The SIGKDD 2014 Test of Time award recognized DBSCAN as one of the influential
contributions to SIGKDD that have withstood the test of time.
• This is an unsupervised algorithm.
Definitions
– The shape of a neighborhood is determined by the choice of a distance
measure between two points p and q, denoted by d(p,q).
– For instance, with the Manhattan distance in 2D space the neighborhood is
a square rotated by 45° (a diamond) rather than a disk.
– For the purpose of proper visualization, all examples will be in 2D space
using the Euclidean distance.
Distance Measures
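– For two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$:
– Minkowski distance of order $r \ge 1$: $d(p,q) = \left( \sum_{i=1}^{n} |p_i - q_i|^r \right)^{1/r}$; $r = 1$ gives the Manhattan distance and $r = 2$ the Euclidean distance.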
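A small sketch of the Minkowski distance and of an ε-neighborhood test makes the point about neighborhood shape concrete: the point (0.7, 0.7) lies inside the unit Euclidean disk around the origin but outside the Manhattan diamond. The names `minkowski` and `in_neighborhood` are hypothetical helpers chosen for illustration.

```python
from typing import Sequence


def minkowski(p: Sequence[float], q: Sequence[float], r: float = 2.0) -> float:
    """Minkowski distance of order r between points p and q (r=1 Manhattan, r=2 Euclidean)."""
    if r < 1:
        raise ValueError("r must be >= 1 for the result to be a metric")
    return sum(abs(pi - qi) ** r for pi, qi in zip(p, q)) ** (1.0 / r)


def in_neighborhood(p: Sequence[float], q: Sequence[float], eps: float, r: float = 2.0) -> bool:
    """True if q lies in the eps-neighborhood of p under the order-r Minkowski distance."""
    return minkowski(p, q, r) <= eps


# (0.7, 0.7): Euclidean distance ~0.99 (inside), Manhattan distance 1.4 (outside).
print(in_neighborhood((0, 0), (0.7, 0.7), eps=1.0, r=2))
print(in_neighborhood((0, 0), (0.7, 0.7), eps=1.0, r=1))
```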
Definitions
• [ Link: Funny Visualization ]
DBSCAN: Algorithm
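A minimal brute-force sketch of the standard DBSCAN procedure follows, assuming Euclidean distance, an ε-neighborhood that includes the point itself, and a core-point test of |N_ε(p)| ≥ minPts. `dbscan` and `region_query` are hypothetical names; every neighborhood query scans all points, so this runs in O(n²) time without a spatial index.

```python
import math
from collections import deque
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: density-reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels
```

Points first marked as noise are relabeled when a core point later reaches them, which is how border points join a cluster; points that no core point reaches stay at -1. That is the noise handling k-means lacks.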
DBSCAN : Examples
• Resistant to Noise (unlike k-means)
• Can handle clusters of different shapes and sizes
[Figure: Original Data | After DBSCAN]
DBSCAN Limitations
[Figure: Original Data; DBSCAN result with MinPts=4, Eps=9.92; DBSCAN result with MinPts=4, Eps=9.75]
• Cannot handle varying densities.
• Sensitive to parameter selection.
Heuristics for Choosing DBSCAN Parameters
– Let d be the distance from a point p to its k-th nearest neighbor; then the
d-neighborhood of p contains exactly k+1 points (p itself and its k nearest
neighbors) for almost all points p.
– For a given k, we define a function k-dist (= d) from the database D to the
real numbers, mapping each point to the distance to its k-th nearest
neighbor.
– When sorting the points of the database in descending order of their k-dist
values, the graph of this function gives some hints concerning the density
distribution in the database.
– If we choose an arbitrary point p, set the parameter Eps to k-dist(p) and set
the parameter MinPts to k, all points with an equal or smaller k-dist value
will be core points.
– All points with a higher k-dist value (left of the threshold) are
considered noise; all other points (right of the threshold) are assigned
to some cluster.
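The sorted k-dist graph described above can be computed directly. This is a minimal sketch assuming Euclidean distance and brute-force neighbor search; `k_dist` and `sorted_k_dist_graph` are hypothetical names. The k-th nearest neighbor excludes the point itself, matching the "k+1 points including p" convention.

```python
import math
from typing import List, Sequence


def k_dist(points: List[Sequence[float]], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def sorted_k_dist_graph(points: List[Sequence[float]], k: int) -> List[float]:
    """k-dist values in descending order; noise points show up at the left."""
    return sorted(k_dist(points, k), reverse=True)


pts = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5), (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5), (20, 20)]
print(sorted_k_dist_graph(pts, 3))  # one large value for the outlier, then a flat run
```

Here the isolated point (20, 20) produces the single large value on the left; choosing Eps at the start of the flat run (about 0.71) leaves it as noise and makes the eight other points core points.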
DBSCAN : Parameter Selection
– The easier-to-set parameter of DBSCAN is the minPts parameter.
– Sander et al. suggest setting it to twice the dataset dimensionality, i.e.,
minPts = 2 · dim.
– Ester et al. provide a heuristic for choosing the ε parameter based on the
distance to the fourth nearest neighbor (for two-dimensional data).
– In Generalized DBSCAN, Sander et al. suggested using the distance to the (2 · dim − 1)-th
nearest neighbor and minPts = 2 · dim.
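The two Generalized DBSCAN suggestions agree with the k-dist observation from the heuristics slide: the k-dist(p)-neighborhood of p contains k + 1 points (p included), so choosing k = 2 · dim − 1 and setting ε to k-dist(p) gives p exactly minPts points, just enough for it to be a core point. A short worked check:

```latex
\begin{aligned}
\text{minPts} &= 2\cdot\text{dim}, \qquad k = 2\cdot\text{dim} - 1,\\
\left|N_{k\text{-dist}(p)}(p)\right| &= k + 1 = 2\cdot\text{dim} = \text{minPts},\\
\text{dim} = 2:\quad \text{minPts} &= 4,\quad k = 3.
\end{aligned}
```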
OPTICS
• Ordering Points To Identify the Clustering Structure (OPTICS)
– Inventors (1999):
Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander
– Paper: “OPTICS: Ordering Points To Identify the Clustering Structure”
– OPTICS requires the same ε and minPts parameters as DBSCAN,
however, the ε parameter is theoretically unnecessary and is only used for
the practical purpose of reducing the runtime complexity of the algorithm.
– While DBSCAN may be thought of as a clustering algorithm that searches
for natural groups in data, OPTICS is an augmented ordering algorithm.
– OPTICS introduces two additional definitions: core distance and reachability distance.
– Only the minPts parameter needs to be fixed; the structure of the underlying
clusters can then be read from a plot called the ‘Reachability Plot’.
OPTICS : Definitions
ε = Generating Distance
ε’ = Core Distance (the smallest ε’ ≤ ε at which a point becomes a core point)
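For reference, the two OPTICS definitions can be written as follows, with $N_\varepsilon(p) = \{q : d(p,q) \le \varepsilon\}$ (p itself included). The reachability graph plots, for each point in the OPTICS order, its smallest reachability distance from the points processed before it.

```latex
\text{core-dist}_{\varepsilon,\text{MinPts}}(p) =
\begin{cases}
\text{UNDEFINED} & \text{if } |N_\varepsilon(p)| < \text{MinPts},\\
\min\{\varepsilon' \le \varepsilon : |N_{\varepsilon'}(p)| \ge \text{MinPts}\} & \text{otherwise;}
\end{cases}
\qquad
\text{reach-dist}_{\varepsilon,\text{MinPts}}(o,p) =
\begin{cases}
\text{UNDEFINED} & \text{if } |N_\varepsilon(p)| < \text{MinPts},\\
\max\{\text{core-dist}_{\varepsilon,\text{MinPts}}(p),\ d(p,o)\} & \text{otherwise.}
\end{cases}
```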
Reachability Graph
Reachability Graph : Toy Example
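A toy reachability graph can be produced with a minimal OPTICS sketch. It assumes Euclidean distance, brute-force neighborhood queries, and `math.inf` standing in for UNDEFINED; `optics` and `core_distance` are hypothetical names, and the seed list is a binary heap with lazy deletion rather than the paper's OrderSeeds structure.

```python
import heapq
import math
from typing import List, Sequence, Tuple

UNDEFINED = math.inf  # infinity stands in for the UNDEFINED value


def core_distance(points: List[Sequence[float]], i: int, eps: float, min_pts: int) -> float:
    """Smallest radius <= eps whose neighborhood of points[i] (self included) holds min_pts points."""
    dists = sorted(math.dist(points[i], q) for q in points)  # dists[0] == 0.0 is the point itself
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return UNDEFINED
    return dists[min_pts - 1]


def optics(points: List[Sequence[float]], eps: float, min_pts: int) -> Tuple[List[int], List[float], List[float]]:
    """Return (ordering, reachability, core distance); the last two are indexed by point id."""
    n = len(points)
    core = [core_distance(points, i, eps, min_pts) for i in range(n)]
    reach = [UNDEFINED] * n
    processed = [False] * n
    ordering: List[int] = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(UNDEFINED, start)]  # min-heap of (reachability, point id)
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale entry: already emitted via a smaller reachability
            processed[i] = True
            ordering.append(i)
            if core[i] == UNDEFINED:
                continue  # not a core point: it cannot make other points reachable
            for j, q in enumerate(points):
                d = math.dist(points[i], q)
                if processed[j] or d > eps:
                    continue
                new_reach = max(core[i], d)
                if new_reach < reach[j]:
                    reach[j] = new_reach  # found a closer way to reach j
                    heapq.heappush(seeds, (new_reach, j))
    return ordering, reach, core
```

Plotting `reach[i]` for `i` in `ordering` gives the reachability plot: each cluster appears as a valley of small values, and an UNDEFINED (infinite) bar marks where a new region of the ordering starts.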
R Package & Examples
• dbscan: Density Based Clustering of Applications with Noise
(DBSCAN) and Related Algorithms
– Published: May 19, 2018
– Two ways of grouping points into clusters from the order discovered by
OPTICS were discussed.
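The simplest way to turn an OPTICS ordering into clusters is a horizontal cut through the reachability plot at some ε′ ≤ ε. The sketch below is modeled on the threshold-based ExtractDBSCAN-Clustering procedure described in the OPTICS paper; `extract_dbscan` is a hypothetical name, `math.inf` is assumed to stand for UNDEFINED, and the inputs are the ordering, reachability, and core-distance lists of an OPTICS run.

```python
from typing import List

NOISE = -1


def extract_dbscan(ordering: List[int], reach: List[float], core: List[float], eps_prime: float) -> List[int]:
    """Cluster labels for a threshold eps_prime, read off an OPTICS ordering.

    reach and core are indexed by point id; math.inf stands for UNDEFINED.
    """
    labels = [NOISE] * len(ordering)
    cluster_id = -1
    for i in ordering:
        if reach[i] > eps_prime:
            # Not reachable at this threshold from any point earlier in the ordering.
            if core[i] <= eps_prime:
                cluster_id += 1  # a core point at eps_prime opens a new cluster
                labels[i] = cluster_id
            else:
                labels[i] = NOISE
        else:
            labels[i] = cluster_id  # reachable: continues the current cluster
    return labels


inf = float("inf")
print(extract_dbscan(list(range(9)), [inf, 0.5, 0.5, 0.5, inf, 0.5, 0.5, 0.5, inf],
                     [0.5] * 8 + [inf], eps_prime=1.0))
```

Lowering `eps_prime` below the valley depth turns every point into noise, so a single OPTICS run can be cut at several thresholds without re-clustering; the ξ-cluster method instead looks for steep drops and rises in the plot.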
ξ-Cluster
Conclusion
• Reachability plots help determine the number of clusters.
• Can be applied to find clusters in high-dimensional data (e.g., images).
• Both DBSCAN and OPTICS are unsupervised techniques.
Questions?
