Nadar Saraswathi College of arts
and science, Theni
Density Based methods
Maximization outlier analysis
Department of CS & IT
Presented by
S. Vijayalakshmi, I-M.Sc. (IT)
Eick: Topics9---Clustering 2
Density based methods
 DBSCAN
 DENCLUE
Density-Based Clustering Methods
 Clustering based on density (local cluster criterion),
such as density-connected points or based on an
explicitly constructed density function
 Major features:
 Discover clusters of arbitrary shape
 Handle noise
 One scan
 Need density parameters
 Several interesting studies:
 DBSCAN: Ester, et al. (KDD’96)
 DENCLUE: Hinneburg & D. Keim (KDD’98/2006)
 OPTICS: Ankerst, et al (SIGMOD’99).
 CLIQUE: Agrawal, et al. (SIGMOD’98)
DBSCAN
(http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf)
 DBSCAN is a density-based algorithm.
 Density = number of points within a specified radius r (Eps)
 A point is a core point if it has at least a specified number of
points (MinPts) within Eps
 These are points that are at the interior of a cluster
 A border point has fewer than MinPts within Eps, but is in the
neighborhood of a core point
 A noise point is any point that is not a core point or a border
point.
DBSCAN: Core, Border, and Noise Points
DBSCAN Algorithm (simplified view for teaching)
1. Create a graph whose nodes are the points to be clustered
2. For each core-point c create an edge from c to every point
p in the Eps-neighborhood of c
3. Set N to the nodes of the graph;
4. If N does not contain any core points terminate
5. Pick a core point c in N
6. Let X be the set of nodes that can be reached from c by
going forward;
1. create a cluster containing X ∪ {c}
2. N = N \ (X ∪ {c})
7. Continue with step 4
Remark: points that are not assigned to any cluster are outliers;
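The graph view above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dbscan_graph` and the sample points are hypothetical, a core point is taken to have at least `min_pts` points (itself included) within `eps`, and a border point reachable from two clusters is given to the first cluster that reaches it.

```python
import math
from collections import deque

def dbscan_graph(points, eps, min_pts):
    """Simplified DBSCAN (graph view). Returns one label per point:
    0, 1, ... for clusters and -1 for unassigned points (outliers)."""
    n = len(points)
    # Eps-neighborhood of every point (a point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Assumption: core point = at least min_pts points within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster_id = 0
    for c in range(n):
        if not is_core[c] or labels[c] != -1:
            continue  # only unassigned core points start a new cluster
        labels[c] = cluster_id
        queue = deque([c])
        while queue:
            u = queue.popleft()
            if not is_core[u]:
                continue  # edges leave core points only; border points end the walk
            for v in neighbors[u]:
                if labels[v] == -1:
                    labels[v] = cluster_id
                    queue.append(v)
        cluster_id += 1
    return labels

# Hypothetical sample: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
labels = dbscan_graph(points, eps=1.5, min_pts=4)
print(labels)  # prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The neighborhood computation here is the brute-force O(n^2) version; a spatial index would replace the list comprehension in a lower-dimensional setting.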
DBSCAN: Core, Border and Noise Points
[Figure: original points, and the same points labeled by type (core, border and noise); Eps = 10, MinPts = 4]
When DBSCAN Works Well
[Figure: original points and the resulting clusters]
• Resistant to Noise
• Can handle clusters of different shapes and sizes
When DBSCAN Does NOT Work Well
[Figure: original points and the clusterings obtained with (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]
• Varying densities
• High-dimensional data
DBSCAN: Determining EPS and MinPts
 Idea is that for points in a cluster, their kth nearest
neighbors are at roughly the same distance
 Noise points have the kth nearest neighbor at farther
distance
 So, plot sorted distance of every point to its kth nearest
neighbor
[Figure: sorted kth-nearest-neighbor distance plot, with regions labeled Non-Core-points and Core-points]
Run K-means for Minp=4 and not fixed
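The sorted kth-nearest-neighbor distances described on this slide could be computed as follows. This is a brute-force sketch; the function name `sorted_k_distances` and the sample points are hypothetical, and the point itself is not counted among its own neighbors (an assumption). A sharp jump in the sorted values would suggest a candidate Eps.

```python
import math

def sorted_k_distances(points, k):
    """Distance from every point to its k-th nearest neighbor, sorted ascending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances to all other points (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists)

# Hypothetical sample: four clustered points and one far-away noise point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
k_dists = sorted_k_distances(points, k=2)
print([round(d, 2) for d in k_dists])  # prints [1.0, 1.0, 1.0, 1.0, 10.63]
```

The four clustered points share roughly the same 2nd-neighbor distance, while the noise point sits far above them, which is the pattern the plot is meant to expose.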
Complexity DBSCAN
 Time Complexity: O(n^2), since for each point it has
to be determined if it is a core point; this can be
reduced to O(n*log(n)) in lower dimensional
spaces by using efficient data structures (n is
the number of objects to be clustered);
 Space Complexity: O(n).
Summary DBSCAN
 Good: can detect clusters of arbitrary shape, not very
sensitive to noise, supports outlier detection,
acceptable complexity, and, after K-means,
the second most used clustering algorithm.
 Bad: does not work well on high-dimensional
datasets, parameter selection is tricky, has
problems identifying clusters of varying
densities (SSN algorithm), density
estimation is rather simplistic (does not
create a real density function, but rather a
graph of density-connected points)
DBSCAN Algorithm Revisited
 Eliminate noise points
 Perform clustering on the remaining points:
Skip!
DENCLUE
(http://www2.cs.uh.edu/~ceick/ML/Denclue2.pdf)
 DENsity-based CLUstEring by Hinneburg & Keim (KDD’98)
 Major features
 Solid mathematical foundation
 Good for data sets with large amounts of noise
 Allows a compact mathematical description of arbitrarily
shaped clusters in high-dimensional data sets
 Significantly faster than existing algorithms (reportedly faster
than DBSCAN by a factor of up to 45)
 But needs a large number of parameters
Denclue: Technical Essence
 Uses grid cells but only keeps information about grid cells that
actually contain data points and manages these cells in a
tree-based access structure.
 Influence function: describes the impact of a data point within
its neighborhood.
 Overall density of the data space can be calculated as the sum
of the influence functions of all data points.
 Clusters can be determined using hill climbing by identifying
density attractors; density attractors are local maxima of the
overall density function.
 Objects that are associated with the same density attractor
belong to the same cluster.
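Gradient: The steepness of a slope
 Example
Influence function of a data point y at location x:
$$f_{Gaussian}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$$
Overall density function for the data set D = {x_1, ..., x_N}:
$$f^{D}_{Gaussian}(x) = \sum_{i=1}^{N} e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$$
Gradient of the density function:
$$\nabla f^{D}_{Gaussian}(x, x_i) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$$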
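The Gaussian influence, density and gradient formulas of this slide can be written down directly. This is a minimal sketch: the function names, the sample data and the use of Euclidean distance for d are assumptions, and the gradient follows the slide's form, which matches the exact derivative up to a constant factor 1/sigma^2.

```python
import math

def influence(x, y, sigma):
    """Gaussian influence of data point y at location x."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Overall density at x: the sum of the influences of all data points."""
    return sum(influence(x, xi, sigma) for xi in data)

def gradient(x, data, sigma):
    """Slide's gradient: sum of (xi - x), each weighted by the influence of xi."""
    grad = [0.0] * len(x)
    for xi in data:
        w = influence(x, xi, sigma)
        for d in range(len(x)):
            grad[d] += (xi[d] - x[d]) * w
    return grad

# Hypothetical sample data set.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(density((0.0, 0.0), data, sigma=1.0))
print(gradient((0.0, 0.0), data, sigma=1.0))
```

At the origin the gradient points toward the other two data points, which is the uphill direction a hill-climbing step would follow.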
Example: Density Computation
D={x1,x2,x3,x4}
f^D_Gaussian(x) = influence(x,x1) + influence(x,x2) + influence(x,x3)
+ influence(x,x4) = 0.04 + 0.06 + 0.08 + 0.6 = 0.78
[Figure: data points x1, x2, x3, x4, query points x and y, and the influence values 0.6, 0.08, 0.06, 0.04 on x]
Remark: the density value of y would be larger than the one for x
Density Attractor
Examples of DENCLUE Clusters
Basic Steps DENCLUE Algorithms
1. Determine density attractors
2. Associate data objects with density
attractors using hill climbing
3. Possibly, merge the initial clusters
further relying on a hierarchical
clustering approach (optional; not
covered in this lecture)
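Steps 1 and 2 can be illustrated with a small sketch. It is not the authors' implementation: the names `climb` and `denclue_sketch`, the `merge_dist` tolerance and the sample data are hypothetical, and the hill climbing uses an influence-weighted mean update (a mean-shift style step) as an assumed stand-in for whatever step rule the original algorithm applies. The grid-cell structure and the optional merging step are left out.

```python
import math

def influence(x, y, sigma):
    """Gaussian influence of data point y at location x."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def climb(x, data, sigma, max_iter=200, tol=1e-6):
    """Move x uphill toward a nearby local maximum of the Gaussian density.

    Assumption: each step jumps to the influence-weighted mean of the data.
    """
    x = tuple(x)
    for _ in range(max_iter):
        weights = [influence(x, xi, sigma) for xi in data]
        total = sum(weights)
        new_x = tuple(
            sum(w * xi[d] for w, xi in zip(weights, data)) / total
            for d in range(len(x))
        )
        if math.dist(x, new_x) < tol:
            return new_x
        x = new_x
    return x

def denclue_sketch(data, sigma, merge_dist):
    """Label each point by the density attractor its hill climb ends at."""
    attractors, labels = [], []
    for p in data:
        a = climb(p, data, sigma)
        for idx, known in enumerate(attractors):
            if math.dist(a, known) < merge_dist:
                labels.append(idx)  # same attractor, same cluster
                break
        else:
            attractors.append(a)  # a new density attractor was found
            labels.append(len(attractors) - 1)
    return labels, attractors

# Hypothetical sample: two well-separated groups of three points.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
labels, attractors = denclue_sketch(data, sigma=0.5, merge_dist=0.1)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Points whose climbs end at the same attractor share a label, which mirrors the rule that objects associated with the same density attractor belong to the same cluster.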
