Atta Ul Mustafa (4425)
Armgan Ali (4424)
Ali raza (4427)
Atif Ali (4407)
Abdul Rehman (4403)
• Introduction
• Clustering
• Why Clustering?
• Types of clustering
• Methods of clustering
• Applications of clustering
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
Points to Remember
• A cluster of data objects can be treated as one group.
• During cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups.
• The main advantage of clustering over classification is that it adapts to changes and helps single out useful features that distinguish different groups.
Why Clustering?
The following points describe the requirements of clustering in data mining:
• High dimensionality: the clustering algorithm should handle not only low-dimensional data but also high-dimensional spaces.
• Ability to deal with noisy data: databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
• Interpretability: the clustering results should be interpretable, comprehensible, and usable.
• Scalability: we need highly scalable clustering algorithms to deal with large databases.
• Ability to deal with different kinds of attributes: algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
• Discovery of clusters with arbitrary shape: the clustering algorithm should be able to detect clusters of arbitrary shape. It should not be limited to distance measures that tend to find small, spherical clusters.
Types of Clustering
Clustering can be divided into different categories based on different criteria:
1. Hard clustering: a given data point in n-dimensional space belongs to only one cluster. This is also known as exclusive clustering. The K-Means clustering algorithm is an example of hard clustering.
2. Soft clustering: a given data point can belong to more than one cluster. This is also known as overlapping clustering. The Fuzzy K-Means algorithm is a good example of soft clustering.
3. Hierarchical clustering: a hierarchy of clusters is built using either the top-down (divisive) or the bottom-up (agglomerative) approach.
4. Flat clustering: a simple technique in which no hierarchy is present.
5. Model-based clustering: the data is modeled using a standard statistical model that can work with different distributions. The idea is to find the model that best fits the data.
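To make the difference between hard and soft clustering concrete, here is a minimal Python sketch. It assumes the cluster centres are already known (the ones below are made up), uses Euclidean distance, and computes soft memberships with the membership formula of fuzzy c-means (often also called fuzzy k-means) with fuzzifier m = 2, which is one common choice rather than the only way to define soft clustering. The function names are hypothetical.

```python
import math

def hard_assign(point, centroids):
    """Hard (exclusive) clustering: the point belongs to exactly one cluster,
    the one whose centroid is nearest (Euclidean distance assumed)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_memberships(point, centroids, m=2.0):
    """Soft (overlapping) clustering: fuzzy c-means style membership degrees.
    The degrees sum to 1; m > 1 is the fuzzifier (m = 2 is an assumed default)."""
    distances = [math.dist(point, c) for c in centroids]
    zeros = distances.count(0.0)
    if zeros:
        # A point lying exactly on centroid(s) belongs fully to those cluster(s).
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in distances) for d_j in distances]

# Hypothetical example: two known cluster centres and one point between them.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assign(point, centroids))                              # 0
print([round(u, 3) for u in soft_memberships(point, centroids)])  # [0.9, 0.1]
```

Under hard clustering the point is simply labelled 0; under soft clustering it belongs 90% to the first cluster and 10% to the second.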
Applications of Clustering
• Cluster analysis is broadly used in many applications, such as market research, pattern recognition, data analysis, and image processing.
• Clustering can help marketers discover distinct groups in their customer base and characterize those groups based on purchasing patterns.
• In biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
• Clustering helps identify areas of similar land use in an earth observation database. It also helps identify groups of houses in a city according to house type, value, and geographic location.
• Clustering helps classify documents on the web for information discovery.
• Clustering is used in outlier detection applications, such as detecting credit card fraud.
• As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.
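One simple way clustering supports outlier detection is distance-based: once the normal records have been clustered, a record that lies far from every cluster centre is flagged for review. The Python sketch below assumes the centres come from an earlier clustering run and that the distance threshold is chosen by the analyst; the records are made up, and real fraud-detection systems are considerably more involved.

```python
import math

def flag_outliers(points, centroids, threshold):
    """Return the indices of points whose distance to the nearest cluster centre
    exceeds threshold. Both the centres and the threshold are assumed inputs."""
    outliers = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            outliers.append(i)
    return outliers

# Made-up 2-D records and cluster centres, for illustration only.
centroids = [(10.0, 10.0), (50.0, 50.0)]
points = [(11.0, 9.0), (49.0, 52.0), (90.0, 5.0), (10.0, 12.0)]
print(flag_outliers(points, centroids, threshold=10.0))  # [2]
```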
Methods of Clustering
Clustering methods can be classified into the following categories:
• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-based Method
• Model-based Method
• Constraint-based Method
Partitioning Method
Suppose we are given a database of n objects, and the partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. That is, it classifies the data into k groups that satisfy the following requirements:
• Each group contains at least one object.
• Each object belongs to exactly one group.
Points to remember:
• For a given number of partitions (say k), the partitioning method first creates an initial partitioning.
• It then uses an iterative relocation technique to improve the partitioning by moving objects from one group to another.
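k-means is one well-known partitioning algorithm that follows these steps. Below is a minimal Python sketch of it (Lloyd's iteration), assuming numeric points, Euclidean distance, and, for reproducibility, the first k points as the initial centroids; practical implementations usually pick initial centroids randomly or with k-means++.

```python
import math

def k_means(points, k, max_iter=100):
    """Partition points into k groups by iterative relocation (Lloyd's k-means).
    Assumes numeric tuples, Euclidean distance, and the first k points as the
    initial centroids. Returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every object moves to exactly one group, its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no object changed group, so the partitioning is stable
        labels = new_labels
        # Update step: recompute each centroid as the mean of its group.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # k-means can leave a group empty; keep its old centroid then
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.0), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The loop stops as soon as an iteration moves no object between groups, which is the point at which iterative relocation can no longer improve the partitioning.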
Hierarchical Method
This method creates a hierarchical decomposition of the given set of data objects. Hierarchical methods can be classified by how the hierarchical decomposition is formed. There are two approaches:
• Agglomerative Approach
• Divisive Approach
Agglomerative Approach
This approach is also known as the bottom-up approach. We start with each object forming a separate group, then keep merging the objects or groups that are closest to one another. This continues until all of the groups are merged into one or until a termination condition holds.
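A minimal Python sketch of the bottom-up approach, assuming Euclidean distance and single linkage (the distance between two groups is the distance between their closest members); other linkage criteria, such as complete or average linkage, are equally common. The termination condition used here is simply reaching a requested number of clusters.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage.
    Returns a list of clusters, each a list of point indices."""
    clusters = [[i] for i in range(len(points))]  # every object starts as its own group

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(n_clusters, 1):  # termination condition
        # Find the two closest groups (ties go to the lowest indices).
        _, a, b = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[a].extend(clusters[b])  # merge group b into group a
        del clusters[b]                  # b > a, so index a stays valid
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerative(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
```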
Divisive Approach
This approach is also known as the top-down approach. We start with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters. This continues until each object is in its own cluster or until a termination condition holds. Hierarchical methods are rigid: once a merge or split is done, it can never be undone.
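The top-down approach can be sketched in Python for one-dimensional data as follows. The splitting rule used here (cut the cluster at the widest gap between neighbouring values) is a simple illustrative choice, not the only one; for multi-dimensional data, bisecting k-means is a common alternative. Once made, a split is never revisited.

```python
def divisive_1d(values, n_clusters):
    """Top-down clustering of 1-D data: start with one cluster holding every value,
    then repeatedly split the cluster that contains the widest gap between
    neighbouring values, until n_clusters clusters exist."""
    clusters = [sorted(values)]
    while len(clusters) < n_clusters:
        best = None  # (gap, cluster index, split position)
        for ci, cluster in enumerate(clusters):
            for pos in range(1, len(cluster)):
                gap = cluster[pos] - cluster[pos - 1]
                if best is None or gap > best[0]:
                    best = (gap, ci, pos)
        if best is None:
            break  # every cluster holds a single value; nothing left to split
        _, ci, pos = best
        cluster = clusters.pop(ci)
        clusters[ci:ci] = [cluster[:pos], cluster[pos:]]  # replace it by its two halves
    return clusters

print(divisive_1d([1, 2, 3, 10, 11, 30], n_clusters=3))  # [[1, 2, 3], [10, 11], [30]]
```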
Density-based Method
This method is based on the notion of density. The basic idea is to keep growing a given cluster as long as the density in its neighborhood exceeds some threshold; that is, for each data point within a cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
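The best-known algorithm built on this idea is DBSCAN. Below is a compact, unoptimized Python sketch in its spirit: eps is the neighborhood radius and min_pts the minimum number of points (counting the point itself) that makes a neighborhood dense. Points that cannot be reached from any dense point are labelled -1 (noise). The neighbor search is a brute-force O(n²) scan, which is acceptable for illustration but not for large databases.

```python
import math

def dbscan(points, eps, min_pts):
    """DBSCAN-style density-based clustering.
    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points (including i itself) within radius eps of point i.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is dense too: keep growing
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, this approach does not need the number of clusters in advance and can follow clusters of arbitrary shape, because growth depends only on local density.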
Grid-based Method
In this method, the object space is quantized into a finite number of cells that form a grid structure, and the clustering operations are performed on this grid.
Advantages
• The major advantage of this method is its fast processing time.
• The processing time typically depends only on the number of cells in each dimension of the quantized space, rather than on the number of data objects.
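A minimal Python sketch of the idea for 2-D points. It is a simplified illustration rather than a specific published grid-based algorithm: it counts the points in each square cell, keeps the dense cells, and joins touching dense cells into clusters. cell_size and min_count are assumed parameters. Counting is a single pass over the data; every step after that works only with cells.

```python
from collections import Counter

def grid_clusters(points, cell_size, min_count):
    """Simplified grid-based clustering of 2-D points (illustrative only).
    Returns clusters as sorted lists of grid cells."""
    # Step 1: quantize the space into square cells and count the points per cell.
    # This is the only pass over the data points themselves.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    # Step 2: keep only the "dense" cells.
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Step 3: join dense cells that touch (share an edge or a corner) via flood fill.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            component.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters

points = [(0.5, 0.5), (0.7, 0.2), (1.2, 0.4), (1.5, 0.9), (5.1, 5.2), (5.3, 5.8), (9.0, 0.1)]
print(grid_clusters(points, cell_size=1.0, min_count=2))  # [[(0, 0), (1, 0)], [(5, 5)]]
```

The isolated point at (9.0, 0.1) falls in a sparse cell and is left out of every cluster.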
Model-based Method
• In this method, a model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It locates clusters by constructing a density function that reflects the spatial distribution of the data points.
• This method also provides a way to automatically determine the number of clusters based on standard statistics, taking outliers and noise into account. It therefore yields robust clustering methods.
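A common concrete instance of the model-based method is a Gaussian mixture model fitted with the expectation-maximization (EM) algorithm: each cluster is modelled as a normal distribution, and EM alternates between computing soft memberships and re-estimating the model parameters. The Python sketch below handles one-dimensional data only and assumes initial means supplied by the caller, starting variances of 1, and a fixed number of iterations. Choosing the number of clusters automatically (for example, by comparing fitted models with a criterion such as BIC) is not shown.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_em_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial means.
    Returns (weights, means, variances), one entry per cluster."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k    # assumed starting variances
    weights = [1.0 / k] * k  # equal mixing weights to start
    for _ in range(n_iter):
        # E-step: each component's responsibility for each point (a soft membership).
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0.0 else [1.0 / k] * k)
        # M-step: re-estimate each component from the points it is responsible for.
        for j in range(k):
            r_j = [r[j] for r in resp]
            n_j = sum(r_j)
            if n_j == 0.0:
                continue  # component received no data; keep its old parameters
            weights[j] = n_j / len(data)
            means[j] = sum(r * x for r, x in zip(r_j, data)) / n_j
            var_j = sum(r * (x - means[j]) ** 2 for r, x in zip(r_j, data)) / n_j
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapse to zero variance
    return weights, means, variances

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = gmm_em_1d(data, means=[0.0, 6.0])
print([round(m, 2) for m in means])  # [1.0, 5.0]
```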
Constraint-based Method
In this method, clustering is performed by incorporating user- or application-oriented constraints. A constraint expresses the user's expectations or the properties of the desired clustering results. Constraints provide an interactive way of communicating with the clustering process, and they can be specified by the user or by the application's requirements.
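One widely cited way to incorporate constraints is COP-KMeans, which uses pairwise must-link constraints (two objects must share a cluster) and cannot-link constraints (two objects must not). The Python sketch below shows only its constrained assignment step for fixed, assumed cluster centres: each object, taken in order, goes to the nearest centre that does not violate a constraint with an object already placed, and the function gives up (returns None) when no such centre exists. The full algorithm repeats this step together with the usual centroid update; the helper names here are hypothetical.

```python
import math

def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting point i into `cluster` breaks a constraint with an already-placed point."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False

def constrained_assign(points, centroids, must_link=(), cannot_link=()):
    """COP-KMeans-style assignment step: each point goes to the nearest centroid
    that satisfies every must-link / cannot-link constraint seen so far.
    Returns None if some point cannot be placed without a violation."""
    labels = [None] * len(points)
    for i, p in enumerate(points):
        order = sorted(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for j in order:
            if not violates(i, j, labels, must_link, cannot_link):
                labels[i] = j
                break
        else:
            return None  # every cluster would violate some constraint
    return labels

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (9.5, 0.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
print(constrained_assign(points, centroids))                       # [0, 0, 1, 1]
print(constrained_assign(points, centroids, cannot_link=[(0, 1)]))  # [0, 1, 1, 1]
```

The cannot-link constraint forces point 1 out of its nearest cluster, even though purely distance-based assignment would put it there.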
Clustering in data Mining (Data Mining)
