Presented by: Pushkar Kumar
Course: BCA 3rd Year Aft.
Presented to: Ms. Rupali Pandey
Contents
• Introduction
• Categorization of major clustering methods
• Partitioning methods
• Hierarchical methods
• Outlier analysis
Introduction
Clustering
• Clustering is a type of unsupervised learning.
• An unsupervised method draws inferences from datasets consisting of input data without labeled responses.
• Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to one another than to points in other groups.
• There is no single universal criterion for a good clustering; it depends on the user and on the criteria that satisfy their needs.
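Drawbacks of Traditional Clustering Algorithms
• They favor clusters that approximate spherical shapes.
• They favor clusters of similar size.
• They are poor at handling outliers.
Measuring the Distance Between Clusters
1. Centroid-based approach, using d_mean: the distance between the means M_a and M_b of clusters C_a and C_b.
$d_{mean}(C_a, C_b) = \lVert M_a - M_b \rVert$
2. All-points approach, using d_min: the smallest distance between any point p_{a,i} of C_a and any point p_{b,j} of C_b.
$d_{min}(C_a, C_b) = \min_{i,j} \lVert p_{a,i} - p_{b,j} \rVert$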
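The two cluster distances can be sketched in a few lines of Python. This is only an illustration under assumptions: points are tuples of numbers, the norm is Euclidean, and the helper names `centroid`, `d_mean`, and `d_min` are chosen here and do not come from the slides.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def d_mean(cluster_a, cluster_b):
    """Centroid-based distance: distance between the two cluster means."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

def d_min(cluster_a, cluster_b):
    """All-points distance: smallest distance over all cross-cluster pairs."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical toy clusters on a line.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(5.0, 0.0), (9.0, 0.0)]
print(d_mean(cluster_a, cluster_b))  # 6.0 (means are (1, 0) and (7, 0))
print(d_min(cluster_a, cluster_b))   # 3.0 (closest pair is (2, 0) and (5, 0))
```

The example shows how the two measures can disagree: the means are 6 units apart, while the closest pair of points is only 3 units apart.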
Applications of cluster analysis:
• It is widely used in applications such as image processing, data analysis, and pattern recognition.
• In biology, it can be used to derive animal and plant taxonomies and to identify genes with similar functionality.
• It helps in information discovery by classifying documents on the web.
• It is used in outlier detection applications such as detecting credit card fraud.
• It helps identify areas of similar land use in an earth observation database.
Categorization of major clustering methods
Clustering methods can be classified into the following categories:
• Partitioning method
• Hierarchical method
• Density-based method
• Grid-based method
• Model-based method
• Constraint-based method
Partitioning Method
• These methods partition the objects into k clusters, where each partition forms one cluster.
• Each group contains at least one object, and each object belongs to exactly one group.
• The method first creates an initial partitioning of the objects into k groups.
• It then uses an iterative relocation technique to improve the partitioning by moving objects from one group to another.
• Popular partitioning algorithms include k-means and CLARANS (Clustering Large Applications based upon RANdomized Search).
K-Means (A Centroid-Based Technique)
• We are given a data set of items, each described by certain features and by values for those features (like a vector).
• The task is to categorize those items into groups. To achieve this, we use the k-means algorithm.
• K-means is an unsupervised learning algorithm.
• The algorithm categorizes the items into k groups by similarity.
• To measure similarity, we use the Euclidean distance.
The algorithm works as follows:
1. First, we initialize k points, called means, randomly.
2. We assign each item to its closest mean and update that mean's coordinates, which are the averages of the items assigned to it so far.
3. We repeat the process for a given number of iterations; at the end, we have our clusters.
The "points" mentioned above are called means because they hold the mean values of the items assigned to them.
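The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation, and it makes several assumptions: the initial means are drawn from the data items themselves, the function name `k_means` and its parameters are hypothetical, and it uses the common batch variant (each mean is recomputed after all items have been assigned) rather than updating a mean after every single assignment.

```python
import math
import random

def k_means(items, k, iterations=10, seed=0):
    """Cluster `items` (tuples of numbers) into k groups.

    Returns (means, assignment), where assignment[i] is the index
    of the mean closest to items[i].
    """
    rng = random.Random(seed)
    # Step 1: initialize k means randomly (assumption: pick k data items).
    means = [tuple(p) for p in rng.sample(items, k)]
    assignment = [0] * len(items)
    for _ in range(iterations):
        # Step 2a: assign each item to its closest mean (Euclidean distance).
        for i, item in enumerate(items):
            assignment[i] = min(range(k), key=lambda c: math.dist(item, means[c]))
        # Step 2b: move each mean to the average of the items assigned to it.
        for c in range(k):
            members = [items[i] for i in range(len(items)) if assignment[i] == c]
            if members:  # keep the old mean if no item was assigned to it
                dims = len(members[0])
                means[c] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dims)
                )
    # Step 3: after the given number of iterations, return the clusters.
    return means, assignment

# Hypothetical toy data: two well-separated groups of three points.
items = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = k_means(items, k=2)
print(labels)
```

On this toy data the first three items end up sharing one label and the last three the other; which group receives label 0 depends on the random initialization.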
Hierarchical Methods
• A hierarchical method builds a hierarchy of clusters, with each new cluster formed from previously formed ones.
• There are two categories:
  • Agglomerative (bottom-up approach): begins by treating every data point as a separate cluster and merges clusters step by step until the desired number of clusters is reached.
  • Divisive (top-down approach): begins with all points in one cluster and splits clusters step by step.
• Examples: CURE (Clustering Using REpresentatives), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), etc.
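The bottom-up (agglomerative) approach can be illustrated with a short sketch. It assumes the all-points distance d_min (single link) as the merge criterion and uses a naive search over all cluster pairs; the function names are hypothetical, and practical implementations use far more efficient data structures.

```python
import math

def single_link(cluster_a, cluster_b):
    """Smallest distance between any point of cluster_a and any point of cluster_b."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, target_clusters):
    """Merge the two closest clusters until `target_clusters` remain."""
    # Start with every point as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # The new cluster is formed from two previously formed ones.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy data.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, target_clusters=3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(20, 20)]]
```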
Basic Concept of CURE Algorithm
CURE (Clustering Using REpresentatives)
• It is a hierarchical clustering technique that adopts a middle ground between the centroid-based and the all-points approaches.
• It can identify both spherical and non-spherical clusters.
• Each cluster is represented by a fixed, predefined number of well-scattered representative points.
• The representative points are shrunk toward the cluster centroid by a shrinking factor.
• It is robust to outliers.
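To make the idea of representative points concrete, here is a rough sketch of how they might be chosen and shrunk for a single cluster. It is an illustration under assumptions rather than the exact CURE procedure: the parameter names `c` (number of representatives) and `alpha` (shrinking factor), the helper names, and the farthest-point selection heuristic are all choices made here.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def scattered_points(points, c):
    """Pick up to c well-scattered points using a farthest-point heuristic."""
    points = list(dict.fromkeys(points))  # drop exact duplicates, keep order
    center = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(points)):
        remaining = [p for p in points if p not in chosen]
        # Next, take the point farthest from all points chosen so far.
        chosen.append(
            max(remaining, key=lambda p: min(math.dist(p, q) for q in chosen))
        )
    return chosen

def shrink(representatives, center, alpha):
    """Move each representative a fraction alpha of the way toward the centroid."""
    return [
        tuple(r_d + alpha * (m_d - r_d) for r_d, m_d in zip(r, center))
        for r in representatives
    ]

# Hypothetical cluster: four corners of a square plus its center.
cluster = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
reps = scattered_points(cluster, c=4)
print(shrink(reps, centroid(cluster), alpha=0.5))
# [(1.0, 1.0), (3.0, 3.0), (3.0, 1.0), (1.0, 3.0)]
```

Shrinking pulls the representatives away from the cluster boundary, which dampens the influence of points lying far from the center.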
CURE Architecture
Random Sampling
• When the whole data set is used as input to the algorithm, execution time can be high because of I/O costs, so a random sample is used as input instead.
• The random sample fits in main memory.
• Random samples can be generated very quickly.
• The overhead of generating a random sample is very small compared to the time needed to cluster the sample.
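One possible way to draw such a sample in a single pass, without loading the whole data set into memory, is reservoir sampling. The slides do not say which sampling technique is used, so this is a swapped-in illustration; the function name and parameters are hypothetical.

```python
import random

def reservoir_sample(stream, n, seed=0):
    """Return n items drawn uniformly at random from an iterable, in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability n / (index + 1).
            j = rng.randint(0, index)
            if j < n:
                reservoir[j] = item
    return reservoir

# Hypothetical usage: sample 5 records from a stream of 1000.
sample = reservoir_sample(range(1000), n=5)
print(len(sample))  # 5
```

Only the n sampled items are held in memory at any time, which matches the goal of keeping the clustering input in main memory.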
Partitioning Sample
• After the random sample is created, it is partitioned.
• Partitioning helps to speed up the CURE algorithm by reducing execution time.
• The sample of n data points is divided into p partitions, each of size n/p.
• Each group of n/p points fits in main memory, which improves the performance of the partial clustering of each partition.
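Splitting a sample of n points into p partitions of roughly n/p points each can be sketched as follows. The function name is hypothetical, and the striding approach is just one simple choice; it assumes the sample is already in random order.

```python
def partition_sample(sample, p):
    """Split `sample` into p partitions whose sizes differ by at most one."""
    return [sample[i::p] for i in range(p)]

# Hypothetical usage: n = 10 sampled points, p = 3 partitions.
sample = list(range(10))
partitions = partition_sample(sample, p=3)
print([len(part) for part in partitions])  # [4, 3, 3]
```

Each partition would then be clustered on its own (the partial clustering step) before the partial clusters are merged in a later pass.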
Handling Outliers
• Random sampling filters out the majority of outliers.
• Because of their larger distance from other points, outliers tend to merge less often with other points and grow more slowly.
• The number of points in a group of outliers is typically much smaller than the number in a cluster.
• So, first, the clusters that are growing very slowly are identified and eliminated.
• Second, at the end of the growing process, very small clusters are eliminated.
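The elimination of very small clusters can be sketched as a simple size filter. The threshold `min_size` and the function name are hypothetical; the slides do not give a concrete cut-off, and this sketch does not model the growth-rate check of the first phase.

```python
def drop_small_clusters(clusters, min_size):
    """Separate clusters into those kept and those treated as outliers."""
    kept = [c for c in clusters if len(c) >= min_size]
    dropped = [c for c in clusters if len(c) < min_size]
    return kept, dropped

# Hypothetical clusters: two real groups and one isolated point.
clusters = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 10), (10, 11), (11, 10)],
    [(50, 50)],
]
kept, dropped = drop_small_clusters(clusters, min_size=2)
print(len(kept), dropped)  # 2 [[(50, 50)]]
```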
Labeling Data on Disk
• Sampling the initial data set excludes the majority of data points. These remaining data points must be assigned to one of the clusters created in the earlier phases.
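A simple way to label a remaining point is to assign it to the cluster that owns the nearest representative point. The sketch below assumes that rule and uses a hypothetical dictionary layout mapping a cluster id to its list of representatives.

```python
import math

def label_point(point, representatives):
    """Return the id of the cluster whose nearest representative is closest.

    `representatives` maps a cluster id to a list of representative points.
    """
    return min(
        representatives,
        key=lambda cid: min(math.dist(point, r) for r in representatives[cid]),
    )

# Hypothetical representatives for two clusters.
representatives = {
    "A": [(1.0, 1.0), (3.0, 3.0)],
    "B": [(10.0, 10.0), (12.0, 12.0)],
}
print(label_point((4.0, 4.0), representatives))  # A
print(label_point((9.0, 9.0), representatives))  # B
```

Using several representatives per cluster, rather than a single centroid, lets points near the edge of an elongated cluster still be labeled correctly.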
Conclusion
• We have seen that CURE can detect clusters with non-spherical shapes and wide variance in size by using a set of representative points for each cluster.
• CURE can also achieve good execution time on large databases by using random sampling and partitioning.
• CURE works well when the database contains outliers, which are detected and eliminated.

Cluster analysis
