Clustering Techniques
Visit: Learnbay.co
Agenda
Python Setup
Clustering Overview
Hierarchical Clustering
Non-Hierarchical Clustering
Python Setup for Clustering
▪ Python 3.2.3 or a higher version should be installed
▪ The following libraries should be installed. Check by running the command below
## It is okay if you get a warning message, but you should not get an error message
sklearn
seaborn
matplotlib
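One way to run this check without stopping at the first missing library is sketched below. It uses only the standard library (assuming Python 3.4 or later for importlib.util.find_spec); the helper name check_libraries is hypothetical and not from the slides.

```python
import importlib.util

# Import names of the libraries listed on the slide
# (sklearn is the import name of scikit-learn).
REQUIRED = ["sklearn", "seaborn", "matplotlib"]

def check_libraries(names):
    """Return {name: True/False} for whether each library can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_libraries(REQUIRED).items():
        print(name, "installed" if found else "MISSING")
```

A library reported as MISSING would raise an error on import, which is the case the slide warns about.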
Clustering Overview
Intro: Data – Information – Knowledge – Wisdom (DIKW)
Where is the Life… we have lost in Living?
Where is the Wisdom… we have lost in Knowledge?
Where is the Knowledge… we have lost in Information?
- T.S. Eliot
▪ Where is the Information… we have lost in Data?
In order to go from Data to Information to Knowledge and to Wisdom,
we need to simplify data
How do we simplify data
▪ Clustering / Segmentation (our focus area in this presentation)
– Case reduction
– Reduce the number of records by identifying similar groups and representing them as a cluster
▪ Simplifying data means eliminating data complexity
– too many variables to a few variables
– too many records to a few records
▪ Dimension Reduction
– Reduce the number of variables by using techniques like Principal Component Analysis, collinearity checks, business rules, etc.
Clustering
▪ Clustering is a technique for finding similar groups in data, called clusters.
▪ It groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.
▪ A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar” to the objects belonging to other clusters
▪ How do we define “similar” in clustering?
Note: Clustering is an unsupervised learning technique
Brand Conscious Customers
[Figure: chart of customers with a "Price Conscious" axis label and "High" marked on both axes]
Simple Clustering e.g.
▪ From the scatter plot we can see that
– A & D form one segment of customers who are very high on Brand Loyalty
– B, C, & E form another segment which is very Price Conscious
Individuals  PriceConscious  BrandLoyalty
A            2               4
B            8               2
C            9               3
D            1               5
E            8               1
[Figure: scatter plot of individuals A to E, with PriceConscious (0 to 10) on the x-axis and BrandLoyalty (0 to 6) on the y-axis]
Note: The above e.g. is just hypothetical data to introduce the subject of clustering. It is not related to the URL below:
http://globalbizresearch.org/chennai_conference/pdf/pdf/ID_C405_Formatted.pdf
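The two segments can also be read off numerically. The sketch below is an illustration, not part of the original deck: it uses the five hypothetical individuals from the table and finds each one's nearest neighbour by straight-line distance (math.dist assumes Python 3.8 or later; the function name nearest_neighbour is made up for this example).

```python
import math

# Hypothetical scores from the slide: (PriceConscious, BrandLoyalty)
individuals = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8, 1)}

def nearest_neighbour(name):
    """Return the individual closest to `name` by straight-line distance."""
    others = (other for other in individuals if other != name)
    return min(others, key=lambda other: math.dist(individuals[name], individuals[other]))

for name in individuals:
    print(name, "->", nearest_neighbour(name))
# A -> D
# B -> E
# C -> B
# D -> A
# E -> B
```

Nobody in {A, D} has a nearest neighbour in {B, C, E}, or the other way round, which matches the two segments seen in the scatter plot.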
Steps involved in Clustering Analysis
Formulate the problem – Select variables to be used for clustering
Decide the clustering procedure (Hierarchical / Non-Hierarchical)
Select the measure of similarity (dissimilarity)
Choose the clustering algorithm (applicable in hierarchical clustering)
Decide on the number of clusters
Interpret the cluster output (profile the clusters)
Validate the clusters
Formulate the clustering problem
▪ Formulate the problem
– understand the business problem
– hypothesize variables that will help solve the clustering problem at hand
▪ Applications of clustering techniques
– Store Clustering
– Customer Clustering
– Village Affluence Categorization
Note: It is preferable to do Factor Analysis / Principal Component Analysis before clustering
Decide the clustering procedure
▪ Types of Clustering Procedures
– Hierarchical Clustering
– Non-Hierarchical Clustering
▪ Hierarchical clustering is characterized by a tree-like structure and uses distance as a measure of (dis)similarity
▪ Non-hierarchical clustering techniques use partitioning methods and within-cluster variance as a measure to form homogeneous groups
Hierarchical Vs. Non-Hierarchical clustering
Hierarchical Clustering
▪ Relatively slow
▪ Agglomerative clustering is the most used algorithm
▪ Uses distance as a measure of (dis)similarity
▪ Helps suggest the optimal number of clusters in the data
▪ An object assigned to a cluster remains in that cluster
Non-Hierarchical Clustering
▪ Fast and preferable to use with large datasets
▪ K-means is a very popular non-hierarchical clustering technique
▪ Uses within-cluster variance as a measure of similarity
▪ Requires the number of clusters as an input parameter to start
▪ Objects can be reassigned to other clusters during the clustering process
Hierarchical Clustering
Hierarchical Clustering - (dis)similarity measure
▪ Hierarchical clustering is based on a (dis)similarity measure
▪ Most software packages calculate a measure of dissimilarity by estimating the distance between each pair of objects
▪ Objects with a shorter distance are considered similar, whereas objects with a larger distance are dissimilar
▪ Measures of (dis)similarity
– Euclidean distance (most commonly used)
– City Block or Manhattan distance
– Chebyshev distance
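Distance Computation
[Figure: two diagrams, each showing points A and B]
What is the distance between Point A and B? (first diagram)
Ans: 7
What is the distance between Point A and B? (second diagram, with A at (x1, y1) and B at (x2, y2))
Ans: $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
(Remember the Pythagorean theorem)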
Distance Computation Contd…
▪ What is the distance between Point A and B in n-dimensional space?
▪ If $A(a_1, a_2, \dots, a_n)$ and $B(b_1, b_2, \dots, b_n)$ are Cartesian coordinates
▪ By using Euclidean distance (which is an extension of the Pythagorean theorem), we get the distance AB as
▪ $D_{AB} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$
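The n-dimensional formula translates directly into a few lines of code. This is a minimal sketch using only the standard library; the function name euclidean_distance and the sample points are made up for illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0 (the 3-4-5 right triangle)
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0 (same triangle, third coordinate equal)
```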
Chebyshev Distance
▪ In mathematics, Chebyshev distance is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension
▪ Assume two vectors: $A(a_1, a_2, \dots, a_n)$ and $B(b_1, b_2, \dots, b_n)$
▪ Chebyshev Distance = $\max(|a_1 - b_1|, |a_2 - b_2|, \dots, |a_n - b_n|)$
▪ Application: Survey / Research data where the responses are ordinal
https://en.wikipedia.org/wiki/Chebyshev_distance
Manhattan Distance
▪ Manhattan Distance is also called City Block Distance
▪ Assume two vectors: $A(a_1, a_2, \dots, a_n)$ and $B(b_1, b_2, \dots, b_n)$
▪ Manhattan Distance = $|a_1 - b_1| + |a_2 - b_2| + \dots + |a_n - b_n|$
[Figure: city-block grid in which A and B are 8 blocks apart in one direction and 4 blocks apart in the other]
Manhattan Distance = 8 + 4 = 12
Chebyshev Distance = Max (8, 4) = 8
Euclidean Distance = sqrt (8^2 + 4^2) = 8.94
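The three results for the city-block example can be reproduced with a short sketch. The grid coordinates for A and B are an assumption chosen so that the points are 8 blocks apart horizontally and 4 vertically; math.dist assumes Python 3.8 or later.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return math.dist(a, b)

# Assumed grid coordinates: 8 blocks across, 4 blocks down
A, B = (0, 4), (8, 0)

print(manhattan(A, B))            # 12
print(chebyshev(A, B))            # 8
print(round(euclidean(A, B), 2))  # 8.94
```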
Hierarchical Clustering | Agglomerative Clustering
▪ Starts with each record as a cluster of one record each
▪ Sequentially merges the 2 closest clusters, using distance as a measure of (dis)similarity, to form a new cluster. This reduces the number of clusters by 1
▪ Repeats the above step with the new cluster and all remaining clusters until we have one big cluster
[Figure: records a, b, c, d, e merged over Steps 0 to 4 into (a, b), (d, e), (c, d, e) and finally (a, b, c, d, e)]
How do you measure the distance between cluster (a, b) and (c), or between cluster (a, b) and (d, e)?
Agglomerative Clustering Procedures
• Single linkage – Minimum distance or nearest neighbour rule
• Complete linkage – Maximum distance or farthest neighbour rule
• Average linkage – Average of the distances between all pairs
• Centroid method – Combine the clusters with the minimum distance between the centroids of the two clusters
• Ward’s method – Combine the clusters for which the increase in within-cluster variance is the smallest
http://sites.stat.psu.edu/~ajw13/stat505/fa06/19_cluster/09_cluster_wards.html
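The first four rules differ only in how they turn point-to-point distances into one cluster-to-cluster distance. The sketch below illustrates that on two made-up 2-D clusters; the function names and coordinates are hypothetical, and Ward's method is left out because it compares variances rather than distances. math.dist assumes Python 3.8 or later.

```python
import math
from itertools import product

def pairwise(c1, c2):
    """All point-to-point distances between two clusters."""
    return [math.dist(p, q) for p, q in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))          # nearest pair of points

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))          # farthest pair of points

def average_linkage(c1, c2):
    distances = pairwise(c1, c2)
    return sum(distances) / len(distances)

def centroid_distance(c1, c2):
    centroid = lambda c: tuple(sum(x) / len(c) for x in zip(*c))
    return math.dist(centroid(c1), centroid(c2))

# Hypothetical clusters
ab = [(0, 0), (0, 2)]
de = [(3, 0), (5, 0)]

print(single_linkage(ab, de))               # 3.0
print(round(complete_linkage(ab, de), 3))   # 5.385
print(round(average_linkage(ab, de), 3))    # 4.248
print(round(centroid_distance(ab, de), 3))  # 4.123
```

The same two clusters get four different distances, so the choice of rule can change which clusters merge next.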
Clustering e.g. 1 : Clustering for Retail Customers
## Let us find the clusters in the given Retail Customer Spends data
## We will use the Hierarchical Clustering technique
## Let us first import the data (the path is relative to the working directory)
import pandas as pd
RCDF = pd.read_csv("datafiles/Cust_Spend_Data.csv", header=0)
print(RCDF.head())
HyperMarket Customer Spend Data - Metadata
AVG_Mthly_Spend: The average monthly amount spent by the customer
No_of_Visits: The number of times a customer visited the HyperMarket
Count of Apparel, Fruits and Vegetable, Staple Items purchased in a month
Building the hierarchical clusters (without variable scaling)
A dendrogram looks like a decision tree, but it is not a classification tree.
Note: The two clusters formed are primarily on the basis of AVG_MTHLY_SPEND.
The Euclidean distance computation in this case is dominated by the AVG_MTHLY_SPEND variable, as the range of this variable is much larger than that of the other variables.
To avoid this problem, we should scale the variables used for clustering.
Building the hierarchical clusters (with variable scaling)
## The scale function standardizes the values
Scaling of features can change the shape and size of the clusters. It can even change the number of clusters found.
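What standardizing does to a variable can be sketched with the standard library alone. The helper name standardize and the sample values are hypothetical, and dividing by the sample standard deviation (rather than the population one) is an assumption of this sketch.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical values: monthly spend has a far larger range than visit counts
spend = [10000, 7000, 7000, 6500, 6000]
visits = [2, 3, 7, 5, 6]

for name, column in (("spend", spend), ("visits", visits)):
    scaled = standardize(column)
    print(name, [round(z, 2) for z in scaled])
```

After scaling, both variables have mean 0 and standard deviation 1, so neither dominates the Euclidean distance simply because of its units.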
Understanding the Height Calculation in Clustering
## Let us see the distance matrix
Dist.  A     B     C     D     E     F     G     H     I
B      4.25
C      3.41  3.84
D      2.51  3.47  1.26
E      4.27  2.70  2.92  3.20
F      3.98  2.21  3.58  2.85  3.43
G      4.38  3.02  3.38  3.35  1.41  3.17
H      3.40  3.60  3.66  2.93  3.24  2.35  2.46
I      3.53  3.39  4.05  3.21  3.48  2.18  2.61  0.73
J      4.55  2.97  3.59  3.04  3.41  1.24  2.80  2.12  2.06
The closest pairs merge first, each at a height equal to its distance:
C, D  1.26
H, I  0.73
F, J  1.24
E, G  1.41
Later merges happen at the average of the pairwise distances between the two clusters:
A, (C,D)
A, C      3.41
A, D      2.51
A, (C,D)  2.96
(H,I), (F,J)
     F     J
H    2.35  2.12
I    2.18  2.06
((H,I), (F,J))  2.17
B, (E,G)
B, E      2.70
B, G      3.02
B, (E,G)  2.86
(H,I,F,J), (B,E,G)
     B     E     G
H    3.60  3.24  2.46
I    3.39  3.48  2.61
F    2.21  3.43  3.17
J    2.97  3.41  2.80
(H,I,F,J), (B,E,G)  3.06
(A,C,D), (H,I,F,J,B,E,G)
     H     I     F     J     B     E     G
A    3.40  3.53  3.98  4.55  4.25  4.27  4.38
C    3.66  4.05  3.58  3.59  3.84  2.92  3.38
D    2.93  3.21  2.85  3.04  3.47  3.20  3.35
(A,C,D), (H,I,F,J,B,E,G)  3.59
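The heights above are consistent with average linkage, and that can be checked by replaying the merges. The sketch below is a plain-Python illustration, not the code used in the deck: it copies the rounded distances from the matrix and repeatedly merges the two clusters with the smallest average pairwise distance. The names ROWS, DIST and agglomerate are made up for this example.

```python
# Lower triangle of the distance matrix, copied from the slide (rounded values)
ROWS = {
    "B": [4.25],
    "C": [3.41, 3.84],
    "D": [2.51, 3.47, 1.26],
    "E": [4.27, 2.70, 2.92, 3.20],
    "F": [3.98, 2.21, 3.58, 2.85, 3.43],
    "G": [4.38, 3.02, 3.38, 3.35, 1.41, 3.17],
    "H": [3.40, 3.60, 3.66, 2.93, 3.24, 2.35, 2.46],
    "I": [3.53, 3.39, 4.05, 3.21, 3.48, 2.18, 2.61, 0.73],
    "J": [4.55, 2.97, 3.59, 3.04, 3.41, 1.24, 2.80, 2.12, 2.06],
}
LABELS = "ABCDEFGHIJ"

# Look up a distance by unordered pair, e.g. DIST[frozenset("HI")] == 0.73
DIST = {
    frozenset((row, col)): d
    for row, values in ROWS.items()
    for col, d in zip(LABELS, values)
}

def average_linkage(c1, c2):
    """Mean of all pairwise distances between members of two clusters."""
    pairs = [DIST[frozenset((p, q))] for p in c1 for q in c2]
    return sum(pairs) / len(pairs)

def agglomerate(labels):
    """Merge the two closest clusters until one remains; return (height, members) per merge."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        height, a, b = min(
            ((average_linkage(a, b), a, b)
             for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((height, a + b))
    return merges

for height, members in agglomerate(LABELS):
    print(f"{height:.4f}", "".join(members))
```

Because the inputs are already rounded to two decimals, a recomputed height can differ from the slide in the last digit; the (H,I), (F,J) merge, for example, comes out as 2.1775 here against 2.17 on the slide.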
Profiling the clusters
## profiling the clusters
Profiling the clusters means aggregating the patterns identified within each cluster, that is, summarizing the characteristics of each cluster.
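A simple profile is the size of each cluster plus the average of every variable within it. The sketch below shows the idea with the standard library; the records, field names and cluster labels are hypothetical, not the deck's data.

```python
from collections import defaultdict
from statistics import mean

def profile_clusters(records, labels):
    """Average every numeric field per cluster label and count the members."""
    grouped = defaultdict(list)
    for record, label in zip(records, labels):
        grouped[label].append(record)
    profile = {}
    for label, members in grouped.items():
        profile[label] = {field: mean(m[field] for m in members) for field in members[0]}
        profile[label]["size"] = len(members)
    return profile

# Hypothetical customers and the cluster each one was assigned to
records = [
    {"spend": 10000, "visits": 2},
    {"spend": 7000, "visits": 3},
    {"spend": 1000, "visits": 8},
    {"spend": 2000, "visits": 6},
]
labels = [1, 1, 2, 2]

for label, summary in profile_clusters(records, labels).items():
    print(label, summary)
```

Here cluster 1 reads as "high spend, few visits" and cluster 2 as "low spend, frequent visits", which is the kind of description profiling is meant to produce.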
Non-Hierarchical Clustering
(K Means)
K Means Clustering
▪ K-Means is the most used non-hierarchical clustering technique
▪ It is not based on pairwise distance between objects…
▪ It is based on within-cluster variation, in other words the squared distance from the centre of the cluster
▪ The algorithm aims at segmenting data such that within-cluster variation is reduced
K Means Algorithm
▪ Input required: number of clusters to be formed (say K)
▪ Steps
1. Assume K centroids (for K clusters)
2. Compute the Euclidean distance of each object from these centroids
3. Assign each object to the cluster with the shortest distance
4. Compute the new centroid (mean) of each cluster based on the objects assigned to it. The K means obtained become the new centroids
5. Repeat steps 2 to 4 until there is convergence
- i.e. there is no movement of objects from one cluster to another
- or a threshold number of iterations has occurred
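The five steps above can be sketched in plain Python as follows. This is an illustration rather than a library implementation: the starting centroids are passed in by the caller (an assumption, since the slide only says to "assume" them), and the sample points reuse the hypothetical individuals A to E from the earlier scatter plot. math.dist assumes Python 3.8 or later.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means; `initial_centroids` holds the K starting centres (step 1)."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):                      # step 5: iteration threshold
        # Steps 2-3: assign each point to its nearest centroid
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:           # step 5: no object moved
            break
        assignment = new_assignment
        # Step 4: recompute each centroid as the mean of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                            # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical individuals A-E: (PriceConscious, BrandLoyalty)
points = [(2, 4), (8, 2), (9, 3), (1, 5), (8, 1)]
labels, centroids = kmeans(points, initial_centroids=[(2, 4), (8, 2)])
print(labels)        # [0, 1, 1, 0, 1]
print(centroids[0])  # (1.5, 4.5)
```

With K = 2 the result groups A with D and B with C and E, the same segments that were visible in the scatter plot.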
K-means advantages
▪ K-means is often preferred over hierarchical techniques as it is considered less impacted by outliers (extreme values can still pull a centroid towards them)
▪ Computationally it is much faster than hierarchical clustering
▪ Preferable to use on interval or ratio-scaled data as it uses Euclidean distance… desirable to avoid using on ordinal data
▪ Challenge – the number of clusters has to be pre-defined and provided as input to the process
Why find optimal No. of Clusters?
[Figure: the same data to be clustered, plotted on dimensions D1 and D2 and partitioned in different ways]
▪ Two Clusters – 2 possible solutions (clusters C1 and C2)
▪ Three Clusters – multiple possible solutions (clusters C1, C2 and C3)
Optimal No. of Clusters
The WSS plot, or Within Sum of Squares error plot, is used to identify the optimal number of clusters.
The WSS plot is also called a Scree Plot or Elbow Curve within the analytics industry.
The elbow in the graph suggests the optimal number of clusters.
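The numbers behind such a plot can be produced by running K-means for several values of K and recording the WSS each time. The sketch below is a self-contained illustration in plain Python; using the first K points as starting centroids is an assumption made to keep the run deterministic, and the data are the hypothetical individuals A to E used earlier. math.dist assumes Python 3.8 or later.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means; assumption: the first k points serve as starting centroids."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def wss(points, labels, centroids):
    """Within sum of squares: squared distance of every point to its own centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

points = [(2, 4), (8, 2), (9, 3), (1, 5), (8, 1)]
wss_by_k = {}
for k in range(1, 5):
    labels, centroids = kmeans(points, k)
    wss_by_k[k] = wss(points, labels, centroids)
    print(k, round(wss_by_k[k], 2))
# 1 67.2
# 2 3.67
# 3 1.5
# 4 0.5
```

WSS drops sharply from K = 1 to K = 2 and only slightly afterwards, so the elbow for this toy data is at K = 2.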
Plotting the clusters
## plotting the clusters
## a better plot
Profiling the clusters
Next steps after clustering
▪ Clustering provides you with clusters in the given dataset
▪ Clustering does not provide you with rules to classify future records
▪ To be able to classify future records you may do the following
– Build a Discriminant Model on the clustered data
– Build a Classification Tree Model on the clustered data

More Related Content

What's hot

M3 search
M3 searchM3 search
M3 search
Yasir Khan
 
functional groovy
functional groovyfunctional groovy
functional groovy
Paul King
 
Getting Started with Machine Learning
Getting Started with Machine LearningGetting Started with Machine Learning
Getting Started with Machine Learning
Humberto Marchezi
 
Searching
SearchingSearching
20.5 Java polymorphism
20.5 Java polymorphism 20.5 Java polymorphism
20.5 Java polymorphism
Intro C# Book
 
Chapter3 Search
Chapter3 SearchChapter3 Search
Chapter3 Search
Khiem Ho
 

What's hot (6)

M3 search
M3 searchM3 search
M3 search
 
functional groovy
functional groovyfunctional groovy
functional groovy
 
Getting Started with Machine Learning
Getting Started with Machine LearningGetting Started with Machine Learning
Getting Started with Machine Learning
 
Searching
SearchingSearching
Searching
 
20.5 Java polymorphism
20.5 Java polymorphism 20.5 Java polymorphism
20.5 Java polymorphism
 
Chapter3 Search
Chapter3 SearchChapter3 Search
Chapter3 Search
 

Similar to Clustering techniques

ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clustering
monalisa Das
 
Classification Tree - Cart
Classification Tree - CartClassification Tree - Cart
Classification Tree - Cart
Learnbay Datascience
 
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
ABINASHPADHY6
 
Chapter 10.1,2,3 pdf.pdf
Chapter 10.1,2,3 pdf.pdfChapter 10.1,2,3 pdf.pdf
Chapter 10.1,2,3 pdf.pdf
Amy Aung
 
Clusters (4).pptx
Clusters (4).pptxClusters (4).pptx
Clusters (4).pptx
brahimNasibov
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
Carlos Castillo (ChaTo)
 
Density based clustering
Density based clusteringDensity based clustering
Density based clustering
YaswanthHariKumarVud
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
thamizh arasi
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
Wake Tech BAS
 
Support Vector Machine (SVM) Artificial Intelligence
Support Vector Machine (SVM) Artificial IntelligenceSupport Vector Machine (SVM) Artificial Intelligence
Support Vector Machine (SVM) Artificial Intelligence
Hiew Moi Sim
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013
Sri Ambati
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
Roger Barga
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
ShwetapadmaBabu1
 
Kmeans
KmeansKmeans
Kmeans
Nikita Goyal
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
Edureka!
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
MapR Technologies
 
Lec13 Clustering.pptx
Lec13 Clustering.pptxLec13 Clustering.pptx
Lec13 Clustering.pptx
Khalid Rabayah
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
Te-Yen Liu
 
problem solve and resolving in ai domain , probloms
problem solve and resolving in ai domain , problomsproblem solve and resolving in ai domain , probloms
problem solve and resolving in ai domain , probloms
SlimAmiri
 
Chapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptxChapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptx
Amy Aung
 

Similar to Clustering techniques (20)

ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clustering
 
Classification Tree - Cart
Classification Tree - CartClassification Tree - Cart
Classification Tree - Cart
 
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
 
Chapter 10.1,2,3 pdf.pdf
Chapter 10.1,2,3 pdf.pdfChapter 10.1,2,3 pdf.pdf
Chapter 10.1,2,3 pdf.pdf
 
Clusters (4).pptx
Clusters (4).pptxClusters (4).pptx
Clusters (4).pptx
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Density based clustering
Density based clusteringDensity based clustering
Density based clustering
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
Support Vector Machine (SVM) Artificial Intelligence
Support Vector Machine (SVM) Artificial IntelligenceSupport Vector Machine (SVM) Artificial Intelligence
Support Vector Machine (SVM) Artificial Intelligence
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Kmeans
KmeansKmeans
Kmeans
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
 
Lec13 Clustering.pptx
Lec13 Clustering.pptxLec13 Clustering.pptx
Lec13 Clustering.pptx
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
problem solve and resolving in ai domain , probloms
problem solve and resolving in ai domain , problomsproblem solve and resolving in ai domain , probloms
problem solve and resolving in ai domain , probloms
 
Chapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptxChapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptx
 

More from Learnbay Datascience

Top data science projects
Top data science projectsTop data science projects
Top data science projects
Learnbay Datascience
 
Python my SQL - create table
Python my SQL - create tablePython my SQL - create table
Python my SQL - create table
Learnbay Datascience
 
Python my SQL - create database
Python my SQL - create databasePython my SQL - create database
Python my SQL - create database
Learnbay Datascience
 
Python my sql database connection
Python my sql   database connectionPython my sql   database connection
Python my sql database connection
Learnbay Datascience
 
Python - mySOL
Python - mySOLPython - mySOL
Python - mySOL
Learnbay Datascience
 
AI - Issues and Terminology
AI - Issues and TerminologyAI - Issues and Terminology
AI - Issues and Terminology
Learnbay Datascience
 
AI - Fuzzy Logic Systems
AI - Fuzzy Logic SystemsAI - Fuzzy Logic Systems
AI - Fuzzy Logic Systems
Learnbay Datascience
 
AI - working of an ns
AI - working of an nsAI - working of an ns
AI - working of an ns
Learnbay Datascience
 
Artificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksArtificial Intelligence- Neural Networks
Artificial Intelligence- Neural Networks
Learnbay Datascience
 
AI - Robotics
AI - RoboticsAI - Robotics
AI - Robotics
Learnbay Datascience
 
Applications of expert system
Applications of expert systemApplications of expert system
Applications of expert system
Learnbay Datascience
 
Components of expert systems
Components of expert systemsComponents of expert systems
Components of expert systems
Learnbay Datascience
 
Artificial intelligence - expert systems
 Artificial intelligence - expert systems Artificial intelligence - expert systems
Artificial intelligence - expert systems
Learnbay Datascience
 
AI - natural language processing
AI - natural language processingAI - natural language processing
AI - natural language processing
Learnbay Datascience
 
Ai popular search algorithms
Ai   popular search algorithmsAi   popular search algorithms
Ai popular search algorithms
Learnbay Datascience
 
AI - Agents & Environments
AI - Agents & EnvironmentsAI - Agents & Environments
AI - Agents & Environments
Learnbay Datascience
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areas
Learnbay Datascience
 
Artificial intelligence composed
Artificial intelligence composedArtificial intelligence composed
Artificial intelligence composed
Learnbay Datascience
 
Artificial intelligence intelligent systems
Artificial intelligence   intelligent systemsArtificial intelligence   intelligent systems
Artificial intelligence intelligent systems
Learnbay Datascience
 
Applications of ai
Applications of aiApplications of ai
Applications of ai
Learnbay Datascience
 

More from Learnbay Datascience (20)

Top data science projects
Top data science projectsTop data science projects
Top data science projects
 
Python my SQL - create table
Python my SQL - create tablePython my SQL - create table
Python my SQL - create table
 
Python my SQL - create database
Python my SQL - create databasePython my SQL - create database
Python my SQL - create database
 
Python my sql database connection
Python my sql   database connectionPython my sql   database connection
Python my sql database connection
 
Python - mySOL
Python - mySOLPython - mySOL
Python - mySOL
 
AI - Issues and Terminology
AI - Issues and TerminologyAI - Issues and Terminology
AI - Issues and Terminology
 
AI - Fuzzy Logic Systems
AI - Fuzzy Logic SystemsAI - Fuzzy Logic Systems
AI - Fuzzy Logic Systems
 
AI - working of an ns
AI - working of an nsAI - working of an ns
AI - working of an ns
 
Artificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksArtificial Intelligence- Neural Networks
Artificial Intelligence- Neural Networks
 
AI - Robotics
AI - RoboticsAI - Robotics
AI - Robotics
 
Applications of expert system
Applications of expert systemApplications of expert system
Applications of expert system
 
Components of expert systems
Components of expert systemsComponents of expert systems
Components of expert systems
 
Artificial intelligence - expert systems
 Artificial intelligence - expert systems Artificial intelligence - expert systems
Artificial intelligence - expert systems
 
AI - natural language processing
AI - natural language processingAI - natural language processing
AI - natural language processing
 
Ai popular search algorithms
Ai   popular search algorithmsAi   popular search algorithms
Ai popular search algorithms
 
AI - Agents & Environments
AI - Agents & EnvironmentsAI - Agents & Environments
AI - Agents & Environments
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areas
 
Artificial intelligence composed
Artificial intelligence composedArtificial intelligence composed
Artificial intelligence composed
 
Artificial intelligence intelligent systems
Artificial intelligence   intelligent systemsArtificial intelligence   intelligent systems
Artificial intelligence intelligent systems
 
Applications of ai
Applications of aiApplications of ai
Applications of ai
 

Recently uploaded

Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
simonomuemu
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 

Recently uploaded (20)

Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...

Clustering techniques

  • 2. Agenda
      ▪ Python Setup
      ▪ Clustering Overview
      ▪ Hierarchical Clustering
      ▪ Non-Hierarchical Clustering
      Visit: Learnbay.co
  • 3. Python Setup for Clustering
      ▪ Python 3.2.3 or a higher version should be installed
      ▪ The following libraries should be installed. Check by running the command below
        ## It is okay if you get a warning message, but you should not get an error message
          – sklearn
          – seaborn
          – matplotlib
  • 5. Intro: Data – Information – Knowledge – Wisdom (DIKW)
      Where is the Life… we have lost in Living?
      Where is the wisdom… we have lost in Knowledge?
      Where is the Knowledge… we have lost in Information?
      - T.S. Eliot
      ▪ Where is the Information… we have lost in Data?
      ▪ In order to go from Data to Information to Knowledge and to Wisdom, we need to simplify data
  • 6. How do we simplify data
      ▪ Simplifying data means eliminating data complexity
          – too many variables to a few variables
          – too many records to a few records
      ▪ Dimension Reduction
          – Reduce the number of variables by using techniques like Principal Component Analysis, Collinearity Check, Business Rule Based, etc.
      ▪ Clustering / Segmentation (our focus area in this presentation)
          – Case reduction
          – Reduce the number of records by identifying similar groups and representing them as a cluster
  • 7. Clustering
      ▪ Clustering is a technique for finding similar groups in data, called clusters.
      ▪ It groups data instances that are similar to (near) each other into one cluster and data instances that are very different (far away) from each other into different clusters.
      ▪ A cluster is therefore a collection of objects which are “similar” to each other and “dissimilar” to the objects belonging to other clusters
      ▪ How do we define “similar” in clustering?
      Note: Clustering is an unsupervised learning technique
  • 9. Simple Clustering e.g.
      ▪ From the scatter plot we can see that
          – A & D form one segment of customers who are very high on Brand Loyalty
          – B, C & E form another segment which is very Price Conscious

      Individuals   PriceConscious   BrandLoyalty
      A             2                4
      B             8                2
      C             9                3
      D             1                5
      E             8                1

      [Scatter plot: BrandLoyalty (vertical axis) against PriceConscious (horizontal axis) for individuals A to E]
      Note: The above e.g. is just hypothetical data to introduce the subject of clustering. It is not related to the below url: http://globalbizresearch.org/chennai_conference/pdf/pdf/ID_C405_Formatted.pdf
  • 10. Steps involved in Clustering Analysis
      ▪ Formulate the problem – select variables to be used for clustering
      ▪ Decide the clustering procedure (Hierarchical / Non-Hierarchical)
      ▪ Select the measure of similarity (dissimilarity)
      ▪ Decide on the number of clusters
      ▪ Interpret the cluster output (profile the clusters)
      ▪ Validate the clusters
      ▪ Choose clustering algorithm (applicable in hierarchical clustering)
  • 11. Formulate the clustering problem
      ▪ Formulate the problem
          – understand the business problem
          – hypothesize variables that will help solve the clustering problem at hand
      ▪ Applications of Clustering Technique
          – Store Clustering
          – Customer Clustering
          – Village Affluence Categorization
      Note: It is preferable to do Factor Analysis / Principal Component Analysis before clustering
  • 12. Decide the clustering procedure
      ▪ Types of Clustering Procedures
          – Hierarchical Clustering
          – Non-Hierarchical Clustering
      ▪ Hierarchical clustering is characterized by a tree-like structure and uses distance as a measure of (dis)similarity
      ▪ Non-hierarchical clustering techniques use partitioning methods and within-cluster variance as a measure to form homogeneous groups
  • 13. Hierarchical Vs. Non-Hierarchical clustering
      Hierarchical Clustering
      ▪ Relatively slow
      ▪ Agglomerative clustering is the most used algorithm
      ▪ Uses distance as a measure of (dis)similarity
      ▪ Helps suggest the optimal number of clusters in the data
      ▪ An object assigned to a cluster remains in that cluster
      Non-Hierarchical Clustering
      ▪ Fast and preferable to use with large datasets
      ▪ K-means is a very popular non-hierarchical clustering technique
      ▪ Uses within-cluster variance as a measure of similarity
      ▪ Requires the number of clusters as an input parameter for starting
      ▪ Objects can be reassigned to other clusters during the clustering process
  • 15. Hierarchical Clustering - (dis)similarity measure
      ▪ Hierarchical Clustering is based on a (dis)similarity measure
      ▪ Most software packages calculate a measure of dissimilarity by estimating the distance between pairs of objects
      ▪ Objects with a shorter distance are considered similar, whereas objects with a larger distance are dissimilar
      ▪ Distance measures
          – Euclidean distance (most commonly used)
          – City Block or Manhattan distance
          – Chebyshev distance
  • 16. Distance Computation
      ▪ [Figure 1: points A and B] What is the distance between points A and B? Ans: 7
      ▪ [Figure 2: points A and B] What is the distance between points A and B? Ans: $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ (remember the Pythagorean theorem)
  • 17. Distance Computation Contd…
      ▪ What is the distance between points A and B in n-dimensional space?
      ▪ If A $(a_1, a_2, \dots, a_n)$ and B $(b_1, b_2, \dots, b_n)$ are Cartesian coordinates
      ▪ By using Euclidean distance (which is an extension of the Pythagorean theorem), we get the distance AB as
      ▪ $D_{AB} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$
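The n-dimensional formula can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the slides: the function name `euclidean` is an assumption, and the points are the five hypothetical individuals from slide 9 (PriceConscious, BrandLoyalty).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Slide 9 data: (PriceConscious, BrandLoyalty) for each individual
points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8, 1)}

# Print every pairwise distance once
names = sorted(points)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(f"{p}-{q}: {euclidean(points[p], points[q]):.2f}")
```

The small distances (A-D, B-C, B-E, C-E) correspond to the two segments described on slide 9, while distances across the segments are several times larger.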
  • 18. Chebyshev Distance
      ▪ In mathematics, Chebyshev distance is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension
      ▪ Assume two vectors: A $(a_1, a_2, \dots, a_n)$ & B $(b_1, b_2, \dots, b_n)$
      ▪ Chebyshev Distance $= \max(|a_1 - b_1|, |a_2 - b_2|, \dots, |a_n - b_n|)$
      ▪ Application: Survey / Research data where the responses are ordinal
      https://en.wikipedia.org/wiki/Chebyshev_distance
  • 19. Manhattan Distance
      ▪ Manhattan Distance is also called City Block Distance
      ▪ Assume two vectors: A $(a_1, a_2, \dots, a_n)$ & B $(b_1, b_2, \dots, b_n)$
      ▪ Manhattan Distance $= |a_1 - b_1| + |a_2 - b_2| + \dots + |a_n - b_n|$
      [Figure: grid of city blocks between A and B, 8 blocks along one axis and 4 along the other]
      ▪ Manhattan Distance = 8 + 4 = 12
      ▪ Chebyshev Distance = Max (8, 4) = 8
      ▪ Euclidean Distance = sqrt (8^2 + 4^2) = 8.94
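The three distances for the city-block figure can be checked with a short sketch. The coordinates are an assumption: A is placed at the origin and B at an offset of 8 blocks on one axis and 4 on the other, which is all the figure specifies.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (city block distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

A = (0, 0)  # assumed origin
B = (8, 4)  # 8 blocks along one axis, 4 along the other

print(manhattan(A, B))            # 12
print(chebyshev(A, B))            # 8
print(round(euclidean(A, B), 2))  # 8.94
```

For any pair of points the three measures are ordered Chebyshev ≤ Euclidean ≤ Manhattan, which is why the choice of measure changes how "far apart" the same two records appear.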
  • 20. Hierarchical Clustering | Agglomerative Clustering
      ▪ Starts with each record as a cluster of one record
      ▪ Sequentially merges the 2 closest clusters, using distance as a measure of (dis)similarity, to form a new cluster. This reduces the number of clusters by 1
      ▪ Repeats the above step with the new cluster and all remaining clusters until we have one big cluster
      [Figure: Step 0 to Step 4, starting from the single records a, b, c, d, e and merging them into (a,b), (d,e), (c,d,e) and finally (a,b,c,d,e)]
      ▪ How do you measure the distance between cluster (a,b) and (c), or between cluster (a,b) and (d,e)?
  • 21. Agglomerative Clustering Procedures
      ▪ Single linkage – minimum distance or nearest neighbour rule
      ▪ Complete linkage – maximum distance or farthest neighbour rule
      ▪ Average linkage – average of the distances between all pairs
      ▪ Centroid method – combine the clusters with the minimum distance between their centroids
      ▪ Ward’s method – combine the clusters for which the increase in within-cluster variance is smallest
      http://sites.stat.psu.edu/~ajw13/stat505/fa06/19_cluster/09_cluster_wards.html
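Each linkage rule is a different way of turning point-to-point distances into a cluster-to-cluster distance. The sketch below is illustrative only: the function names are assumptions, the two clusters are the slide 9 segments, and the Ward criterion is written with the standard identity for the increase in within-cluster sum of squares, n1·n2/(n1+n2) times the squared distance between centroids.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def single_linkage(c1, c2):
    """Distance between the two nearest members."""
    return min(euclidean(p, q) for p, q in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the two farthest members."""
    return max(euclidean(p, q) for p, q in product(c1, c2))

def average_linkage(c1, c2):
    """Mean of all pairwise distances between the clusters."""
    return sum(euclidean(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))

def centroid_linkage(c1, c2):
    """Distance between the cluster centroids."""
    return euclidean(centroid(c1), centroid(c2))

def ward_increase(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * centroid_linkage(c1, c2) ** 2

# Slide 9 segments: (A, D) and (B, C, E)
brand_loyal = [(2, 4), (1, 5)]
price_conscious = [(8, 2), (9, 3), (8, 1)]

for rule in (single_linkage, complete_linkage, average_linkage,
             centroid_linkage, ward_increase):
    print(f"{rule.__name__}: {rule(brand_loyal, price_conscious):.2f}")
```

Single linkage is never larger than average linkage, which is never larger than complete linkage, so the same data can merge in a different order depending on the rule chosen.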
  • 22. Clustering e.g. 1: Clustering for Retail Customers
      ## Let us find the clusters in the given Retail Customer Spends data
      ## We will use the Hierarchical Clustering technique
      ## Let us first set the working directory path and import the data
      import pandas as pd
      RCDF = pd.read_csv("datafiles/Cust_Spend_Data.csv")  # the first row is read as the header by default
      RCDF.head()  # preview the first rows
      HyperMarket Customer Spend Data - Metadata
      ▪ AVG_Mthly_Spend: The average monthly amount spent by the customer
      ▪ No_of_Visits: The number of times a customer visited the HyperMarket
      ▪ Count of Apparel, Fruits and Vegetable, Staple Items purchased in a month
  • 23. Building the hierarchical clusters (without variable scaling)
      ▪ A dendrogram looks like a decision tree, but it is not a decision tree or a classification model
      ▪ Note: The two clusters formed are primarily based on AVG_MTHLY_SPEND
      ▪ The Euclidean distance computation in this case is dominated by the AVG_MTHLY_SPEND variable, as the range of this variable is much larger than that of the other variables
      ▪ To avoid this problem, we should scale the variables used for clustering
  • 24. Building the hierarchical clusters (with variable scaling)
      ## the scale function standardizes the values
      ▪ Scaling of features can change the shapes and sizes of the clusters. It can even lead to a different number of clusters.
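The effect of scaling can be seen on a toy example. Everything in this sketch is an assumption made for illustration: the five spend/visit values are invented (they are not the Cust_Spend_Data.csv records), and `standardize` is a hand-written z-score using the sample standard deviation; a library scaler may use the population version, which changes the numbers slightly but not the idea.

```python
import math
import statistics

def standardize(values):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical customers: monthly spend (large range) and visits (small range)
spend = [10000, 7000, 7000, 6500, 6000]
visits = [2, 3, 7, 5, 6]

raw = list(zip(spend, visits))
scaled = list(zip(standardize(spend), standardize(visits)))

# Compare customer 1 with customers 2 and 3, before and after scaling
for label, data in (("raw", raw), ("scaled", scaled)):
    d12 = euclidean(data[1], data[2])
    d13 = euclidean(data[1], data[3])
    print(f"{label}: d(1,2) = {d12:.3f}, d(1,3) = {d13:.3f}")
```

On the raw values customer 1 looks far closer to customer 2 than to customer 3, purely because the spend difference swamps the visit difference. After standardizing, both variables contribute on a comparable footing and the ordering reverses.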
  • 25. Understanding the Height Calculation in Clustering
      ## Let us see the distance matrix

      Dist.    A      B      C      D      E      F      G      H      I
      B       4.25
      C       3.41   3.84
      D       2.51   3.47   1.26
      E       4.27   2.70   2.92   3.20
      F       3.98   2.21   3.58   2.85   3.43
      G       4.38   3.02   3.38   3.35   1.41   3.17
      H       3.40   3.60   3.66   2.93   3.24   2.35   2.46
      I       3.53   3.39   4.05   3.21   3.48   2.18   2.61   0.73
      J       4.55   2.97   3.59   3.04   3.41   1.24   2.80   2.12   2.06
  • 26. Height calculation worked from the distance matrix on slide 25
      ▪ Closest pairs, merged first
          – C, D: 1.26
          – H, I: 0.73
          – F, J: 1.24
          – E, G: 1.41
      ▪ Next merges: A with (C,D); (H,I) with (F,J); B with (E,G)
          – A, (C,D): A–C 3.41, A–D 2.51, average 2.96
          – (H,I), (F,J): H–F 2.35, H–J 2.12, I–F 2.18, I–J 2.06, average 2.17
          – B, (E,G): B–E 2.70, B–G 3.02, average 2.86
      ▪ (H,I,F,J) with (B,E,G): average 3.06

               B      E      G
        H     3.60   3.24   2.46
        I     3.39   3.48   2.61
        F     2.21   3.43   3.17
        J     2.97   3.41   2.80

      ▪ (A,C,D) with (H,I,F,J,B,E,G): average 3.59

               H      I      F      J      B      E      G
        A     3.40   3.53   3.98   4.55   4.25   4.27   4.38
        C     3.66   4.05   3.58   3.59   3.84   2.92   3.38
        D     2.93   3.21   2.85   3.04   3.47   3.20   3.35
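The merge heights can be reproduced from the slide 25 distance matrix with a small average-linkage loop. This is a sketch under stated assumptions: the slides do not name the linkage, but the averages shown match average linkage; the matrix values are the rounded ones printed on the slide, so the computed heights agree with the slide only to within about 0.01. The variable and function names are illustrative.

```python
from itertools import combinations

labels = "ABCDEFGHIJ"

# Lower triangle of the slide 25 distance matrix, one row per record
lower = {
    "B": [4.25],
    "C": [3.41, 3.84],
    "D": [2.51, 3.47, 1.26],
    "E": [4.27, 2.70, 2.92, 3.20],
    "F": [3.98, 2.21, 3.58, 2.85, 3.43],
    "G": [4.38, 3.02, 3.38, 3.35, 1.41, 3.17],
    "H": [3.40, 3.60, 3.66, 2.93, 3.24, 2.35, 2.46],
    "I": [3.53, 3.39, 4.05, 3.21, 3.48, 2.18, 2.61, 0.73],
    "J": [4.55, 2.97, 3.59, 3.04, 3.41, 1.24, 2.80, 2.12, 2.06],
}

# Symmetric lookup: dist[{p, q}] is the distance between records p and q
dist = {}
for row, values in lower.items():
    for col, d in zip(labels, values):
        dist[frozenset((row, col))] = d

def average_linkage(c1, c2):
    """Mean distance over all pairs with one record in each cluster."""
    total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

# Start with one cluster per record and merge the closest pair each round
clusters = [(name,) for name in labels]
heights = []
while len(clusters) > 1:
    c1, c2 = min(combinations(clusters, 2),
                 key=lambda pair: average_linkage(*pair))
    height = average_linkage(c1, c2)
    heights.append(height)
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    print(f"merge {''.join(c1)} + {''.join(c2)} at height {height:.4f}")
```

The nine heights printed by the loop are the heights at which the dendrogram branches join, ending with the merge of (A,C,D) and the remaining seven records.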
  • 27. Profiling the clusters
      ## profiling the clusters
      ▪ Profiling the clusters means aggregating the patterns identified within each cluster, that is, summarizing the characteristics of each cluster
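A profile is typically a table of per-cluster sizes and variable means. The sketch below is an assumption about what such a profile might look like, not the code from the slide: it uses the slide 9 individuals, and the cluster labels 1 and 2 are assigned by hand to match the two segments described there.

```python
from collections import defaultdict
from statistics import mean

# Slide 9 individuals with hand-assigned cluster labels (an assumption)
records = [
    {"id": "A", "PriceConscious": 2, "BrandLoyalty": 4, "cluster": 1},
    {"id": "B", "PriceConscious": 8, "BrandLoyalty": 2, "cluster": 2},
    {"id": "C", "PriceConscious": 9, "BrandLoyalty": 3, "cluster": 2},
    {"id": "D", "PriceConscious": 1, "BrandLoyalty": 5, "cluster": 1},
    {"id": "E", "PriceConscious": 8, "BrandLoyalty": 1, "cluster": 2},
]

def profile(rows, variables):
    """Return the size and the mean of each variable for every cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    summary = {}
    for label, members in sorted(groups.items()):
        summary[label] = {"size": len(members)}
        for var in variables:
            summary[label][var] = mean(m[var] for m in members)
    return summary

for label, stats in profile(records, ["PriceConscious", "BrandLoyalty"]).items():
    print(label, stats)
```

Reading the means side by side is what gives each cluster a business description, here a brand-loyal segment and a price-conscious segment.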
  • 29. K Means Clustering
      ▪ K-Means is the most used non-hierarchical clustering technique
      ▪ It is not based on pairwise distances between records…
      ▪ It is based on within-cluster variation, in other words the squared distance from the centre of the cluster
      ▪ The algorithm aims at segmenting data such that within-cluster variation is reduced
  • 30. K Means Algorithm
      ▪ Input required: number of clusters to be formed (say K)
      ▪ Steps
          1. Assume K centroids (for K clusters)
          2. Compute the Euclidean distance of each object from these centroids
          3. Assign each object to the cluster with the shortest distance
          4. Compute the new centroid (mean) of each cluster based on the objects assigned to it. The K means obtained become the new centroids of the clusters
          5. Repeat steps 2 to 4 until there is convergence
              - i.e. there is no movement of objects from one cluster to another
              - or a threshold number of iterations has occurred
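The five steps can be written out directly. This is a minimal sketch rather than a production implementation: the function name `kmeans` and the `max_iter` threshold are assumptions, the data are the slide 9 individuals, and the initial centroids are simply taken to be points A and B (real implementations usually pick starting centroids more carefully).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)          # step 1: assume K centroids
    assignment = None
    for _ in range(max_iter):                    # step 5: iteration threshold
        # Steps 2 and 3: assign each point to its nearest centroid
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:         # step 5: no point moved
            break
        assignment = new_assignment
        # Step 4: recompute each centroid as the mean of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                          # keep the old centroid if empty
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignment, centroids

# Slide 9 data in the order A, B, C, D, E
points = [(2, 4), (8, 2), (9, 3), (1, 5), (8, 1)]
assignment, centroids = kmeans(points, initial_centroids=[points[0], points[1]])
print(assignment)   # [0, 1, 1, 0, 1]
print(centroids)
```

With these starting centroids the loop converges after the second pass, placing A and D in one cluster and B, C and E in the other, which matches the segments read off the scatter plot on slide 9.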
  • 31. K-means advantages
      ▪ K-means is a superior technique compared to hierarchical techniques, as it is less impacted by outliers
      ▪ Computationally it is faster than hierarchical clustering
      ▪ Preferable to use on interval or ratio-scaled data as it uses Euclidean distance… desirable to avoid using it on ordinal data
      ▪ Challenge – the number of clusters has to be pre-defined and provided as input to the process
  • 32. Why find optimal No. of Clusters?
      [Figure: data to be clustered, plotted on dimensions D1 and D2, with alternative cluster assignments C1, C2 and C3]
      ▪ Two clusters – 2 possible solutions
      ▪ Three clusters – multiple possible solutions
  • 33. Optimal No. of Clusters
      ▪ The WSS plot (within-cluster sum of squared errors plot) is used to identify the optimal number of clusters
      ▪ The WSS plot is also called a Scree Plot or Elbow Curve within the analytics industry
      ▪ The elbow in the graph indicates the optimal number of clusters
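The quantity plotted on the elbow curve can be computed by hand for a small case. In this sketch the partitions for K = 1, 2 and 3 are hand-picked for the slide 9 data (an assumption; in practice they would come from running K-means for each K), and `wss` is an illustrative name.

```python
def wss(clusters):
    """Within-cluster sum of squares: squared distance of every point
    from the centroid of its own cluster, summed over all clusters."""
    total = 0.0
    for members in clusters:
        centroid = [sum(xs) / len(members) for xs in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return total

# Slide 9 data: (PriceConscious, BrandLoyalty)
A, B, C, D, E = (2, 4), (8, 2), (9, 3), (1, 5), (8, 1)

# Hand-picked partitions for each K (assumed, not produced by K-means here)
partitions = {
    1: [[A, B, C, D, E]],
    2: [[A, D], [B, C, E]],
    3: [[A, D], [B, E], [C]],
}

for k, clusters in partitions.items():
    print(f"K = {k}: WSS = {wss(clusters):.2f}")
```

WSS falls from about 67 at K = 1 to under 4 at K = 2 and then improves only marginally at K = 3, so the elbow for this toy data sits at two clusters.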
  • 34. Plotting the clusters
      ## plotting the clusters
      ## a better plot
  • 36. Next steps after clustering
      ▪ Clustering provides you with clusters in the given dataset
      ▪ Clustering does not provide you with rules to classify future records
      ▪ To be able to classify future records you may do the following
          – Build a Discriminant Model on the clustered data
          – Build a Classification Tree Model on the clustered data
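To show what "classifying a future record" means, the sketch below uses a nearest-centroid rule. This is a deliberately simpler stand-in, not the discriminant or classification tree model the slide recommends: the segment names, the centroids (taken from the slide 9 segments) and the new customers are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Assumed cluster centroids as (PriceConscious, BrandLoyalty),
# i.e. the means of (A, D) and (B, C, E) from slide 9
centroids = {
    "Brand Loyal": (1.5, 4.5),
    "Price Conscious": (25 / 3, 2.0),
}

def classify(record, centroids):
    """Assign a new record to the cluster with the nearest centroid."""
    return min(centroids, key=lambda label: euclidean(record, centroids[label]))

# Hypothetical future customers
print(classify((7, 2), centroids))   # Price Conscious
print(classify((3, 5), centroids))   # Brand Loyal
```

A discriminant model or classification tree trained on the cluster labels plays the same role, but learns explicit rules on the original variables instead of relying on distance to a centroid.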
  • 37. References
      ▪ Chapter 9: Cluster Analysis (http://www.springer.com)
          – Google search: “www.springer.com cluster analysis chapter 9”