Cluster analysis in R by Aman Chauhan


This presentation covers the basics of clustering and its implementation in R.


What is Clustering?
• Clustering is a data segmentation technique that partitions the data into several
groups based on their similarity.
• The grouping is done through a statistical procedure. The smaller groups that are
formed from the larger dataset are known as clusters.

What is Cluster Analysis?
• A method of summarizing data, similar to factor analysis: cases are grouped into
“clusters” with other similar cases.
• It is a way of grouping data into meaningful groups, or clusters.

Types of Cluster Analysis
• There are two types of cluster analysis, R type and Q type:
• R type: to what extent do the variables covary across the cases?
• Q type: to what extent do the cases covary across the variables?
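
As a rough illustration of the two views, the sketch below contrasts them on the built-in mtcars data. The dataset, the three columns, and the use of correlation and Euclidean distance are assumptions made for this example, not something the slides specify.

```r
# Sketch only: mtcars and the chosen columns are assumptions for illustration.
x <- scale(mtcars[, c("mpg", "cyl", "wt")])

# R type: how do the variables covary across the cases? (3 x 3 matrix)
var_cor <- cor(x)

# Q type: how similar are the cases across the variables? (32 x 32 matrix)
case_dist <- as.matrix(dist(x))

dim(var_cor)
dim(case_dist)
```

The k-means examples in this deck take the Q-type view: they group cases (cars), not variables.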

Where is it used?
• It is used when the input data is very large and we are tasked with finding
similar subsets that can then be analysed in several ways.
• For example, a marketing company can categorise its customers by economic
background, age and several other factors in order to sell its products more
effectively.

K-Means Clustering in R
• K-means is one of the most popular partitioning algorithms for clustering, and R
provides it through the kmeans() function. It is an unsupervised learning algorithm
that tries to group observations based on their similarity. The number of clusters,
k, is specified in advance, and the data are partitioned into exactly that many
clusters. The algorithm assigns each observation to a cluster and also finds the
centroid of each cluster.
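
The assign-and-update idea described above can be sketched in a few lines of R. This is a minimal illustration of the classic Lloyd iteration, assuming squared Euclidean distance and random initial centroids; `simple_kmeans` is a hypothetical helper written for this example, and the built-in kmeans() uses the Hartigan-Wong algorithm by default, so its results can differ.

```r
# Minimal sketch of the k-means loop (Lloyd's algorithm).
# simple_kmeans is a hypothetical helper, not part of base R.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k randomly chosen observations as initial centroids
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared distance of every row to every centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # nearest centroid
    if (all(new_cluster == cluster)) break  # assignments stable: stop
    cluster <- new_cluster
    # Update step: move each centroid to the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(1)  # assumed seed, only so the random start is repeatable
res <- simple_kmeans(scale(mtcars[, c("mpg", "cyl", "wt")]), k = 3)
table(res$cluster)
```

In practice kmeans() is the better choice; the sketch is only meant to show what "assign each observation, then recompute the centroids" looks like in code.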

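Code for K-means Clustering in R
  mydata <- mtcars[, c('mpg', 'cyl', 'wt')]
  clusters <- kmeans(mydata, 3)             # 3 clusters; result depends on the random start
  old_par <- par(mar = c(5.1, 4.1, 0, 1))   # par() returns the previous settings
  plot(mydata, col = clusters$cluster)      # colour each point by its cluster
  par(old_par)                              # restore the previous margins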
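
Once the model is fitted, the returned object can be inspected directly. The sketch below repeats the fit and looks at a few of its components; the seed and the `nstart = 25` argument are additions for reproducibility and stability, not part of the original slide.

```r
mydata <- mtcars[, c("mpg", "cyl", "wt")]

set.seed(42)  # assumed seed, only for reproducibility
clusters <- kmeans(mydata, centers = 3, nstart = 25)  # best of 25 random starts

clusters$size          # number of observations in each cluster
clusters$centers       # one centroid row per cluster
clusters$tot.withinss  # total within-cluster sum of squares
```

Because these three variables are on different scales, mpg tends to dominate the distances here; standardizing with scale(), as done later in the deck, avoids that.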

Points to keep in mind
• k-means clustering is a flat clustering technique: it produces a single
partition with k clusters
• it requires the user to choose the number of clusters in advance
• k-means clustering is usually much faster than hierarchical clustering,
especially on large datasets
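
To make the flat versus hierarchical distinction concrete, the sketch below fits both on standardized mtcars data. The dataset, the Ward linkage, the seed and `nstart` are assumptions chosen for illustration.

```r
x <- scale(mtcars)

# Hierarchical: build one tree, then cut it at any number of clusters
tree <- hclust(dist(x), method = "ward.D2")
groups_3 <- cutree(tree, k = 3)
groups_5 <- cutree(tree, k = 5)

# Flat: k is fixed before fitting, and a different k needs a new fit
set.seed(42)  # assumed seed, only for reproducibility
km_3 <- kmeans(x, centers = 3, nstart = 25)

# Cross-tabulate the two three-cluster solutions
table(hierarchical = groups_3, kmeans = km_3$cluster)
```

The hierarchical fit needs the full pairwise distance matrix, which grows with the square of the number of observations; that is one reason k-means tends to scale better.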

  # Using the mtcars dataset
  data(mtcars)
  mydata <- na.omit(mtcars)   # drop rows with missing values
  mydata <- scale(mydata)     # standardize variables

  # Determine number of clusters
  # wss[1] is the total sum of squares, i.e. the one-cluster solution
  wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
  for (i in 2:15)
    wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
  plot(1:15, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
  # check out the plot and look for the bend ("elbow") in the curve

  # K-Means Cluster Analysis
  fit <- kmeans(mydata, 5)    # 5 cluster solution
  # get cluster means
  aggregate(mydata, by = list(fit$cluster), FUN = mean)
  # append cluster assignment
  mydata <- data.frame(mydata, fit$cluster)

  # visualize the clustering results
  library(cluster)
  # plot only the original variables, not the appended cluster label
  clusplot(mydata[, names(mtcars)], fit$cluster, color = TRUE, shade = TRUE,
           labels = 2, lines = 0)
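
The elbow plot above is read by eye. As a complementary check, one option is the average silhouette width from the same cluster package; this is a swapped-in technique not used in the original slides, and the range of k, the seed and `nstart` are assumptions for illustration.

```r
library(cluster)  # silhouette() comes from the same package as clusplot()

x <- scale(mtcars)
d <- dist(x)

set.seed(42)  # assumed seed, only for reproducibility
avg_sil <- sapply(2:8, function(k) {
  fit_k <- kmeans(x, centers = k, nstart = 25)
  # mean silhouette width over all observations for this k
  mean(silhouette(fit_k$cluster, d)[, "sil_width"])
})
names(avg_sil) <- 2:8

avg_sil
best_k <- as.integer(names(avg_sil)[which.max(avg_sil)])
```

A higher average silhouette width suggests better separated clusters, so `best_k` is a candidate to compare against the elbow, not a definitive answer.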

