SlideShare a Scribd company logo
1 of 20
Download to read offline
Lesson 20
Clustering Kush Kulshrestha
Supervised vs Unsupervised
Supervised Learning
• For every x there is a y
• Goal is to predict y using x
• Most of the industrial problems are solved using
supervised learning.
Unsupervised Learning
• For every x there is no y.
• Goal is not prediction, but to investigate the x
• Unsupervised learning has less use cases
comparatively. It is used as one of the pre-processing
step before supervised learning.
Clustering
• Clustering splits data in order to find out how observations are similar on a number of different features.
• We are not predicting a true Y.
• The clusters are the model. We decide the number of clusters, represented as K.
Clustering
Imagine that you are the owner of an online clothing store. You want to segment your customers by their buying
habits.
• As the owner of the store, you think that some customers are bargain-hunters while others are price-insensitive;
some come in every week while others drop in during the holidays.
• But you don’t actually know definitively which ones are which.
Clustering
You could take a supervised approach to this problem, and try to predict how much a customer will spend, or how
frequently a customer will drop in.
But here, we’re just trying to see what the data will tell us. This is what makes clustering an unsupervised problem.
Clustering
Since we have an online website for our store, we have valuable data about how our consumers behave online.
We select features that will determine how clusters are formed in our algorithm.
Since we want to segment customers by their buying habits, we probably want to form clusters using the features
“number of visits” and “average amount spent.” Let’s start with these.
Clustering
By applying clustering on “number of visits” and “average amount spent”, we can demonstrate clustering using
two features in two dimensional space!
Clustering
And by tagging each of the cluster with appropriate name/context in business sense, we can focus on each group
differently.
K-Means Clustering - Algorithm
Step – I
We start with a simple scatter plot of number of visits against average amount spent.
Decide the value of K – total number of clusters in your data.
K-Means Clustering
Step – II
Randomly assign each customer to a cluster.
Data points have randomly been assigned into pink, yellow, and green groupings.
K-Means Clustering
In our example we specified 3 clusters, but we can tell the model whatever “K” we want.
A very large number of clusters will cause overfitting (for example, if you set n equal to the number of observations
you will have a cluster for every observation!) This is not very useful for making sense of subsets of our data.
K is an example of hyper parameter, the value which is input to the algorithm and is decided by you.
K-Means Clustering
Step – III
We calculate a centroid for each cluster. Our goal is to minimize the distance from centroid to any observation.
The centroids are calculated as the center of their randomly assigned group.
Here, centroids are really close together, and distance to the outer points is very large -- we can definitely do better!
K-Means Clustering
Step – IV
Now we re-assign each observation to the cluster group of the nearest centroid.
Recalculate centroid, with time centroids will go further apart.
With time, fewer observations will change clusters.
We keep repeating the centroid calculation / cluster re-assignments until nothing moves anymore - the model is
complete!
K-Means Clustering
Selecting Optimal value of K
An Elbow Plot visualizes how the error decreases as K increases.
If K == n (number of observations), then distance = 0 (each observation its own cluster)! This, as we know, is a case of
overfitting.
In the graph to the right, we should choose K=6. This appears to be the number of clusters where the biggest gains in
reducing error have already been made.
Pros and Cons
Advantages:
• Does not assume any underlying distribution (e.g. no normal distribution assumption like in linear
regression)
• Easy to represent physically.
• Produces intuitive groupings.
• Can work in multiple dimensions.
Disadvantages:
• Time-consuming to find the optimal number of clusters.
• Time-consuming feature engineering (features must be numeric and normalized)
Hierarchical Clustering
Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into
groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the
objects within each cluster are broadly similar to each other.
Hierarchical clustering starts by treating each observation as a separate cluster. Then, it repeatedly executes the
following two steps: (1) identify the two clusters that are closest together, and (2) merge the two most similar
clusters. This continues until all the clusters are merged together.
Hierarchical Clustering
Hierarchical Clustering
Continued..
Hierarchical Clustering
The main output of Hierarchical Clustering is a dendrogram, which shows the hierarchical relationship between the
clusters:
Measure of Distance
In the example above, the distance between two clusters has been computed based on length of the straight line
drawn from one cluster to another. This is commonly referred to as the Euclidean distance. Many other distance
metrics have been developed.
The choice of distance metric should be made based on theoretical concerns from the domain of study. That is, a
distance metric needs to define similarity in a way that is sensible for the field of study. For example, if clustering
crime sites in a city, city block distance may be appropriate (or, better yet, the time taken to travel between each
location). Where there is no theoretical justification for an alternative, the Euclidean should generally be preferred,
as it is usually the appropriate measure of distance in the physical world.

More Related Content

What's hot

Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted treesNihar Ranjan
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA BoostAman Patel
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and BoostingMohit Rajput
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learningamalalhait
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and ClusteringEng Teong Cheah
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Salah Amean
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advancedHouw Liong The
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learningHaris Jamil
 

What's hot (20)

Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advanced
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 

Similar to Online Clothing Store Customer Clustering with K-Means

Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...Smarten Augmented Analytics
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3Nandhini S
 
Unsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxUnsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxjasontseng19
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKnoldus Inc.
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningPyingkodi Maran
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
Clustering (from Google)
Clustering (from Google)Clustering (from Google)
Clustering (from Google)Sri Prasanna
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clusteringmobius.cn
 
Machine learning session9(clustering)
Machine learning   session9(clustering)Machine learning   session9(clustering)
Machine learning session9(clustering)Abhimanyu Dwivedi
 

Similar to Online Clothing Store Customer Clustering with K-Means (20)

Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Clustering
ClusteringClustering
Clustering
 
Clustering
ClusteringClustering
Clustering
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Customer segmentation.pptx
Customer segmentation.pptxCustomer segmentation.pptx
Customer segmentation.pptx
 
TYPES OF CLUSTERING.pptx
TYPES OF CLUSTERING.pptxTYPES OF CLUSTERING.pptx
TYPES OF CLUSTERING.pptx
 
Clustering
ClusteringClustering
Clustering
 
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
Unsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxUnsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptx
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Clustering (from Google)
Clustering (from Google)Clustering (from Google)
Clustering (from Google)
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Machine learning session9(clustering)
Machine learning   session9(clustering)Machine learning   session9(clustering)
Machine learning session9(clustering)
 

More from Kush Kulshrestha

Machine Learning Algorithm - KNN
Machine Learning Algorithm - KNNMachine Learning Algorithm - KNN
Machine Learning Algorithm - KNNKush Kulshrestha
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Kush Kulshrestha
 
Machine Learning Algorithm - Naive Bayes for Classification
Machine Learning Algorithm - Naive Bayes for ClassificationMachine Learning Algorithm - Naive Bayes for Classification
Machine Learning Algorithm - Naive Bayes for ClassificationKush Kulshrestha
 
Machine Learning Algorithm - Logistic Regression
Machine Learning Algorithm - Logistic RegressionMachine Learning Algorithm - Logistic Regression
Machine Learning Algorithm - Logistic RegressionKush Kulshrestha
 
Assumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningAssumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningKush Kulshrestha
 
Interpreting Regression Results - Machine Learning
Interpreting Regression Results - Machine LearningInterpreting Regression Results - Machine Learning
Interpreting Regression Results - Machine LearningKush Kulshrestha
 
Machine Learning Algorithm - Linear Regression
Machine Learning Algorithm - Linear RegressionMachine Learning Algorithm - Linear Regression
Machine Learning Algorithm - Linear RegressionKush Kulshrestha
 
General Concepts of Machine Learning
General Concepts of Machine LearningGeneral Concepts of Machine Learning
General Concepts of Machine LearningKush Kulshrestha
 
Wireless Charging of Electric Vehicles
Wireless Charging of Electric VehiclesWireless Charging of Electric Vehicles
Wireless Charging of Electric VehiclesKush Kulshrestha
 
Handshakes and their types
Handshakes and their typesHandshakes and their types
Handshakes and their typesKush Kulshrestha
 

More from Kush Kulshrestha (16)

Machine Learning Algorithm - KNN
Machine Learning Algorithm - KNNMachine Learning Algorithm - KNN
Machine Learning Algorithm - KNN
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
Machine Learning Algorithm - Naive Bayes for Classification
Machine Learning Algorithm - Naive Bayes for ClassificationMachine Learning Algorithm - Naive Bayes for Classification
Machine Learning Algorithm - Naive Bayes for Classification
 
Machine Learning Algorithm - Logistic Regression
Machine Learning Algorithm - Logistic RegressionMachine Learning Algorithm - Logistic Regression
Machine Learning Algorithm - Logistic Regression
 
Assumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningAssumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine Learning
 
Interpreting Regression Results - Machine Learning
Interpreting Regression Results - Machine LearningInterpreting Regression Results - Machine Learning
Interpreting Regression Results - Machine Learning
 
Machine Learning Algorithm - Linear Regression
Machine Learning Algorithm - Linear RegressionMachine Learning Algorithm - Linear Regression
Machine Learning Algorithm - Linear Regression
 
General Concepts of Machine Learning
General Concepts of Machine LearningGeneral Concepts of Machine Learning
General Concepts of Machine Learning
 
Visualization-2
Visualization-2Visualization-2
Visualization-2
 
Visualization-1
Visualization-1Visualization-1
Visualization-1
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
Wireless Charging of Electric Vehicles
Wireless Charging of Electric VehiclesWireless Charging of Electric Vehicles
Wireless Charging of Electric Vehicles
 
Time management
Time managementTime management
Time management
 
Handshakes and their types
Handshakes and their typesHandshakes and their types
Handshakes and their types
 

Recently uploaded

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Recently uploaded (20)

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Online Clothing Store Customer Clustering with K-Means

  • 2. Supervised vs Unsupervised Supervised Learning • For every x there is a y • Goal is to predict y using x • Most of the industrial problems are solved using supervised learning. Unsupervised Learning • For every x there is no y. • Goal is not prediction, but to investigate the x • Unsupervised learning has less use cases comparatively. It is used as one of the pre-processing step before supervised learning.
  • 3. Clustering • Clustering splits data in order to find out how observations are similar on a number of different features. • We are not predicting a true Y. • The clusters are the model. We decide the number of clusters, represented as K.
  • 4. Clustering Imagine that you are the owner of an online clothing store. You want to segment your customers by their buying habits. • As the owner of the store, you think that some customers are bargain-hunters while others are price-insensitive; some come in every week while others drop in during the holidays. • But you don’t actually know definitively which ones are which.
  • 5. Clustering You could take a supervised approach to this problem, and try to predict how much a customer will spend, or how frequently a customer will drop in. But here, we’re just trying to see what the data will tell us. This is what makes clustering an unsupervised problem.
  • 6. Clustering Since we have an online website for our store, we have valuable data about how our consumers behave online. We select features that will determine how clusters are formed in our algorithm. Since we want to segment customers by their buying habits, we probably want to form clusters using the features “number of visits” and “average amount spent.” Let’s start with these.
  • 7. Clustering By applying clustering on “number of visits” and “average amount spent”, we can demonstrate clustering using two features in two dimensional space!
  • 8. Clustering And by tagging each of the cluster with appropriate name/context in business sense, we can focus on each group differently.
  • 9. K-Means Clustering - Algorithm Step – I We start with a simple scatter plot of number of visits against average amount spent. Decide the value of K – total number of clusters in your data.
  • 10. K-Means Clustering Step – II Randomly assign each customer to a cluster. Data points have randomly been assigned into pink, yellow, and green groupings.
  • 11. K-Means Clustering In our example we specified 3 clusters, but we can tell the model whatever “K” we want. A very large number of clusters will cause overfitting (for example, if you set n equal to the number of observations you will have a cluster for every observation!) This is not very useful for making sense of subsets of our data. K is an example of hyper parameter, the value which is input to the algorithm and is decided by you.
  • 12. K-Means Clustering Step – III We calculate a centroid for each cluster. Our goal is to minimize the distance from centroid to any observation. The centroids are calculated as the center of their randomly assigned group. Here, centroids are really close together, and distance to the outer points is very large -- we can definitely do better!
  • 13. K-Means Clustering Step – IV Now we re-assign each observation to the cluster group of the nearest centroid. Recalculate centroid, with time centroids will go further apart. With time, fewer observations will change clusters. We keep repeating the centroid calculation / cluster re-assignments until nothing moves anymore - the model is complete!
  • 14. K-Means Clustering Selecting Optimal value of K An Elbow Plot visualizes how the error decreases as K increases. If K == n (number of observations), then distance = 0 (each observation its own cluster)! This, as we know, is a case of overfitting. In the graph to the right, we should choose K=6. This appears to be the number of clusters where the biggest gains in reducing error have already been made.
  • 15. Pros and Cons Advantages: • Does not assume any underlying distribution (e.g. no normal distribution assumption like in linear regression) • Easy to represent physically. • Produces intuitive groupings. • Can work in multiple dimensions. Disadvantages: • Time-consuming to find the optimal number of clusters. • Time-consuming feature engineering (features must be numeric and normalized)
  • 16. Hierarchical Clustering Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other. Hierarchical clustering starts by treating each observation as a separate cluster. Then, it repeatedly executes the following two steps: (1) identify the two clusters that are closest together, and (2) merge the two most similar clusters. This continues until all the clusters are merged together.
  • 19. Hierarchical Clustering The main output of Hierarchical Clustering is a dendrogram, which shows the hierarchical relationship between the clusters:
  • 20. Measure of Distance In the example above, the distance between two clusters has been computed based on length of the straight line drawn from one cluster to another. This is commonly referred to as the Euclidean distance. Many other distance metrics have been developed. The choice of distance metric should be made based on theoretical concerns from the domain of study. That is, a distance metric needs to define similarity in a way that is sensible for the field of study. For example, if clustering crime sites in a city, city block distance may be appropriate (or, better yet, the time taken to travel between each location). Where there is no theoretical justification for an alternative, the Euclidean should generally be preferred, as it is usually the appropriate measure of distance in the physical world.