# k-Means Clustering

Jun. 2, 2023

• 1. K-MEANS CLUSTERING (KSI Microsoft AEP, mentorrbuddy.com)
• 2. CLUSTERING: Clustering falls into the category of unsupervised learning. It is the task of dividing the data points into a number of groups such that points in the same group are more similar to each other than to points in other groups.
• 3. WHAT K-MEANS DOES FOR YOU? (Figure: a scatter plot of x1 vs. x2 before k-means, and the same data colored by cluster after k-means.)
• 4. HOW DOES IT WORK?
  STEP 1: Choose the number K of clusters.
  STEP 2: Select at random K points, the centroids (not necessarily from your dataset).
  STEP 3: Assign each data point to the closest centroid; that forms K clusters.
  STEP 4: Compute and place the new centroid of each cluster.
  STEP 5: Reassign each data point to the new closest centroid. If any reassignment took place, go to STEP 4; otherwise finish (model created).
• 5. STEP 1: CHOOSE THE NUMBER K OF CLUSTERS: K = 2
• 6. STEP 2: SELECT AT RANDOM K POINTS, THE CENTROIDS (NOT NECESSARILY FROM YOUR DATASET)
• 7. STEP 3: ASSIGN EACH DATA POINT TO THE CLOSEST CENTROID; THAT FORMS K CLUSTERS
• 8. STEP 4: COMPUTE AND PLACE THE NEW CENTROID OF EACH CLUSTER
• 9. STEP 5: REASSIGN EACH DATA POINT TO THE NEW CLOSEST CENTROID. IF ANY REASSIGNMENT TOOK PLACE, GO TO STEP 4; OTHERWISE FINISH.
• 10. FINISHED MODEL
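The five steps above can be sketched in plain Python. This is a minimal illustration under one simplifying assumption: the initial centroids are sampled from the dataset itself, although the slides note they need not be.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means following STEPs 1-5 (a sketch, not a library API)."""
    rng = random.Random(seed)
    # STEPs 1-2: k is given; pick k random data points as initial centroids
    # (simplification: the slides allow centroids outside the dataset).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # STEP 3: assign each point to its closest centroid.
        new = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # STEP 5: if no reassignment took place, the model is created.
        if new == assignment:
            break
        assignment = new
        # STEP 4: recompute each centroid as the mean of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment
```

On two well-separated blobs this converges in a few iterations and returns one centroid per blob.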
• 11. RANDOM INITIALIZATION TRAP (Figure: a scatter plot of x1 vs. x2 clustered from one random initialization.)
• 12. RANDOM INITIALIZATION TRAP: But what would happen if we choose a bad random initialization?
• 13. RANDOM INITIALIZATION TRAP (Figure: the same data clustered differently after a bad random initialization.)
• 14. AVOIDING THE TRAP: Solution: k-means++
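The core idea of k-means++ seeding is that each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen, which spreads the initial centroids apart and makes the trap unlikely. A minimal sketch (the function name `kmeans_pp_init` and the use of Python's `random.choices` are illustrative choices, not the slide's code):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    draw each next centroid with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # squared distance of every point to its nearest chosen centroid
        d2 = [min(dist2(p, c) for c in centroids) for p in points]
        # sample the next centroid weighted by d2 (far points are favored)
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

After seeding, the ordinary k-means iterations (STEPs 3-5) run unchanged from these centroids.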
• 15. WITHIN-CLUSTER SUM OF SQUARES: WCSS = Σ over Pi in cluster 1 of distance(Pi, C1)² + Σ over Pi in cluster 2 of distance(Pi, C2)² + Σ over Pi in cluster 3 of distance(Pi, C3)²
• 16. CONTD.: (Figure: x1 vs. x2 scatter with three clusters and their centroids C1, C2, C3.) WCSS = Σ over Pi in cluster 1 of distance(Pi, C1)² + Σ over Pi in cluster 2 of distance(Pi, C2)² + Σ over Pi in cluster 3 of distance(Pi, C3)²
• 17. With one cluster: WCSS = Σ over Pi in cluster 1 of distance(Pi, C1)²
• 18. With two clusters: WCSS = Σ over Pi in cluster 1 of distance(Pi, C1)² + Σ over Pi in cluster 2 of distance(Pi, C2)²
• 19. With three clusters: WCSS = Σ over Pi in cluster 1 of distance(Pi, C1)² + Σ over Pi in cluster 2 of distance(Pi, C2)² + Σ over Pi in cluster 3 of distance(Pi, C3)²
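The WCSS formula above translates directly into code: for every point, take the squared distance to the centroid of its own cluster, and sum over all points. A small helper (the names `wcss`, `centroids`, and `assignment` are illustrative):

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wcss(points, centroids, assignment):
    """Within-Cluster Sum of Squares: for each point Pi in cluster j,
    add distance(Pi, Cj) squared, summed over all clusters."""
    return sum(dist2(p, centroids[a]) for p, a in zip(points, assignment))
```

For example, two points at (0, 0) and (0, 2) assigned to a single centroid at (0, 1) give WCSS = 1 + 1 = 2.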
• 20. CHOOSING THE RIGHT NUMBER OF CLUSTERS (Figure: the same x1 vs. x2 scatter partitioned into different numbers of clusters.)
• 21. THE ELBOW METHOD (1/2): It is a graph of distortion score (WCSS) versus k. Distortion is measured by WCSS, as discussed in the previous slides. Depending on the distortion, an appropriate number of clusters k is chosen.
• 22. THE ELBOW METHOD (2/2) (Figure: WCSS on the y-axis, roughly 0 to 9000, plotted against k = 1 to 10; the bend, or "elbow", of the curve marks the chosen k.)
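The elbow method can be sketched as: run k-means once per candidate k, record the WCSS, and look for the k where the curve stops dropping sharply. The `kmeans` helper below is a minimal stand-in with initial centroids sampled from the data, not a library call.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (assign to closest centroid, recompute means, repeat)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new == assignment:
            break
        assignment = new
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

def wcss_curve(points, ks):
    """WCSS for each candidate k; plot these values against k and pick the elbow."""
    curve = []
    for k in ks:
        centroids, assignment = kmeans(points, k)
        curve.append(sum(dist2(p, centroids[a]) for p, a in zip(points, assignment)))
    return curve
```

On data with two compact blobs, the curve drops steeply from k = 1 to k = 2 and flattens afterward, so the elbow sits at k = 2.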
• 23. ASSUMPTIONS: If any one of these three assumptions is violated, k-means will fail: (1) k-means assumes the variance of the distribution of each attribute (variable) is spherical; (2) all variables have the same variance; (3) the prior probability for all k clusters is the same, i.e. each cluster has roughly the same number of observations. Clusters in k-means are defined by taking the mean of all the data points in the cluster.
• 24. ADVANTAGES: Relatively simple to implement. Scales to large data sets. Guarantees convergence. Can warm-start the positions of centroids. Easily adapts to new examples. Generalizes to clusters of different shapes and sizes, such as elliptical clusters.
• 25. DISADVANTAGES: The user has to specify k (the number of clusters) at the start. k-means can only handle numerical data. It is dependent on the initial values. It struggles with clusters of varying sizes and density, with outliers, and with scaling to many dimensions.
• 26. APPLICATIONS: Marketing: it can be used to characterize and discover customer segments for marketing purposes. Biology: it can be used for classification among different species of plants and animals. Libraries: it is used to cluster different books on the basis of topics and information. Insurance: it is used to understand customers and their policies and to identify fraud.
• 27. THANK YOU