1. Introduction
What is K-Means Clustering?
• K-means clustering is an algorithm that
groups data into k clusters.
• ‘K’ is the number of clusters, and ‘means’ refers to
the mean of the data points in each cluster, which is
used as that cluster's centroid.
3. Advantage / Disadvantage
Advantage
• It is intuitive and easy to implement.
• Fast: it is easy to apply to large
datasets, and it gives quick clustering
results in most situations.
Disadvantage
• Clusters are assumed to be roughly circular
(spherical), so the algorithm may not handle
clusters of other shapes well.
• The choice of initial centroids can influence the
clustering results.
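As a rough illustration of the second point, the sketch below runs a plain k-means loop twice on the same four points, starting from two different pairs of centroids. The names `run_kmeans` and `sse` and the toy data are hypothetical, not taken from the slides. SSE here means the sum of squared distances from each point to its nearest centroid.

```python
import math

def run_kmeans(points, centroids, max_iter=100):
    """Plain k-means from the given starting centroids (illustrative sketch)."""
    for _ in range(max_iter):
        # Group every point under its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

# Four corners of a wide rectangle (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

left_right = run_kmeans(points, [(0.0, 0.5), (4.0, 0.5)])  # start near the short sides
top_bottom = run_kmeans(points, [(2.0, 0.0), (2.0, 1.0)])  # start on the long sides

print(sse(points, left_right), sse(points, top_bottom))
```

Both runs stop at once because their starting centroids are already stable, but the second grouping has a much larger SSE. A common remedy is to run the algorithm from several starting points and keep the result with the lowest SSE.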
In brief, to explain...
"Let's look at the picture on the left. We can intuitively see how the data will be grouped."
The picture on the right shows the clustering that we expected.
Let's look at the picture on the left. There are 18 data points and two randomly selected centroids.
A centroid does not have to be one of the data points.
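A minimal sketch of this random initialization, assuming 2D points stored as tuples. The function `init_centroids`, its `seed` argument, and the toy data are hypothetical and are not the code from the slides. The centroids are drawn from the bounding box of the data, so they usually do not coincide with any data point.

```python
import random

def init_centroids(points, k, seed=0):
    """Pick k random starting centroids inside the bounding box of the data."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
            for _ in range(k)]

# Hypothetical toy data (the slide uses 18 points; fewer are enough here).
points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]
centroids = init_centroids(points, k=2)
print(centroids)
```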
The picture on the right shows the Python code for each step.
Each data point is assigned to its closest centroid.
The boundary shows that two groups have been formed.
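The assignment step could be sketched as follows, assuming Euclidean distance. The name `assign_clusters` and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical toy data: two points near each centroid.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(2.0, 2.0), (7.0, 7.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

The labels split the points into two groups, which is what the boundary in the picture represents.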
Within each group, the mean of the data points is set as the new centroid.
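A small sketch of this update step, under the assumption that an empty group keeps its old centroid (the slides do not say how that case is handled). The name `update_centroids` and the data are hypothetical.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: a centroid with no points stays where it is.
            new_centroids.append(old)
            continue
        # Average each coordinate over the group's members.
        new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
    return new_centroids

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels = [0, 0, 1, 1]
centroids = [(2.0, 2.0), (7.0, 7.0)]
new_centroids = update_centroids(points, labels, centroids)
print(new_centroids)  # [(1.25, 1.5), (8.5, 8.75)]
```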
Look at the first picture. The data points are reassigned to the new centroids, and the centroids are updated in the second picture.
Since the centroids moved, repeat steps 2 and 3.
Finally, the centroids stop moving, and the process is finished.
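Putting the steps together, one way to sketch the whole loop is shown below. It stops when an update leaves every centroid unchanged. The function `kmeans`, the `max_iter` safety limit, and the sample data are assumptions made for illustration, not the code from the slides.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Repeat assignment and update until the centroids stop moving."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 3: move each centroid to the mean of its group.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # assumption: empty group keeps its centroid
        if new_centroids == centroids:
            break  # centroids are fixed, so the process is finished
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

On this data the first pass already separates the two groups, and the second pass confirms that the centroids no longer move.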