K-Means clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. In this topic, we will learn what the K-means clustering algorithm is and how it works, along with the Python implementation of k-means clustering.
K-Means Clustering Algorithm
1. NAME- SOUMA MAITI
ROLL NO- 27500120016
REG NO.- 202750100110016
DEPARTMENT- COMPUTER SCIENCE
YEAR- 3RD YEAR(6TH SEMESTER)
SUBJECT NAME- DATA WAREHOUSING
AND DATA MINING
SUBJECT CODE- PEC-IT602B
2. K-MEANS CLUSTERING ALGORITHM
• THE K-MEANS CLUSTERING ALGORITHM COMPUTES
THE CENTROIDS AND ITERATES UNTIL IT
FINDS THE OPTIMAL CENTROIDS. IT ASSUMES THAT
THE NUMBER OF CLUSTERS IS ALREADY
KNOWN. IT IS ALSO CALLED A FLAT CLUSTERING
ALGORITHM. THE NUMBER OF CLUSTERS
THE ALGORITHM IDENTIFIES IN THE DATA IS
REPRESENTED BY ‘K’ IN K-MEANS.
• IN THIS ALGORITHM, THE DATA POINTS ARE
ASSIGNED TO CLUSTERS IN SUCH A MANNER
THAT THE SUM OF THE SQUARED DISTANCES
BETWEEN THE DATA POINTS AND THEIR CENTROIDS
IS MINIMUM. LESS VARIATION WITHIN A CLUSTER
MEANS THE DATA POINTS IN THAT CLUSTER
ARE MORE SIMILAR TO EACH OTHER.
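THE OBJECTIVE DESCRIBED ABOVE CAN BE MADE CONCRETE WITH A SHORT PYTHON SKETCH. IT COMPUTES THE SUM OF SQUARED DISTANCES BETWEEN EACH POINT AND ITS ASSIGNED CENTROID, ASSUMING EUCLIDEAN DISTANCE. THE FUNCTION NAMES AND THE TOY DATA ARE ILLUSTRATIVE ASSUMPTIONS, NOT PART OF THE ORIGINAL SLIDES.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))


# Hypothetical toy data: two obvious groups along the x-axis.
points = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centroids = [(1.5, 0.0), (10.5, 0.0)]

# Nearby points share a cluster: 4 * 0.5**2 = 1.0
good = within_cluster_ss(points, [0, 0, 1, 1], centroids)

# Points are mixed across clusters: 0.25 + 72.25 + 72.25 + 0.25 = 145.0
bad = within_cluster_ss(points, [0, 1, 0, 1], centroids)
```

AN ASSIGNMENT THAT KEEPS NEARBY POINTS TOGETHER GIVES A MUCH SMALLER SUM, AND THIS IS THE QUANTITY K-MEANS TRIES TO REDUCE.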
3. HOW DOES THE K-MEANS ALGORITHM
WORK?
THE WORKING OF THE K-MEANS ALGORITHM IS EXPLAINED IN
THE STEPS BELOW:
• STEP-1: SELECT THE NUMBER K TO DECIDE THE NUMBER
OF CLUSTERS.
• STEP-2: SELECT K RANDOM POINTS AS CENTROIDS. (THEY
NEED NOT BE POINTS FROM THE INPUT DATASET).
• STEP-3: ASSIGN EACH DATA POINT TO ITS CLOSEST
CENTROID, WHICH WILL FORM THE PREDEFINED K
CLUSTERS.
• STEP-4: CALCULATE THE VARIANCE AND PLACE A NEW
CENTROID (THE MEAN OF THE ASSIGNED POINTS) FOR
EACH CLUSTER.
• STEP-5: REPEAT THE THIRD STEP, WHICH MEANS
REASSIGNING EACH DATA POINT TO THE NEW CLOSEST
CENTROID.
• STEP-6: IF ANY REASSIGNMENT OCCURS, THEN GO TO
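THE STEPS ABOVE CAN BE SKETCHED IN PLAIN PYTHON AS FOLLOWS. THIS IS A MINIMAL ILLUSTRATION, NOT A REFERENCE IMPLEMENTATION. IT ASSUMES EUCLIDEAN DISTANCE, PICKS THE STARTING CENTROIDS FROM THE DATA ITSELF, AND ASSUMES THAT A REASSIGNMENT SENDS THE LOOP BACK TO THE CENTROID UPDATE, SINCE THE TEXT OF STEP-6 IS CUT OFF IN THE SOURCE. THE NAMES kmeans, squared_distance AND mean_point, THE max_iter AND seed PARAMETERS, AND THE TOY DATA ARE ALL HYPOTHETICAL.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of points (tuples of floats)."""
    rng = random.Random(seed)
    # Step 2: choose k distinct data points as the starting centroids
    # (an assumption; the slides allow points outside the dataset too).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop once no point changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two well-separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
```

WHICH GROUP RECEIVES LABEL 0 OR 1 DEPENDS ON THE RANDOM START, SO ONLY THE GROUPING ITSELF IS MEANINGFUL. ON LESS CLEANLY SEPARATED DATA, DIFFERENT STARTING CENTROIDS CAN LEAD TO DIFFERENT FINAL CLUSTERS.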
4. ADVANTAGES OF K-MEANS CLUSTERING
ALGORITHM:
• THE FOLLOWING ARE SOME ADVANTAGES OF THE
K-MEANS CLUSTERING ALGORITHM:
• IT IS VERY EASY TO UNDERSTAND AND IMPLEMENT.
• IF WE HAVE A LARGE NUMBER OF VARIABLES,
K-MEANS IS USUALLY FASTER THAN HIERARCHICAL
CLUSTERING.
• WHEN THE CENTROIDS ARE RE-COMPUTED, AN
INSTANCE CAN CHANGE ITS CLUSTER.
• TIGHTER CLUSTERS ARE FORMED WITH K-MEANS
AS COMPARED TO HIERARCHICAL CLUSTERING.
APPLICATIONS OF K-MEANS CLUSTERING
ALGORITHM:
THE MAIN GOALS OF CLUSTER ANALYSIS ARE −
•TO GET A MEANINGFUL INTUITION FROM THE DATA
WE ARE WORKING WITH.
•CLUSTER-THEN-PREDICT, WHERE DIFFERENT MODELS
ARE BUILT FOR DIFFERENT SUBGROUPS.
K-MEANS CLUSTERING PERFORMS WELL ENOUGH
TO FULFILL THE ABOVE-MENTIONED GOALS.
IT CAN BE USED IN THE FOLLOWING APPLICATIONS −
•MARKET SEGMENTATION
•DOCUMENT CLUSTERING
•IMAGE SEGMENTATION
•IMAGE COMPRESSION
•CUSTOMER SEGMENTATION
•ANALYZING THE TREND ON DYNAMIC DATA