# K Means Clustering of Web Pages based on Tags and Words

Published in: Education

### Transcript

• 1. CLUSTERING USING K-MEANS IMPLEMENTATION
• 2. OVERVIEW: Introduction; Literature Survey; Implementation Details; Futuristic Scope
• 3. INTRODUCTION A cluster is a group of similar data objects. Clustering refers to a method by which large sets of data are grouped into smaller clusters of similar data. Types of clustering include: hierarchical clustering, partitional clustering, density-based clustering, and distance-based clustering.
• 4. One of the algorithms for clustering is the K-means algorithm. As the name suggests, we divide the data set into K clusters, where K is a positive integer. First, K initial centroids are chosen. Each data point is then assigned to the centroid it is closest to, and each centroid is recomputed as the mean of the points assigned to it. This process continues iteratively until the entire data set is divided into K stable clusters.
• 5. LITERATURE SURVEY Clustering: Let us consider an example: grouping balls of three different colours into three different groups.
• 6. The balls of the same colour are clustered into a group, as shown in the slide's figure. Types of clustering: hard clustering, soft clustering.
• 7. Clustering Algorithms: A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. The clustering algorithm finds the centroid of each group of data points. Most algorithms evaluate the distance between a point and the cluster centroids.
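
The two operations mentioned on this slide, finding a centroid and measuring the distance from a point to each centroid, can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and points stored as tuples of floats; the function names `centroid`, `euclidean`, and `nearest_centroid` are hypothetical and do not come from the slides.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def euclidean(p, q):
    """Straight-line distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))

# Illustrative data only.
group = [(1.0, 1.0), (2.0, 1.0), (3.0, 4.0)]
c = centroid(group)  # (2.0, 2.0)
closest = nearest_centroid((0.0, 0.0), [(5.0, 5.0), (1.0, 1.0)])  # 1
```
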
• 8. K-Means Algorithm: It is a distance-based, partitional clustering algorithm. “K” stands for the number of clusters and is a user input to the algorithm. It is an unsupervised algorithm. Each cluster is associated with a centroid. Each point is assigned to the cluster with the closest centroid. The algorithm is iterative in nature.
• 9. 1) Select K points as the initial centroids. 2) Repeat: 3) form K clusters by assigning all points to the closest centroid; 4) recompute the centroid of each cluster; 5) until the centroids don’t change.
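
The five steps above can be sketched in plain Python. This is one possible reading of the pseudocode, not the presenters' implementation: it assumes Euclidean distance, picks the initial centroids at random from the data, and keeps an empty cluster's old centroid. The `max_iter` and `seed` parameters are additions for safety and reproducibility.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, clusters) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # 1) select K initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):                    # 2) repeat
        clusters = [[] for _ in range(k)]
        for p in points:                         # 3) assign to closest centroid
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = [                        # 4) recompute each centroid
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:           # 5) until centroids don't change
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups of illustrative points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
```

With data this cleanly separated the result does not depend on which points are drawn as initial centroids; on real data, different seeds can give different clusterings.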
• 10. K-means example, step 1: Pick k=3 initial cluster centers (randomly). [Plot: centers k1, k2, k3 on X-Y axes]
• 11. K-means example, step 2: Assign each point to the closest cluster center. [Plot: centers k1, k2, k3 on X-Y axes]
• 12. K-means example, step 3: Move each cluster center to the mean of each cluster. [Plot: old and new positions of k1, k2, k3 on X-Y axes]
• 13. K-means example, step 4: Reassign points that are now closest to a different cluster center. Q: Which points are reassigned? [Plot: centers k1, k2, k3 on X-Y axes]
• 14. K-means example, step 4 (continued): A: three points (shown with animation). [Plot: centers k1, k2, k3 on X-Y axes]
• 15. K-means example, step 5: Re-compute cluster means. [Plot: centers k1, k2, k3 on X-Y axes]
• 16. K-means example, step 6: Move cluster centers to cluster means. [Plot: centers k1, k2, k3 on X-Y axes]
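
The walkthrough in steps 1 to 6 can be traced on a small data set. The coordinates on the slides are not recoverable from the transcript, so the points and starting centers below are invented for illustration, and in this made-up data only one point changes cluster rather than the three mentioned on the slide. The helper names `assign` and `update` are likewise hypothetical.

```python
import math

def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of its points (unchanged if it has none)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

# Invented data: three loose groups and a deliberately poor choice of centers.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.0, 2.0), (8.0, 1.0)]
centers0 = [(1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]   # step 1: k1, k2, k3

labels1 = assign(points, centers0)                # step 2: assign points
centers1 = update(points, labels1, centers0)      # step 3: move centers to means
labels2 = assign(points, centers1)                # step 4: reassign points
moved = [i for i, (a, b) in enumerate(zip(labels1, labels2)) if a != b]
centers2 = update(points, labels2, centers1)      # steps 5-6: recompute and move

print("reassigned point indices:", moved)         # prints: reassigned point indices: [4]
```
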
• 17. Advantages: Simple and understandable. Items are automatically assigned to clusters. Disadvantages: The number of clusters, K, must be determined beforehand. Every attribute is assumed to have the same weight, so we never know which attribute contributes more to the grouping process. Too sensitive to outliers.
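
Two of the disadvantages listed above can be seen with a few lines of arithmetic. The sketch below uses made-up numbers: the first part shows how a single outlier drags a centroid away from the rest of its cluster, and the second shows how an unweighted squared Euclidean distance is dominated by whichever attribute has the largest numeric range.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

# Outlier sensitivity: one far-away point shifts the mean substantially.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
outlier = (100.0, 100.0)
clean = centroid(cluster)               # (2.0, 2.0)
skewed = centroid(cluster + [outlier])  # (26.5, 26.5)

# Equal attribute weights: per-attribute share of the squared distance.
a = (1.0, 1000.0)   # hypothetical record: small-scale and large-scale attribute
b = (2.0, 1010.0)
contributions = [(x - y) ** 2 for x, y in zip(a, b)]  # [1.0, 100.0]
```

In the second case the large-scale attribute accounts for almost all of the distance, which is why attributes are often rescaled to comparable ranges before clustering.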
• 18. Applications of K-means: Unsupervised learning of neural networks. Pattern recognition. Classification analysis. Artificial intelligence. Image processing. Machine vision. Email filtering. Web page classification.
• 19. IMPLEMENTATION DETAILS