K Means Clustering of Web Pages based on Tags and Words


Transcript

  • 1. CLUSTERING USING K-MEANS IMPLEMENTATION
  • 2. OVERVIEW: Introduction; Literature Survey; Implementation Details; Futuristic Scope
  • 3. INTRODUCTION: A cluster is a group of similar data objects. Clustering is a method by which a large data set is grouped into smaller sets (clusters) of similar data. Types of clustering include: hierarchical clustering, partitional clustering, density-based clustering, and distance-based clustering.
  • 4. One of the algorithms for clustering is the K-means algorithm. As the name suggests, we divide the data set into K clusters, where K is a positive integer. First, K initial centroids are chosen. Each data point is then assigned to its nearest centroid, and each centroid is recomputed as the mean of the points assigned to it. This process continues iteratively until the data is divided into K stable clusters.
  • 5. LITERATURE SURVEY. Clustering: Let us consider an example: grouping balls of three different colours into three different groups.
  • 6. The balls of the same colour are clustered into one group (shown in the slide's figure). Types of clustering: hard clustering and soft clustering.
  • 7. Clustering Algorithms: A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. The clustering algorithm finds the centroid of each group of data points. Most algorithms evaluate the distance between a point and the cluster centroids.
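The centroid and distance ideas on this slide can be illustrated with a small sketch. It assumes 2-D points, Euclidean distance, and a centroid defined as the coordinate-wise mean; the slides do not fix a distance measure, and the helper names `centroid` and `closest_centroid` are hypothetical.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to `point` (Euclidean distance assumed)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

# Two made-up groups of 2-D points, used only for illustration.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
group_b = [(8.0, 8.0), (9.0, 9.0)]

centroids = [centroid(group_a), centroid(group_b)]
nearest = closest_centroid((3.0, 3.0), centroids)
print(centroids, nearest)
```

Here the point (3.0, 3.0) is compared against both centroids and is matched to the first group, whose centroid is closer.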
  • 8. K-Means Algorithm: It is a distance-based, partitional clustering algorithm. "K" stands for the number of clusters; it is a user input to the algorithm. It is an unsupervised algorithm. Each cluster is associated with a centroid. Each point is assigned to the cluster with the closest centroid. The algorithm is iterative in nature.
  • 9. 1) Select K points as the initial centroids.
    2) Repeat:
    3) Form K clusters by assigning all points to the closest centroid.
    4) Recompute the centroid of each cluster.
    5) Until the centroids don't change.
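The five steps above can be sketched in plain Python. This is a minimal illustration, not the presenters' implementation: it assumes Euclidean distance, random selection of the initial centroids from the data, and a hypothetical `max_iters` safety cap and `seed` parameter that the slides do not mention. An empty cluster simply keeps its previous centroid, which is one of several possible conventions.

```python
import math
import random

def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # 1) pick K points as initial centroids
    for _ in range(max_iters):                   # 2) repeat
        clusters = [[] for _ in range(k)]
        for p in points:                         # 3) assign each point to the closest centroid
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):   # 4) recompute each centroid as the mean
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # assumption: empty cluster keeps its centroid
        if new_centroids == centroids:           # 5) stop when the centroids don't change
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Because the starting centroids are random, the order of the returned clusters can differ between seeds, and on less cleanly separated data different seeds may give different groupings.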
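  • 10. K-means example, step 1: Pick k=3 initial cluster centers (randomly). (Figure: X-Y plot with centers k1, k2, k3.)
  • 11. K-means example, step 2: Assign each point to the closest cluster center. (Figure: X-Y plot with centers k1, k2, k3.)
  • 12. K-means example, step 3: Move each cluster center to the mean of each cluster. (Figure: X-Y plot with old and new positions of k1, k2, k3.)
  • 13. K-means example, step 4: Reassign the points that are now closest to a different cluster center. Q: Which points are reassigned? (Figure: X-Y plot with centers k1, k2, k3.)
  • 14. K-means example, step 4 (continued): A: three points (shown with animation). (Figure: X-Y plot with centers k1, k2, k3.)
  • 15. K-means example, step 5: Re-compute cluster means. (Figure: X-Y plot with centers k1, k2, k3.)
  • 16. K-means example, step 6: Move cluster centers to cluster means. (Figure: X-Y plot with centers k1, k2, k3.)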
  • 17. Advantages: Simple and understandable. Items are automatically assigned to clusters. Disadvantages: The number of clusters, K, must be determined beforehand. Every attribute is assumed to have the same weight, so we never know which attribute contributes more to the grouping process. Sensitive to outliers.
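The sensitivity to outliers follows from the centroid being a mean: one extreme point can drag it far from the rest of the cluster. The sketch below shows this with made-up numbers; the `centroid` helper is hypothetical and assumes the coordinate-wise mean.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

# A tight, made-up cluster and a single far-away point.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outlier = (51.5, 51.5)

before = centroid(cluster)              # centroid of the tight cluster
after = centroid(cluster + [outlier])   # centroid once the outlier is included
print(before, after)
```

With the outlier included, the centroid moves from (1.5, 1.5) to (11.5, 11.5), which is far from every one of the original four points.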
  • 18. Applications of K-means: Unsupervised learning of neural networks. Pattern recognition. Classification analysis. Artificial intelligence. Image processing. Machine vision. Email filtering. Web page classification.
  • 19. IMPLEMENTATION DETAILS