Image Compression using K-Means Clustering Method
Under the supervision of: Prof. Amit Mitra
Submitted by:
Gyanendra Awasthi (201315)
Pankaj Kumar (170455)
Contents
Clustering
K-means Clustering
Cost Function
Display on actual Dataset
Code Snippets
Examples of some original and compressed images
Uses of image compression
Results
Drawbacks of K-means
References
Clustering
• Grouping a collection of points into clusters
• Patterns are extracted from the variables without any labelled target variable (unsupervised learning)
• Points within a cluster are close to one another and far from points in other clusters
K-Means Clustering
• An unsupervised learning algorithm.
• Groups together data points that are similar to each other.
• Forms groups that differ from one another, each containing similar data points.
• Partitions the data into K distinct clusters; K is chosen by the user.
• Each data point belongs to exactly one of the K predefined clusters.
Cost Function
•The goal is to minimize within-cluster dissimilarity.
• The cost function J is:
$$J = \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \, \lVert x^{(i)} - \mu_k \rVert^2$$
where
$x^{(i)}$ are the data points,
$\mu_k$ is the center of cluster k,
$r_{ik} = 1$ if $x^{(i)}$ belongs to cluster k and 0 if it does not,
k = 1, ..., K, where K is the number of clusters provided,
N is the total number of data points.
• J is the sum of squared distances between each data point $x^{(i)}$ and the center $\mu_k$ of the cluster it is assigned to.
• The clustering is optimal when the cost function J is minimized.
• After each iteration, $\mu_k$ is obtained by the formula
$$\mu_k = \frac{\sum_{i=1}^{N} r_{ik} \, x^{(i)}}{\sum_{i=1}^{N} r_{ik}}$$
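As a small worked example of the cost function and the center update, the sketch below evaluates J for four 2-D points before and after one update of the centers. The helper names (squared_distance, cost, update_centers) and the toy data are illustrative assumptions, not code from the original slides; labels[i] = k plays the role of r_ik = 1.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers, labels):
    """J: sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(x, centers[k]) for x, k in zip(points, labels))


def update_centers(points, labels, n_clusters):
    """Mean of the points assigned to each cluster.

    Assumes every cluster has at least one point (otherwise division by zero).
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for x, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += x[d]
    return [[s / counts[k] for s in sums[k]] for k in range(n_clusters)]


# Toy data: two obvious groups on a line, with deliberately off-center starts.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 0.0), (10.0, 0.0)]

j_before = cost(points, centers, labels)
new_centers = update_centers(points, labels, 2)
j_after = cost(points, new_centers, labels)
print(j_before, new_centers, j_after)
```

With the assignments held fixed, moving each center to the mean of its points cannot increase J, which is why the update formula uses the mean.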
K-Means Algorithm
Step 1: Randomly pick K data points as the initial centroids of the K clusters
Step 2: Repeat until the cluster centers stop changing or the maximum number of iterations is reached
◦ Assign each data point to its nearest centroid
◦ Replace each cluster center with the mean of the points in its cluster
end
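The two steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name kmeans, the seed parameter and the default of 10 iterations are choices made here, and the loop stops when the assignments stop changing (which implies the centers have stopped changing too).

```python
import random


def kmeans(points, n_clusters, max_iter=10, seed=0):
    """Return (centers, labels) after at most max_iter iterations."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign each point to its nearest centroid.
        new_labels = [
            min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are stable
        labels = new_labels
        # Step 2b: move each center to the mean of its assigned points.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Usage on two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2)
print(centers, labels)
```

Which cluster receives index 0 depends on the random initialization, so only the grouping, not the numbering, is meaningful.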
Display on actual dataset
Code Snippets
To compute the nearest centroid for each data point.
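The original snippet is not reproduced in this text. A minimal sketch of such a step might look like the following; the function name find_closest_centroids and the sample RGB values are hypothetical, and ties are broken in favor of the lowest centroid index.

```python
def find_closest_centroids(points, centroids):
    """Return, for each point, the index of the nearest centroid."""
    closest = []
    for p in points:
        # Squared Euclidean distance from this point to every centroid.
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        # list.index returns the first minimum, so ties go to the lowest index.
        closest.append(distances.index(min(distances)))
    return closest


# Usage with RGB pixels: two reddish pixels and one blue pixel.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
centroids = [(255, 0, 0), (0, 0, 255)]
assignment = find_closest_centroids(pixels, centroids)
print(assignment)
```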
Image Compression
• An image is made up of small dots of intensity called pixels.
• Each pixel holds three values: the intensities of red, green and blue for that pixel.
• Compression reduces the space an image takes up when it is stored or transmitted.
• Here this is done by reducing the colors in the image to a small set of K representative colors.
• In effect, K-means clusters the pixel values, and each pixel is replaced by the center of its cluster.
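To make the saving concrete, the sketch below stores an image as a palette of K colors plus one palette index per pixel, and compares the raw bit counts. It assumes the palette has already been found (for example by K-means); the function names and sample pixels are illustrative. The figures are for uncompressed storage only, so they are not directly comparable to the sizes of files saved in a format such as JPEG.

```python
import math


def encode(pixels, palette):
    """Replace each RGB pixel by the index of its nearest palette color."""
    def nearest(p):
        return min(
            range(len(palette)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, palette[k])),
        )
    return [nearest(p) for p in pixels]


def decode(indices, palette):
    """Rebuild the (approximate) image from palette indices."""
    return [palette[i] for i in indices]


def raw_bits(n_pixels):
    """Uncompressed RGB: 3 channels x 8 bits per pixel."""
    return 24 * n_pixels


def indexed_bits(n_pixels, n_colors):
    """One index per pixel plus the palette itself (24 bits per color)."""
    bits_per_index = max(1, math.ceil(math.log2(n_colors)))
    return n_pixels * bits_per_index + 24 * n_colors


# Usage: four pixels reduced to a two-color palette.
pixels = [(250, 5, 5), (255, 0, 0), (0, 10, 240), (5, 0, 255)]
palette = [(252, 2, 2), (2, 5, 248)]
indices = encode(pixels, palette)
restored = decode(indices, palette)
print(indices, restored)
print(raw_bits(128 * 128), indexed_bits(128 * 128, 16))
```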
Original and Compressed Images: Parrot
𝑡𝑛 = time taken for the K-means algorithm to run for n iterations
Fig. 1a. Original image
Fig. 1b. 𝑡10 = 1 min 42 sec
Fig. 1c. 𝑡10 = 6 min 32 sec
Fig. 1d. 𝑡10 = 12 min 42 sec
Fig. 1e. 𝑡10 = 50 min 40 sec
Original and Compressed Images: Scenery
Fig. 2a. Original image
Fig. 2b. 𝑡10 = 2 min 10 sec
Fig. 2c. 𝑡10 = 8 min 15 sec
Fig. 2d. 𝑡10 = 15 min 30 sec
Fig. 2e. 𝑡10 = 63 min 30 sec
Uses of Image Compression
• The compressed image needs less data than the original, reducing the cost of storage and transmission.
• K-means can be used to compress visual content shared over social messaging apps, giving faster transmission and lower storage use.
• Used for archiving, medical imaging and technical drawings.
• Widely used in television broadcasting and in remote sensing, for capturing and transmitting satellite images.
Results
Original size of the parrot image: 1,87,236 bytes
Number of clusters (K)    Size of compressed image
100                       52,032 bytes
20                        54,888 bytes
15                        54,888 bytes
12                        54,351 bytes
2                         43,616 bytes
Table 1. Results of K-means clustering applied to parrot.jpg
Results
Original size of the scenery image: 5,50,287 bytes
Number of clusters (K)    Size of compressed image
50                        1,01,404 bytes
10                        1,01,813 bytes
5                         95,616 bytes
2                         83,729 bytes
Table 2. Results of K-means clustering applied to scenery.jpg
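The size reductions implied by the two tables can be computed directly. The sketch below only restates the reported byte counts (written without the lakh-style digit grouping); the function name percent_reduction is an illustrative choice.

```python
def percent_reduction(original_bytes, compressed_bytes):
    """Percentage by which the file size shrank."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


# Byte counts as reported in Table 1 and Table 2 (K -> compressed size).
parrot_original = 187_236
parrot = {100: 52_032, 20: 54_888, 15: 54_888, 12: 54_351, 2: 43_616}

scenery_original = 550_287
scenery = {50: 101_404, 10: 101_813, 5: 95_616, 2: 83_729}

for name, original, table in [
    ("parrot", parrot_original, parrot),
    ("scenery", scenery_original, scenery),
]:
    for k, size in table.items():
        print(f"{name}: K={k:>3}  {size:>7} bytes  "
              f"{percent_reduction(original, size):.1f}% smaller")
```

In both tables the reported size changes only modestly with K, so for these files most of the reduction is already obtained at small K.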
Drawbacks of K-means
• Becomes slow as the size of the data (image) increases.
• Running time increases with the number of clusters (K).
• The result may be a suboptimal local minimum of the cost function.
• Works well only when the clusters are separated by linear or almost linear boundaries.
References
• Xing Wan (2019). "Application of K-means Algorithm in Image Compression". IOP Conference Series: Materials Science and Engineering, 563, 052042.
• B. Reddaiah (2020). "A Study on Image Compression and its Applications". International Journal of Computer Applications, Volume 177, No. 38, February 2020.
• Hartigan, J. A., Wong, M. A. (1979). "Algorithm AS 136: A K-means clustering algorithm". Journal of the Royal Statistical Society, 28(1), 100-10
• https://www.simplilearn.com/tutorials/machine-learning-tutorial/k-means-clustering-algorithm
• Van der Geer, J., Hanraads, J.A.J., Lupton, R.A. (2010). "The art of writing a scientific article". J. Sci. Commun., 163: 51–59.