Whitepaper - SmartVision for Image Transcoding

SmartVision is a collection of methods developed by Instart Logic to automatically analyze certain image types as they pass through the service and intelligently determine how compression can be applied to images without causing a perceptible degradation in image quality. By doing this we allow web applications to send fewer bytes back to the browser, resulting in faster-loading pages.

Published in: Technology

  1. SmartVision™ for Image Transcoding: A Deep Dive (White Paper)
  2. Executive Summary: SmartVision™ for Image Transcoding
instartlogic.com Copyright © 2016 Instart Logic, Inc. All rights reserved.
SmartVision is a collection of methods developed by Instart Logic to automatically analyze certain image types as they pass through the service and intelligently determine how much compression can be applied to each image without causing a perceptible degradation in image quality. By doing this we allow web applications to send fewer bytes back to the browser, resulting in faster-loading pages. This paper focuses specifically on our approach to JPEG image optimization.
Image Compression
A site publisher wants the image files it serves to take up as few bytes as possible so that users' web browsers can receive and display them as quickly as possible. One way of doing this is to apply a form of image compression to the images. While some forms of compression are lossless, a lossy form of compression achieves a greater reduction in the number of bytes. Some image data is lost, but for most images the change is not perceptible to the human eye. You can therefore almost always reduce an image's quality to a certain degree and the image will still appear identical to the original.
Human eyes use specialized cells called rods and cones for vision. Rods are highly sensitive to the luminance (brightness) of light but are not very sensitive to color. Cones come in different types, each sensitive to a specific range of color frequencies (red, green and blue). Rods and cones are not present in equal numbers: there are about 20 rods for each cone in the human eye. Because of this imbalance, the eye notices much smaller differences in luminance than in color. Most forms of lossy image compression therefore take groups of pixels and replace them with pixels of similar color but almost identical luminance.
If human vision were more refined (say, to the level of a bird of prey), the artifacts of current image compression methods would be unbearable, and those methods would have to be discontinued.
[Figure: rod density and cone density (in thousands per square mm) plotted against angular separation from the fovea (degrees, from -80 to 80).]
  3. Historical Context
When images are prepared for a web server, a file format (PNG, JPEG, GIF, etc.) is selected. If the webmaster knows what they are doing, the choice reflects the best balance of file size against quality of reproduction for the business needs and metrics of that site. Once the file format is selected, the level of compression is set through the Quality Factor (Q Factor) desired for that image, based on the same business drivers.
The problem is that in practice the person performing this task typically accepts the default Q Factor and makes no attempt to optimize it. In addition, an image supplied by or purchased from a third party arrives with a Q Factor that the publisher did not choose. As a result, the majority of images on the internet are not compressed at their optimal Q Factor.
The first idea for solving this issue would be manual adjustment of compression levels by paid technicians. Internal studies at Instart Logic determined that a human needs at least 3 seconds, and typically more than 5 seconds, to properly determine the optimal compression value for a single image. This estimate does not take into account the fact that, after a few days of nonstop processing, the work would give a technician tunnel vision and other physiological and psychological maladies. The volume of images involved is the larger obstacle. Instart Logic processes billions of images daily, and at 3 seconds per image, one billion images alone represent 3 billion seconds of work, or roughly 100,000 eight-hour shifts. Human intervention is therefore unworkable.
An automated approach based on Artificial Intelligence (AI) is the only logical path, and it is the approach Instart Logic has taken.
Measuring Loss in Compression
Two tools commonly used in the industry to measure degradation as compression becomes more aggressive are the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity (SSIM) index.
PSNR, most commonly seen in measuring broadcast degradation, is used in image compression to determine how much noise (loss) can be introduced to an image before it becomes noticeable. It is measured in decibels (dB) because it is a ratio of two values and because that ratio spans a wide enough range to call for a logarithmic scale (see Figure 1). Logarithmic scales are used when the quantities being measured relate to each other exponentially and need to be compared by perceived level rather than by raw numeric value. A higher PSNR means less noise. By knowing how much noise an image can tolerate, analysts can determine the appropriate aggressiveness of compression for that image.
SSIM is calculated by comparing two images, here the original and its compressed version, and measuring how closely their luminance, contrast, and structure correlate. As an image is compressed further and loss increases, its SSIM decreases (see Figure 2). Because these values are not logarithmic, a linear scale from 1 (identical) down to 0 can be used.
  4. Figure 1 (PSNR): http://www.mdpi.com/jsan/jsan-01-00003/article_deploy/html/images/jsan-01-00003-g013-1024.png
Figure 2 (SSIM): http://www.jku.at/cg/content/e48361/e60689/e297573/SSIM.png
  5. Groupings of Similar Images
In developing SmartVision, Instart Logic took a large sample of images, measured each image's Peak Signal to Noise Ratio and Structural Similarity at different rates of compression, and recorded the values in a matrix. Internally this matrix is called a signature, and every image, regardless of size or image type, has a unique one.
Next, an extremely large pool of image signatures was grouped by mathematical similarity. Each group of mathematically similar signatures was assembled into a logical bucket that we call a cluster. The graphic below illustrates images being divided into clusters based on their signatures.
A sample set of images from each cluster was then sent to crowdsourced reviewers to establish a consensus on the most aggressive compression setting those images could be given without noticeable degradation. Each cluster could then be assigned an ideal compression setting. For example, cluster 11 is given a value of 20% compression and cluster 3 a value of 5% compression.
[Figure: images divided into clusters, shown with an example matrix G whose columns are indexed by u and rows by v. The values that survive in the transcript are:]
$$
G = \begin{bmatrix}
-415.3 & -30.19 & -61.20 & 27.24 & 56.12 & -20.10 & -2.39 \\
4.47 & -21.86 & -60.76 & 10.25 & 13.15 & -7.09 & -8.54 \\
-46.83 & 7.37 & 77.13 & -24.56 & -28.91 & 9.93 & 5.42 \\
-48.53 & 12.07 & 34.10 & -14.76 & -10.24 & 6.30 & 1.83 \\
12.12 & -6.55 & -13.20 & -3.95 & -1.87 & 1.75 & -2.79 \\
-7.73 & 2.91 & 2.38 & -5.94 & -2.38 & 0.94 & 4.30 \\
-1.03 & 0.18 & 0.42 & -2.42 & -0.88 & -3.02 & 4.12 \\
-0.17 & 0.14 & -1.07 & -1.07 & -1.17 & -0.10 & 1.68
\end{bmatrix}
$$
  6. SmartVision in Use
In practice, when a new image is processed by our service, SmartVision calculates its signature (using its Peak Signal to Noise Ratio and Structural Similarity) and compares it with the signatures of the clusters within our image library. When a match is found, the compression level of the matched cluster is applied to that image file.
[Figure: a new image shown next to Cluster 1 and Cluster 2, captioned "Which cluster does this belong to?"]
The business benefit to customers using SmartVision is that images are delivered at an optimal compression level on the fly, maximizing the user experience by providing the fastest download time possible without noticeable degradation of image quality.
Further Reading
SmartVision was developed by a team of engineers headed by a UC Berkeley-educated Ph.D. in AI specializing in computer vision. To arrive at a deeper understanding of how SmartVision works, this paper would have to start introducing equations like this:
$$
a(i,k) \leftarrow \min\Big(0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\big(0,\, r(i',k)\big)\Big) \quad \text{for } i \neq k, \qquad \text{and} \qquad a(k,k) \leftarrow \sum_{i' \neq k} \max\big(0,\, r(i',k)\big).
$$
Since this level of detail is more than most readers want, we have decided to omit such equations from this paper. However, if an even deeper dive is of interest to you, we welcome you to visit this tech paper.(1)
(1) http://www.slideshare.net/InstartLogic/research-qoedriven-unsupervised-image-categorization-for-optimized-web-delivery-acm-mm-2014
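
The two metrics described under Measuring Loss in Compression can be sketched as follows. This is a minimal illustration, not Instart Logic's implementation: images are assumed to be equally sized 8-bit grayscale images given as flat lists of pixel values, the function names and sample data are hypothetical, `quantize` is only a crude stand-in for lossy compression (it is not JPEG), and the SSIM here uses a single window over the whole image, whereas SSIM as normally computed averages many local windows.

```python
import math

MAX_PIXEL = 255  # peak value for 8-bit samples


def psnr(original, degraded):
    """Peak Signal to Noise Ratio in dB between two equal-length pixel lists."""
    if len(original) != len(degraded):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(MAX_PIXEL ** 2 / mse)


def global_ssim(x, y):
    """Simplified SSIM computed over the whole image as one window."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small constants that keep the ratio stable when means or variances are near zero.
    c1 = (0.01 * MAX_PIXEL) ** 2
    c2 = (0.03 * MAX_PIXEL) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def quantize(pixels, step):
    """Stand-in for lossy compression (assumption): snap pixels to multiples of step."""
    return [min(MAX_PIXEL, round(p / step) * step) for p in pixels]


# Hypothetical 64-pixel test image, then a mild and a harsh "compression" of it.
original = [(37 * i) % 256 for i in range(64)]
mild = quantize(original, 4)
harsh = quantize(original, 32)

print("mild :", psnr(original, mild), global_ssim(original, mild))
print("harsh:", psnr(original, harsh), global_ssim(original, harsh))
```

Both numbers fall as the quantization step grows, which is the behavior the paper relies on: PSNR drops on a logarithmic (dB) scale, while SSIM moves down from 1 on a linear scale.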

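The matching step described under SmartVision in Use can be sketched as a nearest-neighbor lookup. Everything below is an assumption made for illustration: the signature holds only PSNR values (the paper's signature also records SSIM), `quantize` stands in for real JPEG compression, each cluster is represented by a single centroid signature, "closest" means smallest Euclidean distance, and the cluster names, sample images, and compression percentages are hypothetical (the percentages echo the paper's example).

```python
import math

MAX_PIXEL = 255
STEPS = (2, 4, 8, 16, 32)  # hypothetical compression levels, mildest first
PSNR_CAP = 100.0           # finite cap so lossless results can enter a distance


def quantize(pixels, step):
    """Stand-in for lossy compression (NOT real JPEG): snap pixels to multiples of step."""
    return [min(MAX_PIXEL, round(p / step) * step) for p in pixels]


def psnr(original, degraded):
    """PSNR in dB, capped at PSNR_CAP so identical images give a finite value."""
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    if mse == 0:
        return PSNR_CAP
    return min(PSNR_CAP, 10 * math.log10(MAX_PIXEL ** 2 / mse))


def signature(pixels):
    """One PSNR value per compression level in STEPS."""
    return [psnr(pixels, quantize(pixels, step)) for step in STEPS]


def nearest_cluster(sig, clusters):
    """Name of the cluster whose centroid signature is closest to sig."""
    return min(clusters, key=lambda name: math.dist(sig, clusters[name]["centroid"]))


# Hypothetical library of two clusters, each with a centroid signature and a
# compression setting decided ahead of time.
smooth = [i * 4 for i in range(64)]          # gentle gradient
busy = [(i * 97) % 256 for i in range(64)]   # rapidly varying texture
clusters = {
    "cluster_3": {"centroid": signature(smooth), "compression_pct": 5},
    "cluster_11": {"centroid": signature(busy), "compression_pct": 20},
}

# A new image arrives: compute its signature and apply the matched cluster's setting.
new_image = [(i * 97 + 3) % 256 for i in range(64)]
match = nearest_cluster(signature(new_image), clusters)
print(match, clusters[match]["compression_pct"])
```

The expensive work (grouping signatures and agreeing on a setting per cluster) happens ahead of time, so handling a new image only requires computing one signature and one distance per cluster.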