Optimization of Video Compression Parameters through Genetic Algorithms (2008)


Optimization of Video Compression Parameters through Genetic Algorithms

Carlo "zED" Caputo
globo.com – webmedia
Avenida das Americas, 500, Bl. 18, Sala 103
Rio de Janeiro, Brazil
carlo.caputo@corp.globo.com

ABSTRACT
Applied research of an automated method to quickly emerge better encoding profiles, using Genetic Algorithms with many source video samples to converge to the specified limits of processing time, video quality and file size.

Categories and Subject Descriptors
I.4.2 [Data]: Image Processing and Computer Vision—Compression (Coding); I.2.m [Computing Methodologies]: Artificial Intelligence—Miscellaneous, Evolutionary computing and genetic algorithms; C.2.4 [Computer Systems Organization]: Computer-Communication Networks—Distributed Systems

Keywords
Video Compression, Encoding Profile, 2-Pass Encoding, Bitrate, ffmpeg, Peak Signal Noise Ratio (PSNR), Structural SIMilarity (SSIM), Genetic Algorithms, Two-point Crossover, Roulette Wheel Selection, Global Optimization, Local Maxima, Master-Worker Paradigm, Distributed Encoding, UDP Broadcast, Webmedia

1. INTRODUCTION
Traditionally, encoding-profile tuning has been performed as a time-consuming craft: manual tinkering with the parameters and intense observation. In this process the artisan needed a good eye to spot what each parameter caused in the encoded video; such labor was quite misleading and error prone, because some combinations, despite bringing a small quality improvement, could cause a great slowdown, and a seemingly harmless change could crash the encoder on some unusual source video.

As there was not much time, nor interest, to understand which parameters produced the desired result, I had to conceive a tool capable of accelerating the search for profiles under ever-changing technological restrictions: something that mixed smart fine-tuning of the encoded videos with robust stress testing of the profile under the encoder of choice [1].

While I am not against manual tinkering, and believe it is the only way to reach a truly pleasant video quality, I also believe that a profile must match the source being encoded; ideally, each source video would have a specific profile fine-tuned just for it. Clearly, that is not a task for human endeavor.

2. METHOD
Given the difficulty of optimizing more than one variable [2] and the short time we had to implement a solution, it seemed necessary to use an evolutionary method to make the required automation possible. Genetic Algorithms were chosen for the first prototype, using two-point crossover and roulette wheel selection – to avoid getting stuck at a local maximum. Most of the work was therefore spent finding the appropriate variables to forge an adequate fitness function.

2.1 Encoding Parameters to Optimize
The worst problem was the difficulty of finding the relation between the encoder tool's parameters [1] and the final variables to optimize. A raw search would explore a domain of 3.18981e+78 combinations across 91 parameters (for 1st and 2nd pass), whose values could usually range from 0 to 1 or even from 0 to millions, but which in our case were represented by a discrete sample of meaningful values.

2.2 Evaluation Variables to Optimize
The result of each encoding was evaluated into the following few variables, which were optimized without keeping track of their complex relation to the overwhelming number of encoding parameters.

2.2.1 Size of Encoded Video File (bitrate)
The main variable is bitrate, whether for network or storage restrictions; it can be a great concern for those who deal with a large audience or production. In any circumstance, it would not make sense to compare video quality without controlling the bitrate.

2.2.2 Time of Encoding (timerate)
This variable really changes from profile to profile and was the hardest to optimize by hand; it is easy to achieve great quality with an extremely long processing time, but very hard to make it very good with a very short encoding time.

This variable was called timerate, for short, and calculated as encoding time over the duration of the encoded video. At first, on the single-process prototype, wall-clock time was used as encoding time; later, as there were multiple machines with many processes each, CPU time was used.
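The two evaluation variables above fall out directly from an encode's byproducts. A minimal sketch (helper names are hypothetical; the unit choices match the b_i and t_i definitions used later in Section 2.5.1):

```python
# Sketch (not the paper's code): the bitrate and timerate evaluation
# variables of Sections 2.2.1-2.2.2, computed from a finished encode.

def bitrate_kbps(size_bytes: float, duration_s: float) -> float:
    """Encoded file size expressed as kilobits per second of video."""
    return (size_bytes * 8) / (duration_s * 1024)

def timerate(cpu_time_s: float, duration_s: float) -> float:
    """Encoding cost: CPU seconds spent per second of encoded video."""
    return cpu_time_s / duration_s

# A 60 s clip that produced a 4.5 MB file using 150 s of CPU time:
print(round(bitrate_kbps(4_500_000, 60), 1))  # 585.9
print(timerate(150, 60))                      # 2.5
```

A timerate of 2.5 reads as "two and a half times slower than real time", which is what makes it comparable across clips of different lengths.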
2.2.3 Picture Quality (PSNR)
After controlling bitrate and timerate, there is the video quality, roughly represented by the PSNR of the encoded frames against the equivalent source ones. In our case, a great advantage of using PSNR is that the encoder had it built in and the evaluation cost was almost insignificant; but there are problems associated with this method, especially when operations of a different nature are applied to the frames (like the blurriness noted below, in the problem between scaling and compression). If this becomes unacceptable, an option is to implement an independent SSIM evaluation [4].

But in any case, to use it in the fitness formula, the PSNR of each video frame must be combined into one single number. Averaging them will not do, because small slips in every frame would look the same as huge damage in a few frames, and the latter is much more undesirable.

So, as the intra-frame evaluation used PSNR, the inter-frame evaluation must use it as well; let us call it PSNR' and define it as follows:

    MSE(S, E) = (1/n) · Σ_{k=0..n−1} max(0, PSNR_e − PSNR(S(k), E(k)))²

    PSNR'(S, E) = 20 · log10( PSNR_e / √MSE(S, E) )

S(k) and E(k) are the k-th source and encoded frames; n is the number of video frames encoded; MSE(S, E) is the mean squared shortfall of the per-frame PSNR against the target; PSNR_e, the target PSNR, is the maximum expected PSNR for each frame.

In other words, the encoded video's PSNR' is the PSNR of each frame's PSNR against a target PSNR.

It could be noted that, if multiple source videos were being encoded for each profile tested, it would be necessary to take the PSNR' of each video and generate a PSNR'' in a similar way as before, and this new variable would be the one used in the fitness calculation. In this project that did not happen, because of an optimization described below.

It should also be noted that, to compare source and encoded frames using PSNR, both must have the same number of pixels. And, since the resize must happen before this comparison, the scaling method – which has many parameters itself – is not taken into account by the PSNR calculation. So, if those parameters are being optimized, the evolution will make sure the scaling method used produces the smallest amount of compression artifacts; for this reason, the blurriest scaler will be automatically selected. A possible solution to this problem is to use the best scaler you have to generate reference frames as close as possible to the source ones.

2.2.4 Other Optimizations
• Have a working profile, above all – it is critical to distinguish which profiles are working; for that, there were two implicit bands of fitness values: (a) 0 ≤ fitness < 10 for profiles that broke at some point of the process and (b) fitness ≥ 10 for those that processed all the way to the end of the requested duration.

• Maximize the duration of source encoded – the longer the source video processed, the higher the fitness value; in fact, the more the profile was tested, the surer one can be that the fitness is valid. This is necessary for two reasons: (a) there is an optimization that reduces the duration of the source video tested, to speed up the evolution; (b) even if there is an error in the middle of the processing, the profiles that went farther are benefited.

• Minimize errors and warnings – in a similar spirit to the above, profiles are penalized by the number of error and warning messages, either to minimize problems in working profiles or to make broken ones approach a working state.

• Minimize the number of parameters – slightly discourage useless combinations of parameters that would not bring any improvement and might raise the risk of instability by bringing seldom-tested options into the profiles.

2.3 Hacks to Speed Up
• Abort soon if too slow – monitor encodings on-the-fly and abort if one is taking longer than minimally acceptable for the total expected encoding time.

• One encode per individual tested – pick at random only one source video to process for each individual, since processing all of them would make the evolution many times slower. As expected, good profiles appeared in the genes of many individuals and so were verified against multiple sources, which made them usable globally.

• Process small bits at the beginning – encode only a small part of each source video at the beginning of the evolution, when the entropy is higher and there is a lot of wasted processing on completely broken profiles. Then, as the fitness rises, encode longer segments of the source videos to fine-tune the surviving profiles.

• Profile injection – profiles can be injected on-the-fly at the beginning of any generation, including the first one, which needs it most. No process has to be stopped for this; a file with the command line of the encoding tool is placed in a watch folder and the control process maps its parameters to the genes of new individuals. This is very streamlined, because the state of each generation is also stored as the command line of each individual, commented with its fitness value. This way, even changes to the definition of the genetic strip do not break the gene pool, because its state can be loaded as usual, mapping the parameters to the appropriate genes. For the same reason, injections happen quite smoothly even for profiles foreign to this system (e.g., exchanged on video forums). Usually, injecting a good profile into the gene pool brings its qualities – be it faster processing, better quality or more stability – to many individuals in subsequent generations.

2.4 Distributed Processing
The first prototype could only run as a single process and, although it greatly improved the testing of candidate profiles for one source video, the resulting profile was only capable of encoding that one video well. This behavior was expected, but to overcome it the gene pool had to be evolved with multiple source videos.
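The two genetic operators chosen back in Section 2 – two-point crossover and roulette wheel selection – can be sketched as follows. This is a minimal illustration, not the project's code; it assumes an individual is simply a list of gene values and fitnesses are non-negative numbers:

```python
import random

# Sketch (not the project's code): the two GA operators named in Section 2.

def two_point_crossover(a, b, rng=random):
    """Swap the segment between two random cut points of parents a and b.

    Requires len(a) == len(b) >= 3 so that two distinct interior cut
    points exist.
    """
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick an individual with probability proportional to its fitness."""
    spin = rng.uniform(0, sum(fitnesses))
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if spin <= acc:
            return individual
    return population[-1]  # guard against floating-point round-off
```

Roulette wheel selection keeps low-fitness individuals in play with small but non-zero probability, which is the property the paper relies on to avoid getting stuck at a local maximum.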
That would use a lot more processing power per generation and add a lot of entropy, requiring the population limit per generation to be raised, to avoid losing valuable genes in the chaos. Again, more individuals mean even more processing power, so we quickly integrated the evolution control process with distributed workers [3] to perform the encoding.

At start, every worker process binds a UDP port and keeps standing by, waiting for the control process to broadcast a job offering. Upon the offer, a simple protocol verifies that only one worker gets the job. This way, the control process generated new individuals and asked the workers to perform the evaluation, giving them the profiles as the command line of the encoder – much as they would be in production – and waiting for the processing log, which had all the information necessary to compose the individual's fitness. All job control is handled over UDP, but the videos, profiles and logs are transferred through common LAN file sharing. The whole design is such that plugging or unplugging workers on-the-fly does not compromise the evaluation of any individual in the gene pool.

2.5 Evolution
Assume that there are values in a gene, genes in an individual, individuals in a generation and generations in an evolution (v ∈ g ∈ I ∈ G ∈ E). In other words, each gene g holds a position on the genetic strip of the individual I and can assume some values v preselected from a reasonable range, to reduce the search space. At the end of each generation G, the number of individuals must be trimmed to a maximum population. This is accomplished by sorting the individuals by the value of the fitness function and discarding the less fit. The target at each generation is to select the fittest individuals, to determine the highest fitness of the generation (f_g) and ultimately achieve the determined target fitness of the whole evolution (f_e). At the beginning, the individuals are loaded from the saved gene-pool file, or generated randomly, to fill the maximum population. After that, profiles are looked for in the injection watch folder, as at every generation start.

2.5.1 Fitness Function
The design of the fitness function was highly empirical, so, to simplify its formulation – along with the well-known max() and min() – the following helper function and variables were used:

    clamp(a, x, b) = a, if x < a
                     b, if x > b
                     x, if a ≤ x ≤ b

    p_i = min(PSNR'(S_i, E_i), p_e)
    b_i = max(0.001, (size_i · 8) / (duration_i · 1024))
    t_i = max(0.001, cputime_i / duration_i)

S_i are the frames of the source video sample; E_i are the encoded frames of that video; size_i is the encoded file size (in bytes); duration_i is the duration (in seconds) of the encoded video; cputime_i is the time the process took to execute; and p_e is the target PSNR'.

Finally, the fitness of a working individual:

    f_i = 10 +
          min(0.5 + f_g/f_e, 1) ·                                  (1–2)
          p_i ·                                                    (3)
          min(b_e/b_i, 1)² ·                                       (4)
          min(t_e/t_i, 1)² ·                                       (5)
          ((1 − 2 · clamp(t_e/2, t_i, t_e)/t_e) · 0.001 + 0.999) · (6)
          ((1 − g_i/g_e) · 0.0001 + 0.9999)                        (7)

p_e, b_e and t_e are the PSNR', bitrate and timerate targets for the evolution; the starting value of 10 is the base fitness for working profiles; f_g is the best fitness in the last closed generation and f_e is the target fitness of the whole evolution, so f_g/f_e represents how mature the evolution is. No steep turn in fitness is accepted; this may avoid local maxima, especially because the start of the evolution is processed with smaller samples and those fitness values are worth less (see tl_g, below). g_i is the number of genes (parameters) that assumed non-default values and g_e is the total number of genes per individual, in this run.

In case of fatal errors, the fitness gets a special value:

    f_i = 1 + 9 ·                          (8)
          min(0.5 + f_g/f_e, 1) ·          (9)
          (1 / (1 + errors_i/99)) ·        (10)
          (p_i / p_e) ·                    (11)
          min(progress_i/duration_i, 1)    (12)

errors_i is the number of errors and warnings received from the encoder (fatal errors add 99 to this value and warnings add only 1); progress_i is how much of the requested work was completed – the system was designed for 2-pass encodings, so the 1st pass accounts for 25% of the whole progress, and the 2nd pass starts at 25% and goes through 100%.

2.5.2 Generation End
At this point we know the fittest individual of the generation (f_g); the population can be trimmed to the maximum allowed and its gene pool can be saved to disk. Besides that, some variables have to be readjusted:

• timelimit – the duration of the source videos to be evaluated per individual profile. As seen before, f_g/f_e indicates the proximity to the end of the evolution, so the advance of this value is a smooth iterative process; it grows from tl_0 to tl_1:

    tl_g = 1 + tl_0 + min(f_g/f_e, 1) · (tl_1 − tl_0 − 1)

• timeout timerate – this is the variable that kills an encoding if it is taking too long to process. Based on the same f_g/f_e, it shrinks from tt_0 to tt_1:

    tt_g = tt_0 + min(f_g/f_e, 1) · (tt_1 − tt_0)
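The working-individual fitness of Section 2.5.1 can be sketched directly from its formula. This is a minimal sketch, not the project's code; the default targets are the reference-run values given in Section 2.6 (p_e = 60, b_e = 600 kbps, t_e = 2, f_e = 70) and the 91 genes of Section 2.1:

```python
# Sketch (not the project's code) of the fitness of a working individual,
# Eqs. (1)-(7). Inputs: measured PSNR' p_i, bitrate b_i (kbps), timerate
# t_i, non-default gene count g_i, and best fitness so far f_g.

def clamp(a, x, b):
    """Confine x to the interval [a, b]."""
    return a if x < a else b if x > b else x

def working_fitness(p_i, b_i, t_i, g_i, f_g,
                    p_e=60.0, b_e=600.0, t_e=2.0, f_e=70.0, g_e=91):
    """Base 10 plus the (capped) PSNR' scaled by penalty factors <= 1."""
    return (10
            + min(0.5 + f_g / f_e, 1.0)             # evolution maturity
            * min(p_i, p_e)                         # quality, capped at target
            * min(b_e / b_i, 1.0) ** 2              # bitrate over target
            * min(t_e / t_i, 1.0) ** 2              # timerate over target
            * ((1 - 2 * clamp(t_e / 2, t_i, t_e) / t_e) * 0.001 + 0.999)
            * ((1 - g_i / g_e) * 0.0001 + 0.9999))  # fewer parameters

# A mature, on-target profile approaches the target fitness of 70:
print(round(working_fitness(p_i=60.0, b_i=600.0, t_i=1.0, g_i=0, f_g=69.0), 2))  # 69.94
```

Note how the structure matches the fitness bands of Section 2.2.4: the result is always at least 10, and it tops out at 10 + p_e = 70, the target fitness of the reference run.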
2.6 Reference Run
The Genetic Algorithm had a crossover rate of 80%, mutation at 2.5% and population pruning to 250 individuals. The basic targets and ranges were set as follows: target bitrate, in this reference run, was 600 kbps; target timerate was 2, so CPU time could be around twice the duration of the encoded video; target PSNR was set to 40 dB, above which any improvement would not be significant, and target PSNR' to 60; target fitness was 70, the target PSNR' plus the base fitness for working profiles (10); timelimit ranged from 5 to 45, and timeout timerate ranged from 50 to 2, from the loosest to the strictest timeout policy at the end.

The evolution started with one process and finished on six machines, using a total of thirteen 3 GHz cores. One machine was used for the genetic control and the others, with two worker processes each, were dedicated to encoding. More processes were plugged in on-the-fly as machines became available. At the beginning, three profiles were injected: the best-quality one (extremely slow to encode), the fastest one (of subpar quality) and the one used in production (more stable). Finally, the choice of winner took into account the profiles that had the best fitness for the most source videos.

[Figure 1: evolution chart displaying a normalized combination of the main variables optimized (p_i / (b_i · t_i)) over about 5 days]

Many breakthroughs are distinguishable in the progress chart (Figure 1), some associated with changes in circumstances, located below by birth count: at the beginning there was one process, only one source video was evaluated and the bitrate constraint was 420 kbps; at individual 4000 it changed to 600 kbps; at 7000 many machines entered the game and there were several changes to the code; at 20000 the fitness function changed to allow working profiles with warnings to evolve freely, as they can work surprisingly well in some cases.

2.7 Perceptual Evaluation
It is strongly recommended that at any point of the evolution the encoded videos be verified by eye, with adjustments applied to the gene structure or the automated evaluation method. The final selection of profiles should be tested against as many sources as possible, of the types of content that will be used in production (e.g., baseball games, soap operas, news reports, or all of them for general usage). For that, the final test was performed using the winning profile to encode 675 source videos – representing an average daily production. All the encoded videos had their resulting bitrate and timerate verified, and they were played back in a loop on the monitor screens around the work area for scrutiny by all Webmedia personnel.

3. CONCLUSION
Genetic Algorithms can quickly bring great improvement to the quality of hand-crafted video profiles; from the design to the end of the first run, it took less than two weeks. But the clear disadvantage is that a profile can fail badly on sources outside the trained content type. On the other side, using too many video sources to train it would not only slow down the process but could also generate mediocre profiles. A better solution is to create a profile for every different source type, like every TV show or every different sport.

It is important to note that the greatest benefit of the first run of this project was to successfully bring the legacy codec – Sorenson Spark, which had a more widespread adoption among our user base – on par with a newer technology – On2 VP6, which only had proprietary tools that did not integrate well into our production system. This bought us time for the stabilization of the technology and the adoption of H.264 as the standard encoding, instead of lesser, proprietary alternatives that pushed quality improvements into the gap between the standardization of the convergence codec and the stability of the legacy one.

Finally, for the organization, it brings the security of charted evolution, above the insecurity of sole subjective evaluation.

4. ACKNOWLEDGMENTS
The distributed processing would not be possible without the help of Fernando Luiz Valente de Souza.

5. REFERENCES
[1] F. Bellard. FFmpeg Documentation. http://www.ffmpeg.org/, 2004–2008.
[2] D. E. Goldberg, K. Deb, and J. Horn. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[3] J.-P. Goux, J. Linderoth, and M. Yoder. Metacomputing and the master-worker paradigm. Preprint MCS/ANL-P792-0200, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, 2000.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600–612, 2004.