
RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability


  1. RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability
     Hiroko SAKURAI†, Masaomi OHNO†, Shintaro OKADA‡, Tomoaki TSUMURA†, Hiroshi MATSUO†
     † Nagoya Institute of Technology, Japan
     ‡ Toyota Motor Corp., Japan
     IADIS International Conference APPLIED COMPUTING 2009
     November 19 – 21, 2009, Rome, Italy
  2. Background (1/2): Portability of Video Applications
     Real-time video processing applications should run on a great variety of platforms:
       • Cell phones
       • Cars
       • PCs
     Principal goal of an application:
       • Long battery life
       • High throughput
       • Good accuracy
     We must rewrite a video processing program when porting it to another platform.
  3. Background (2/2): Many-Core Era is Coming
     Multi/many-core processors have come into wide use.
     Video processing applications
       • offer several kinds of parallelism
           • Pixels in video frames have data parallelism
           • Multiple frames can be processed in parallel by pipelining
       • promise good performance on such parallel systems
     Parallelizing programs is not so simple.
     It is becoming much more important to improve compilers and libraries.
  4. A Video Processing Library: RaVioli
     RaVioli provides:
       • Easy writeability of pseudo real-time video processing
       • Interfaces for parallelization
           • Detecting data dependencies and formulating reductions
           • Balancing loads of pipeline stages
  5. Outline
       • Concept of RaVioli
           • RaVioli hides resolutions from programmers
           • Easy writeability of video processing applications
           • Pseudo real-time processing by adjusting loads
       • Semi-automatic parallelization functions
           • Automatic block decomposition
           • Pipelining interface with automatic load balance mechanism
       • Evaluation results
  6. Traditional Image Processing Program
     Image processing program written in traditional C (InImg is converted to OutImg):

         int main(void) {
             // Input image
             int luma;
             for (int y = 0; y < 180; y++) {
                 for (int x = 0; x < 200; x++) {
                     luma = (int)(InImg[x][y].R * 0.299
                                + InImg[x][y].G * 0.587
                                + InImg[x][y].B * 0.114);
                     OutImg[x][y].R = luma;
                     OutImg[x][y].G = luma;
                     OutImg[x][y].B = luma;
                 }
             }
         }
  7. Image Processing Program with RaVioli
     Grayscale program using RaVioli. The component function GrayScale is passed to the higher-order method procPix, which produces RV_Image OutImg from RV_Image InImg:

         // Component function
         RV_Pixel GrayScale(RV_Pixel Pix) {
             int luma;
             luma = (int)(Pix.R() * 0.299
                        + Pix.G() * 0.587
                        + Pix.B() * 0.114);
             return (Pix.setRGB(luma, luma, luma));
         }
         int main() {
             RV_Image InImg, OutImg;
             // Input image
             OutImg = InImg.procPix(GrayScale);
         }
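To make the idea of a higher-order method concrete, here is a minimal sketch of how a `procPix`-style method can hide the loops and the resolution from the caller. The `Pixel` and `Image` types and the function `grayScale` below are hypothetical stand-ins written for this illustration; they are not RaVioli's actual classes, and the real library may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not RaVioli's real classes.
struct Pixel {
    int r, g, b;
};

class Image {
public:
    Image(std::size_t width, std::size_t height)
        : width_(width), height_(height), pixels_(width * height, Pixel{0, 0, 0}) {}

    // Bounds-checked pixel access.
    Pixel& at(std::size_t x, std::size_t y) {
        assert(x < width_ && y < height_);
        return pixels_[y * width_ + x];
    }

    // Higher-order method: applies f to every pixel and returns a new image.
    // The caller never writes a loop and never mentions the resolution.
    template <typename F>
    Image procPix(F f) const {
        Image out(width_, height_);
        for (std::size_t i = 0; i < pixels_.size(); ++i) {
            out.pixels_[i] = f(pixels_[i]);
        }
        return out;
    }

private:
    std::size_t width_, height_;
    std::vector<Pixel> pixels_;
};

// Component function using the same luma weights as the slide.
Pixel grayScale(Pixel p) {
    int luma = static_cast<int>(p.r * 0.299 + p.g * 0.587 + p.b * 0.114);
    return Pixel{luma, luma, luma};
}
```

A caller would write `Image out = in.procPix(grayScale);`, which mirrors the shape of the slide's `InImg.procPix(GrayScale)` without claiming to reproduce its implementation.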
  8. Video Processing Program with RaVioli
     Video processing program with RaVioli (diagram labels):
       • RV_Image obj: the component function RV_Pixel GrayScale(RV_Pixel p){ } is applied through a higher-order method
       • RV_Video obj: the component function RV_Image GrayScale(RV_Image img){ } is applied through a higher-order method
  10. Auto-Adjustment of Computation Load
       • Spatial resolution (pixel rate)
           • Ss: spatial stride (figure labels: Ss=1, Ss=2, 1/4)
       • Temporal resolution (frame rate)
           • St: temporal stride (figure labels: St=1, St=2, 1/2)
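The effect of the two strides on the amount of work can be sketched with a small calculation. This assumes that the spatial stride is applied in both the x and y directions (which matches the 1/4 label for Ss=2) and that the temporal stride means every St-th frame is processed; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Number of pixels visited in a width x height frame when every Ss-th
// pixel is sampled in both the x and y directions (assumed behavior).
std::size_t sampledPixels(std::size_t width, std::size_t height, std::size_t Ss) {
    assert(Ss >= 1);
    std::size_t count = 0;
    for (std::size_t y = 0; y < height; y += Ss) {
        for (std::size_t x = 0; x < width; x += Ss) {
            ++count;
        }
    }
    return count;
}

// Number of frames processed out of `frames` when every St-th frame is taken.
std::size_t sampledFrames(std::size_t frames, std::size_t St) {
    assert(St >= 1);
    return (frames + St - 1) / St;  // ceiling division
}
```

For a 200 x 180 frame, moving from Ss=1 to Ss=2 cuts the visited pixels from 36000 to 9000, and moving from St=1 to St=2 halves the number of processed frames.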
  11. Priority Set
      Which stride should be increased? We can specify resolution priorities with a priority set:
      (Spatial resolution, Temporal resolution) =
        • (7,3): keep spatial stride and temporal stride in the ratio "3:7"
        • (1,0): keep spatial stride "1"
      Examples:
        • Moving object detection: temporal resolution
        • Pattern recognition: spatial resolution
      (Figure labels: St=1, St=2, Ss=1, Ss=2)
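One way to turn a priority set into a decision about which stride to increase is sketched below. The rule used here (raise whichever stride keeps Ss : St closest to temporal priority : spatial priority) is an assumption chosen to reproduce the two examples on the slide; the slides do not state RaVioli's actual rule, and the type and function names are hypothetical.

```cpp
#include <cassert>

struct Strides {
    int Ss;  // spatial stride
    int St;  // temporal stride
};

struct PrioritySet {
    int spatial;   // priority of spatial resolution
    int temporal;  // priority of temporal resolution
};

// One overload step (hypothetical policy): increase the stride that is
// lagging behind the target ratio Ss : St = temporal : spatial.
Strides increaseOneStride(Strides s, PrioritySet p) {
    assert(p.spatial >= 0 && p.temporal >= 0 && p.spatial + p.temporal > 0);
    if (s.Ss * p.spatial <= s.St * p.temporal) {
        ++s.Ss;  // spatial stride is behind its share
    } else {
        ++s.St;  // temporal stride is behind its share
    }
    return s;
}
```

With the priority set (1,0) this rule only ever raises St, so the spatial stride stays at 1; with (7,3) repeated steps from (1,1) reach Ss=3, St=7, the "3:7" ratio named on the slide.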
  12. Detecting Overload
      (Figure: the RV_Video class holds frames in a ring buffer and passes each RV_Image instance to the image processing program through a higher-order method. When frame interval < processing time, the system is overloaded.)
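The overload condition in the figure, frame interval < processing time, can be sketched as follows. The helper names are invented for this example, and timing with `std::chrono::steady_clock` is an assumption about how the measurement could be done, not a description of RaVioli's internals.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// True when processing one frame took longer than the time between frames.
bool isOverloaded(Millis frameInterval, Millis processingTime) {
    assert(frameInterval.count() > 0);
    return frameInterval < processingTime;
}

// Measures how long a callable takes, using a monotonic clock.
template <typename F>
Millis measure(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<Millis>(end - start);
}
```

A video loop could call `measure` around the per-frame processing and increase a stride whenever `isOverloaded` returns true.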
  14. Parallelization: Block Decomposition
      Image processing with C/C++:

          int main() {
              byte InImg[180][200];
              byte OutImg[180][200];
              for (int y = 0; y < 180; y++) {
                  for (int x = 0; x < 200; x++) {
                      // row index first, to match the [180][200] declaration
                      OutImg[y][x] = (int)(
                            InImg[y][x].R * 0.299
                          + InImg[y][x].G * 0.587
                          + InImg[y][x].B * 0.114);
                  }
              }
          }

      Image processing with RaVioli:

          RV_Pix GrayScale(RV_Pix Pix) {
              int Y;
              Y = (int)(Pix.R() * 0.299
                      + Pix.G() * 0.587
                      + Pix.B() * 0.114);
              return (Pix.setRGB(Y, Y, Y));
          }
          int main() {
              RV_Img InImg, OutImg;
              OutImg = InImg.procPix(GrayScale);
          }
  15. Parallelization: Block Decomposition
      With RaVioli, the component function GrayScale stays the same. Passing the number of threads to procPix divides InImg into blocks, one per thread (thread1 to thread4):

          OutImg = InImg.procPix(GrayScale, 4);
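Block decomposition of a per-pixel operation can be sketched with standard C++ threads. This is an illustration of the general technique under simple assumptions (a flat pixel array split into contiguous blocks, one `std::thread` per block); `procPixParallel` is a hypothetical name, and the slides do not show how RaVioli implements `procPix(GrayScale, 4)` internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Applies f to every element of `in`, splitting the index range into
// `numThreads` contiguous blocks and processing each block on its own thread.
template <typename T, typename F>
std::vector<T> procPixParallel(const std::vector<T>& in, F f, std::size_t numThreads) {
    assert(numThreads >= 1);
    std::vector<T> out(in.size());
    std::vector<std::thread> workers;
    const std::size_t block = (in.size() + numThreads - 1) / numThreads;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(in.size(), t * block);
        const std::size_t end = std::min(in.size(), begin + block);
        // Each thread writes only to its own index range, so no locking is needed.
        workers.emplace_back([&in, &out, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = f(in[i]);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return out;
}
```

Because a pure per-pixel function has no dependencies between pixels, the result is the same as the sequential loop regardless of the number of threads.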
  16. Translator for Block Decomposition
      A translator parallelizes the program. The component function GrayScale is left unchanged; only the procPix call is rewritten. Reduction operations may be required.

          // before translation
          OutImg = InImg.procPix(GrayScale);

          // after translation
          OutImg = InImg.procPix(GrayScale, 4);
  17. For Reference: Example Code with OpenMP
      OpenMP is a standardized model of parallel programming for C/C++ and Fortran.

          #define NUM_THREADS 4
          int i; int sum = 0;
          #pragma omp parallel for reduction(+:sum)
          for (i = 1; i <= 256; i++)
              sum += i;

      With the reduction clause reduction(+:sum), each of the four processes accumulates its own partial sum (sum1 to sum4), and the partial sums are then combined into sum.
  18. Reduction Operations Can Be Automatically Added
      Original program:

          int sum = 0;
          void pixSum(RV_Pixel p) {
              sum += 1;
          }
          int main() {
              RV_Image InputImg;
              // read image data into "InputImg"
              InputImg.procPix(pixSum);
          }

      The translator checks whether the update "sum += 1" satisfies the associative law and the commutative law. Both hold, so in the component function "sum += 1;" is replaced with "_localsum += 1;" on a thread-local variable, and a reduction operation "sum += _localsum;" is generated:

          __thread int _localsum = 0;

          void __pixSum(int threadNum)
          {
              mutex_lock(&Mutex);
              sum += _localsum;
              mutex_unlock(&Mutex);
          }

      The calls in main become:

          InputImg.procPix(pixSum, 4);
          InputImg.reduction(__pixSum);
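The same local-accumulate-then-merge pattern can be written with standard C++ threads. This is a sketch of the technique only: `countPixels` is a hypothetical function, a stack-local `localSum` plays the role of the slide's `_localsum`, and `std::mutex` stands in for the slide's `mutex_lock`/`mutex_unlock` calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Counts `numPixels` pixels using `numThreads` threads. Each thread
// accumulates into a private counter and merges it into the shared
// sum once, under a mutex.
long countPixels(std::size_t numPixels, std::size_t numThreads) {
    assert(numThreads >= 1);
    long sum = 0;
    std::mutex sumMutex;
    std::vector<std::thread> workers;
    const std::size_t block = (numPixels + numThreads - 1) / numThreads;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(numPixels, t * block);
        const std::size_t end = std::min(numPixels, begin + block);
        workers.emplace_back([&sum, &sumMutex, begin, end] {
            long localSum = 0;  // plays the role of _localsum
            for (std::size_t i = begin; i < end; ++i) {
                localSum += 1;  // body of the component function
            }
            std::lock_guard<std::mutex> lock(sumMutex);
            sum += localSum;  // reduction operation
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sum;
}
```

The merge is safe to perform in any order because integer addition is associative and commutative, which is exactly the property the slide says the translator checks.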
  20. Assisting Pipeline Implementation
      For building a pipeline:
        • The whole process is split into several stages
        • Several threads are created and assigned to the stages
        • FIFOs need to be implemented and managed for data transfer between stages
      Creating threads and FIFOs
        • is not the essence of video processing
        • is troublesome for programmers
      (Figure labels: thread1, thread2, thread3; stages binarize, edge detect, hough trans; FIFO1, FIFO2, FIFO3)
  22. Interface for Pipelining
      Each stage is written as a function on RV_Pipedata. The stages are pushed into an RV_Pipeline, and run starts the pipeline (figure: thread1 runs GrayScale and thread2 runs Laplacian, connected by FIFO1 and FIFO2):

          RV_Pipedata* GrayScale(RV_Pipedata* data) {
              // Grayscale processing for a frame
              return data;
          }
          RV_Pipedata* Laplacian(RV_Pipedata* data) {
              // Laplacian filter processing for a frame
              return data;
          }
          int main() {
              RV_Pipeline pipe;
              pipe.push(GrayScale);
              pipe.push(Laplacian);
              pipe.run();
              return 0;
          }
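The thread and FIFO management that such an interface hides can be sketched in standard C++. Everything below is a hypothetical illustration: the `Fifo` and `Pipeline` classes are written for this example, frames are represented by plain `int` values, and each stage gets exactly one thread. It is not RaVioli's `RV_Pipeline` implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO; close() signals that no more items will arrive.
template <typename T>
class Fifo {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

using Frame = int;  // stand-in for real frame data
using Stage = std::function<Frame(Frame)>;

class Pipeline {
public:
    void push(Stage s) { stages_.push_back(std::move(s)); }

    // Runs every input frame through all stages, one thread per stage,
    // with a FIFO in front of each stage and one more for the output.
    std::vector<Frame> run(const std::vector<Frame>& input) {
        assert(!stages_.empty());
        std::vector<Fifo<Frame>> fifos(stages_.size() + 1);
        std::vector<std::thread> threads;
        for (std::size_t i = 0; i < stages_.size(); ++i) {
            threads.emplace_back([this, &fifos, i] {
                Frame f;
                while (fifos[i].pop(f)) fifos[i + 1].push(stages_[i](f));
                fifos[i + 1].close();  // propagate end of stream
            });
        }
        for (Frame f : input) fifos[0].push(f);
        fifos[0].close();
        std::vector<Frame> output;
        Frame f;
        while (fifos.back().pop(f)) output.push_back(f);
        for (auto& t : threads) t.join();
        return output;
    }
private:
    std::vector<Stage> stages_;
};
```

Usage follows the same push-then-run shape as the slide: push each stage function in order, then call `run` with the input frames. Since each stage has a single thread and a single FIFO, frames leave the pipeline in the order they entered.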
  24. Load Imbalance between Stages
      (Figure: thread1, thread2 and thread3 run stages A, B and C on frame1, frame2 and frame3; the pipeline stalls.)
  25. Automatic Load Balancing
      (Figure: stages A, B and C for frame1, frame2 and frame3 are distributed among thread1, thread2 and thread3; stage C appears under more than one thread.)
  28. Evaluation: Resolution Adjustment
      (Figure: frame rate (fps) and number of pixels for the priority sets spatial resolution : temporal resolution = 0:1, 1:0 and 3:7.)
  29. Evaluation: Parallelization Functions
  30. Evaluation: Auto Block Decomposition
      (Figure: results for voronoi, laplacian, pixAverage and hough.)
  31. Evaluation: Hough transform
      (Figure: results for hough, with legend entries "Reduction variable initialization" and "Reduction operations".)
  32. Evaluation: Automatic load balancing
      (Figure: schedules of stages A, B and C.)
  33. Conclusion
      RaVioli
        • hides resolutions from programmers
            • pseudo real-time processing
        • has semi-automatic parallelization functions
            • semi-automatic block decomposition
            • load balancing mechanism between pipeline stages
      Our future work
        • implementing an automatic power-saving function in RaVioli
        • making RaVioli adaptive to various platforms such as Cell Broadband Engine
        • designing an easy-to-write language which cooperates with RaVioli
  34. Automatic Load Balancing
      (Figure labels: Manager; thread1, thread2, thread3; stages A, B, C; frames 1 to 5.)
  35. Automatic Load Balancing
      (Figure labels: Manager; A:1, B:1, C:4; thread1, thread2, thread3; stages A, B, C; frames 1 to 5.)
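The benefit of balancing can be illustrated with simple arithmetic. The sketch below reads the labels A:1, B:1, C:4 on this slide as relative stage costs, which is an assumption since the slide does not say what the numbers mean, and the function names are invented for the example. It compares the steady-state time per frame of a fixed one-thread-per-stage pipeline with the ideal lower bound when work can be spread evenly.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Steady-state time per frame when each stage is bound to its own thread:
// the slowest stage sets the pace of the whole pipeline.
double fixedAssignmentTimePerFrame(const std::vector<double>& stageCosts) {
    assert(!stageCosts.empty());
    return *std::max_element(stageCosts.begin(), stageCosts.end());
}

// Ideal lower bound on time per frame when the total work of all stages
// can be spread evenly over `numThreads` threads.
double balancedTimePerFrame(const std::vector<double>& stageCosts, int numThreads) {
    assert(numThreads >= 1);
    double total = std::accumulate(stageCosts.begin(), stageCosts.end(), 0.0);
    return total / numThreads;
}
```

With costs 1, 1 and 4 on three threads, the fixed assignment needs 4 time units per frame because stage C is the bottleneck, while perfect balancing would need (1 + 1 + 4) / 3 = 2 units per frame.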
