Multiprocessor Architecture for Image Processing


Published in: Education, Technology

  1. 1. Multiprocessor Architecture for Image Processing. Mayank Kumar (2006EE10331), Pushpendre Rastogi (2006EE50412). Under the guidance of Dr. Anshul Kumar.
  2. 2. Introduction <ul><li>Signal processing on embedded platforms, particularly image/video processing, requires high-end processors to run complex algorithms within real-time deadlines. </li></ul><ul><li>Power consumption and cost are the major obstacles to massive deployment of embedded processing nodes. </li></ul><ul><ul><li>E.g. surveillance camera networks, traffic monitoring and control, etc. </li></ul></ul>
  3. 3. Introduction <ul><li>FPGAs and reconfigurable ASICs offer a promising solution to the above problem: hardware can be designed specifically to exploit the parallelism in the algorithm. </li></ul><ul><li>However, there are shortcomings: </li></ul><ul><ul><li>The available gates get used up when complex algorithms are implemented. </li></ul></ul><ul><ul><li>Implementing sequential algorithms directly on an FPGA is highly inefficient. </li></ul></ul>
  4. 4. Our approach <ul><li>To design a multiprocessor architecture that facilitates the processing of high-resolution image/video frames. </li></ul><ul><ul><li>Design of a processing element (PE), or node processor, customized to handle pixel/region-level operations efficiently. </li></ul></ul><ul><ul><li>Given the PE, design of the architecture for interconnecting these processors and design of the input/output hardware. </li></ul></ul>
  5. 5. Novelty <ul><li>By having an array of processors, we exploit the parallelism offered by processing different regions of the frame on different processors. </li></ul><ul><li>Within each processor, sequential algorithms are implemented efficiently by providing an application-specific instruction set. </li></ul><ul><li>Locally sequential and globally parallel </li></ul>
  6. 6. Locally Sequential Globally Parallel <ul><li>The approach suits any class of algorithms that are window based and essentially operate on regions of the image rather than on the image as a whole. </li></ul><ul><ul><li>Image change detection for surveillance applications </li></ul></ul><ul><ul><li>Optic flow, motion estimation, filtering, etc. </li></ul></ul><ul><li>We chose “Image change detection using Background Modeling” as a test algorithm. </li></ul>
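To illustrate what "window based" means here, the following is a minimal sketch of a 3x3 mean filter that processes only a given range of rows. The filter is a stand-in chosen for brevity (the deck's actual test algorithm is background modeling), and the function name `box3x3_region` and the row-major 8-bit grayscale layout are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an index into [0, n-1] so windows at the image border reuse edge pixels. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* 3x3 mean filter over rows [row_start, row_end) of a width x height,
 * row-major, 8-bit grayscale image. Only that row range of dst is written.
 * Hypothetical helper, not taken from the slides. */
void box3x3_region(const uint8_t *src, uint8_t *dst,
                   int width, int height, int row_start, int row_end)
{
    assert(src && dst && width > 0 && height > 0);
    assert(0 <= row_start && row_start <= row_end && row_end <= height);

    for (int y = row_start; y < row_end; ++y) {
        for (int x = 0; x < width; ++x) {
            unsigned sum = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int yy = clamp_index(y + dy, height);
                    int xx = clamp_index(x + dx, width);
                    sum += src[yy * width + xx];
                }
            }
            dst[y * width + x] = (uint8_t)((sum + 4) / 9); /* rounded mean */
        }
    }
}
```

Each output pixel depends only on its 3x3 neighbourhood, so a processor that owns a band of rows needs that band plus one row of context on either side. Calling the function once per band gives the same result as one call over the whole frame.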
  7. 7. Work Done <ul><li>Hardware Part </li></ul><ul><ul><li>Initial Architecture </li></ul></ul><ul><ul><ul><li>Drawbacks </li></ul></ul></ul><ul><ul><li>Change of platform </li></ul></ul><ul><ul><li>New Architecture </li></ul></ul><ul><ul><ul><li>Implementation </li></ul></ul></ul><ul><li>Software Part </li></ul><ul><ul><li>Algorithm Analysis and implementation </li></ul></ul><ul><ul><li>Fixed point Matlab Simulation </li></ul></ul><ul><ul><li>C Implementation </li></ul></ul>
  8. 8. Initial Architecture [Block diagram. Blocks shown: Camera, Video ADC, Virtex II Pro (RGB Conversion, Power PC, nine M1 processors in an array topology), MPMC, Memory, Video DAC, Monitor]
  9. 9. Architectural Drawbacks <ul><li>The multiprocessor memory controller could handle only a limited number (2-4) of parallel accesses from different processors. </li></ul><ul><ul><li>Solution: use BRAM for parallel access. </li></ul></ul><ul><li>We need to store the whole frame because the image format on the XUPV30 is interlaced. This would use up all available BRAMs. </li></ul><ul><ul><li>Solution: use a board that provides progressive data. Moreover, most digital cameras these days provide progressive image data. </li></ul></ul>
  10. 10. Change of Platform <ul><li>We switched to the Xilinx ML401 Virtex Video Starter Kit. </li></ul><ul><ul><li>Provides progressive video input </li></ul></ul><ul><ul><li>Much more BRAM </li></ul></ul><ul><ul><li>Matlab/Simulink as a design platform for designing at a higher abstraction level. </li></ul></ul><ul><li>However, switching platforms consumed time because of the associated learning curve. </li></ul>
  11. 11. New Architecture [Block diagram. Blocks shown: Camera, Video ADC, VIO_in, Custom Memory Controller (Verilog Module), Array of Block RAM, Array of Processor Networks, VIO_in, Video DAC, Monitor]
  12. 12. Description and Implementation <ul><li>The ML401 VSK provides two FPGAs </li></ul><ul><ul><li>Xilinx XUP2V7 for image input/output </li></ul></ul><ul><ul><li>Xilinx ML401 for developing the application. </li></ul></ul><ul><li>VIO_in and VIO_out are reference designs that sandwich the user-level design. They provide progressive image data. </li></ul><ul><li>We designed a custom memory controller suited to our needs. It writes data to FIFOs implemented using BRAMs. </li></ul>
  13. 13. Custom Memory controller <ul><li>Takes H_sync, v_sync, rst and Pixel_clk as inputs and selects a target FIFO to which the incoming data is written. </li></ul><ul><li>Each BRAM stores image data corresponding to 4 lines. </li></ul><ul><li>It first empties the queue, reading out the result computed in the last iteration. </li></ul><ul><li>The other end of the FIFO is read by the Microblaze processor over FSL links. </li></ul>
  14. 14. Processor Network <ul><li>Each processor network comprises one master processor and 1-7 slave processors. </li></ul><ul><li>The master processor reads data from the FIFO and distributes the work among the slave processors. </li></ul><ul><li>We demonstrated this using 3 processors: 1 master and 2 slaves. </li></ul>
  15. 15. Processor Network Basic Design <ul><li>We connected the master processor to a UART to establish a serial link for input/output. </li></ul><ul><li>The master processor is connected to the slave processors, which all run the same algorithm. </li></ul><ul><li>It takes input from the UART and passes it to different slaves. </li></ul><ul><li>The master processor distributes work by sending different regions of the image to different processors. </li></ul>
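The FIFO selection can be modelled in software as a line counter driven by the sync signals. The sketch below is a C model, not the authors' Verilog module; the round-robin order, the `fifo_selector` type and the `num_fifos` parameter are assumptions made for illustration. Only the figure of 4 lines per BRAM comes from the slide.

```c
#include <assert.h>

#define LINES_PER_BRAM 4u  /* from the slide: each BRAM holds 4 image lines */

/* Software model of the target-FIFO selection (hypothetical structure). */
typedef struct {
    unsigned num_fifos; /* assumed size of the BRAM FIFO array */
    unsigned line;      /* index of the line currently being received */
} fifo_selector;

void selector_init(fifo_selector *s, unsigned num_fifos)
{
    assert(s && num_fifos > 0);
    s->num_fifos = num_fifos;
    s->line = 0;
}

/* v_sync marks the start of a new frame: restart at line 0. */
void selector_on_vsync(fifo_selector *s)
{
    assert(s);
    s->line = 0;
}

/* h_sync marks the end of a line: advance to the next line. */
void selector_on_hsync(fifo_selector *s)
{
    assert(s);
    s->line++;
}

/* FIFO that receives pixels of the current line. Groups of 4 consecutive
 * lines share a FIFO; groups are assigned round-robin (an assumption). */
unsigned selector_target(const fifo_selector *s)
{
    assert(s);
    return (s->line / LINES_PER_BRAM) % s->num_fifos;
}
```

With three FIFOs, lines 0-3 go to FIFO 0, lines 4-7 to FIFO 1, lines 8-11 to FIFO 2, and line 12 wraps back to FIFO 0.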
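One simple way for a master to assign image regions is to cut the frame into contiguous horizontal strips, one per slave. The sketch below shows only that bookkeeping. The strip shape, the `row_strip` type and `strip_for_worker` are assumptions for illustration, and the actual transfer to the slaves is left out.

```c
#include <assert.h>

/* A contiguous band of image rows assigned to one worker (hypothetical type). */
typedef struct {
    int first_row;
    int num_rows;
} row_strip;

/* Split `height` rows into `workers` contiguous strips whose sizes differ
 * by at most one row; the first (height % workers) strips get the extra row. */
row_strip strip_for_worker(int height, int workers, int worker_id)
{
    assert(height >= 0 && workers > 0);
    assert(worker_id >= 0 && worker_id < workers);

    int base = height / workers;
    int extra = height % workers;

    row_strip s;
    s.num_rows = base + (worker_id < extra ? 1 : 0);
    s.first_row = worker_id * base + (worker_id < extra ? worker_id : extra);
    return s;
}
```

For the demonstrated setup of two slaves and, say, a 480-row frame, worker 0 would receive rows 0-239 and worker 1 rows 240-479. Window-based algorithms would additionally need a few overlapping rows at each strip boundary.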
  16. 16. Software Architecture <ul><li>Studied the Adaptive Background Mixture Model. [1], [2] </li></ul><ul><li>Analysis of the algorithm for: </li></ul><ul><ul><li>Parallelism exploitation </li></ul></ul><ul><ul><li>Length of code for implementation </li></ul></ul><ul><ul><li>Memory requirements to store data. </li></ul></ul><ul><ul><li>Feasibility </li></ul></ul>
  17. 17. The Algorithm <ul><li>Models each region of the image frame as a weighted sum of N Gaussians. </li></ul><ul><li>Updates the model when a new frame arrives. </li></ul><ul><li>Makes the foreground/background decision depending on which Gaussian distribution (k) the current pixel data belongs to. </li></ul><ul><li>Effectively models repetitive changes in the background. </li></ul><ul><li>Resistant to noise and slow illumination variations. </li></ul>
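The update and decision steps can be sketched for a single grayscale pixel as follows. This is a floating-point illustration in the spirit of the adaptive mixture model of [1], not the authors' code. All constants are hypothetical, and the per-component learning rate is simplified to the global rate `ALPHA`.

```c
#include <assert.h>

#define NUM_GAUSS   3       /* N Gaussians per pixel (hypothetical value) */
#define ALPHA       0.01f   /* learning rate (hypothetical value) */
#define BG_FRACTION 0.7f    /* weight mass treated as background (hypothetical) */
#define INIT_VAR    900.0f  /* variance of a newly created Gaussian (hypothetical) */
#define MIN_VAR     4.0f    /* floor that keeps the variance away from zero */
#define INIT_WEIGHT 0.05f   /* weight of a newly created Gaussian (hypothetical) */

typedef struct {
    float weight[NUM_GAUSS];
    float mean[NUM_GAUSS];
    float var[NUM_GAUSS];
} pixel_model;

/* Components are ordered by weight/sigma; weight^2/var gives the same order. */
static float rank_of(const pixel_model *m, int k)
{
    return m->weight[k] * m->weight[k] / m->var[k];
}

void model_init(pixel_model *m, float first_value)
{
    assert(m);
    for (int k = 0; k < NUM_GAUSS; ++k) {
        m->weight[k] = (k == 0) ? 1.0f : 0.0f;
        m->mean[k] = first_value;
        m->var[k] = INIT_VAR;
    }
}

/* Feed one intensity sample; returns 1 for foreground, 0 for background. */
int model_update(pixel_model *m, float x)
{
    assert(m);
    int matched = -1;
    for (int k = 0; k < NUM_GAUSS; ++k) {
        float d = x - m->mean[k];
        int hit = m->weight[k] > 0.0f && d * d < 6.25f * m->var[k]; /* 2.5 sigma */
        if (hit && (matched < 0 || rank_of(m, k) > rank_of(m, matched)))
            matched = k;
    }

    if (matched >= 0) {
        /* Reinforce the matched Gaussian and pull it toward the sample. */
        for (int k = 0; k < NUM_GAUSS; ++k)
            m->weight[k] = (1.0f - ALPHA) * m->weight[k]
                         + (k == matched ? ALPHA : 0.0f);
        m->mean[matched] += ALPHA * (x - m->mean[matched]);
        float d = x - m->mean[matched];
        float v = (1.0f - ALPHA) * m->var[matched] + ALPHA * d * d;
        m->var[matched] = v < MIN_VAR ? MIN_VAR : v;
    } else {
        /* No match: replace the least probable Gaussian with a new one at x. */
        int worst = 0;
        for (int k = 1; k < NUM_GAUSS; ++k)
            if (rank_of(m, k) < rank_of(m, worst)) worst = k;
        m->weight[worst] = INIT_WEIGHT;
        m->mean[worst] = x;
        m->var[worst] = INIT_VAR;
    }

    /* Renormalize so the weights sum to one. */
    float total = 0.0f;
    for (int k = 0; k < NUM_GAUSS; ++k) total += m->weight[k];
    for (int k = 0; k < NUM_GAUSS; ++k) m->weight[k] /= total;

    if (matched < 0) return 1;

    /* The matched Gaussian is background unless higher-ranked ones
     * already account for more than BG_FRACTION of the weight. */
    float stronger = 0.0f;
    for (int k = 0; k < NUM_GAUSS; ++k)
        if (rank_of(m, k) > rank_of(m, matched)) stronger += m->weight[k];
    return stronger > BG_FRACTION;
}
```

A pixel that stays near one intensity is classified as background. A sudden jump creates a new low-weight Gaussian and is reported as foreground until that Gaussian has gathered enough weight.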
  18. 18. Fixed Point Matlab simulation <ul><li>Using the Fixed-Point Toolbox, we redefined our variables and constants in Q format. </li></ul><ul><li>Data types: </li></ul><ul><ul><li>Weight/other constants: DataTypeMode: Fixed-point: binary point scaling, Signed: true, WordLength: 32, FractionLength: 31 </li></ul></ul><ul><ul><li>Pixel data: DataTypeMode: Fixed-point: binary point scaling, Signed: true, WordLength: 32, FractionLength: 23 </li></ul></ul>
  19. 19. Fixed Point Calculations <ul><li>RoundMode: nearest </li></ul><ul><li>OverflowMode: wrap </li></ul><ul><li>ProductMode: SpecifyPrecision, ProductWordLength: 32, ProductFractionLength: 23 </li></ul><ul><li>SumMode: SpecifyPrecision, SumWordLength: 32, SumFractionLength: 23 </li></ul><ul><li>CastBeforeSum: true </li></ul>
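These settings can be mirrored in plain C with 32-bit integers. The sketch below is an approximation of the listed behaviour (products and sums kept at 32 bits with 23 fraction bits, rounding to nearest, wrapping on overflow), not the authors' code. The helper names are hypothetical, and the code assumes a two's-complement target with arithmetic right shift, as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

#define Q23_FRAC 23  /* 32-bit word, 23 fraction bits */
#define Q31_FRAC 31  /* 32-bit word, 31 fraction bits */

/* Convert a real value to fixed point with `frac` fraction bits. */
int32_t to_fixed(double x, int frac)
{
    double scaled = x * (double)((int64_t)1 << frac);
    assert(scaled >= -2147483648.0 && scaled <= 2147483647.0);
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert fixed point back to a real value. */
double to_real(int32_t v, int frac)
{
    return (double)v / (double)((int64_t)1 << frac);
}

/* Multiply two fixed-point values and return the product with 23 fraction
 * bits: widen to 64 bits, round to nearest, drop the surplus fraction bits,
 * then keep the low 32 bits so that overflow wraps. */
int32_t mul_to_q23(int32_t a, int frac_a, int32_t b, int frac_b)
{
    int shift = frac_a + frac_b - Q23_FRAC;
    assert(shift > 0 && shift < 63);

    int64_t wide = (int64_t)a * (int64_t)b;
    wide += (int64_t)1 << (shift - 1); /* add half an LSB before truncating */
    wide >>= shift;                    /* assumes arithmetic right shift */
    return (int32_t)(uint32_t)wide;    /* assumes two's-complement wrap */
}

/* Add two values with 23 fraction bits; overflow wraps. */
int32_t add_q23(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

For example, a value of 0.5 with 31 fraction bits multiplied by 200.0 with 23 fraction bits gives exactly 100.0 with 23 fraction bits. Adding 200.0 and 100.0 with 23 fraction bits exceeds the representable range of just under 256 and wraps to -212.0, which is why the value ranges have to be checked when wrap mode is used.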
  20. 20. Matlab simulation
  21. 21. C implementation <ul><li>The code is ported to Xilinx Platform Studio to run on the Microblaze processors. </li></ul><ul><li>Simulations show equivalent results. </li></ul><ul><li>All PEs contain the same code; they receive different data to operate on, coming from different regions of the image. </li></ul>
  22. 22. Pitfalls <ul><li>The Xilinx VSK design suite promises high-level design of image/video processing using Simulink. </li></ul><ul><ul><li>We tried using this, but it does not provide enough granularity for our design needs. </li></ul></ul><ul><ul><li>Designs become very complex to debug. </li></ul></ul><ul><ul><li>The sample design is very tough to tweak. </li></ul></ul><ul><li>Xilinx EDK should be used for this kind of design. </li></ul>
  23. 23. Conclusions <ul><li>We designed different parts of our proposed architecture: </li></ul><ul><ul><li>Input/output </li></ul></ul><ul><ul><li>Custom memory controller </li></ul></ul><ul><ul><li>Basic processor network. </li></ul></ul><ul><li>We have simulated and implemented the test algorithm on a network of processors as a proof of concept. </li></ul><ul><li>We learnt the FPGA design flow and hardware/software co-design. </li></ul>
  24. 24. Future work <ul><li>In this work, we used Microblaze processors. </li></ul><ul><ul><li>The instruction set is not optimized for pixel/region-based image processing. </li></ul></ul><ul><ul><li>Lots of extra features that can be trimmed. </li></ul></ul><ul><li>Design of a custom processor suited to these applications. </li></ul><ul><ul><li>Less FPGA area needed </li></ul></ul><ul><ul><li>More efficient </li></ul></ul>
  25. 25. References <ul><li>[1] Adaptive Background Mixture Model for Real-time Tracking – Chris Stauffer, W.E.L. Grimson: AI, MIT – 1999 </li></ul><ul><li>[2] Understanding Background Mixture Model – P. Wayne Power, Johann A. Schoonees: Image and Vision Computing NZ, 2002 </li></ul><ul><li>[3] A Microblaze based Multiprocessor SoC – P. Huerta, J. Castillo, J.I. Martínez: 2007 </li></ul><ul><li>[4] Xilinx Microblaze Processor Reference V7.0 UG081 </li></ul><ul><li>[5] Xilinx Virtex II Pro User Guide </li></ul><ul><li>[6] Xilinx Video Starter Kit (VSK) User Guide </li></ul><ul><li>[7] Xilinx: XAPP529 Connecting customized IP to the Microblaze Soft Processor Core using FSL Link </li></ul><ul><li>[8] EDK 9.1i Microblaze tutorial – A Getting Started Guide </li></ul><ul><li>[9] Xilinx White paper: Multiprocessor on XPS </li></ul>