Porting MPEG-2 files on CerberO, a framework for FPGA-based MPSoC

A project for the course "High Performance Processors".

  1. 1. High Performance Processors and Systems Project. Conducted by: Prof. Donatella Sciuto. Students: Mohammad Ali (709610), Faisal Adnan (709399). Date: 25-09-07. Porting MPEG-2 files on CerberO, a framework for FPGA-based MPSoC
  2. 2. Outline <ul><ul><li>Theory </li></ul></ul><ul><ul><ul><li>The MPEG-2 Algorithm </li></ul></ul></ul><ul><ul><ul><li>The CerberO Architecture </li></ul></ul></ul><ul><ul><ul><li>Task Scheduling and Allocation </li></ul></ul></ul><ul><ul><ul><li>The Thread Execution Model </li></ul></ul></ul><ul><ul><li>Implementation </li></ul></ul><ul><ul><ul><li>Programming Model for MPEG-2 </li></ul></ul></ul><ul><ul><ul><li>MPEG-2 Application Partition Model </li></ul></ul></ul><ul><ul><li>Summary of the Work </li></ul></ul><ul><ul><li>Future Works </li></ul></ul>
  3. 3.
  4. 4. The MPEG-2 Algorithm <ul><li>MPEG-2 is an enhanced version of MPEG-1 </li></ul><ul><li>MPEG-1 is a lossy video compression standard that extends still-picture compression using </li></ul><ul><ul><li>the Discrete Cosine Transform (DCT), </li></ul></ul><ul><ul><li>run-length coding, and </li></ul></ul><ul><ul><li>motion compensation </li></ul></ul>
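As an illustration of the run-length coding step named on this slide, the sketch below turns a sequence of quantized coefficients into (run, level) pairs, where `run` is the number of zeros preceding a nonzero `level`. The names `RunLevel` and `rle_encode` are hypothetical, and the sketch assumes the coefficients are already in scan order; it leaves out the zigzag scan and the variable-length code tables a real encoder would use.

```c
#include <assert.h>
#include <stddef.h>

/* One (run, level) pair: `run` zeros followed by the nonzero value `level`. */
typedef struct {
    int run;
    int level;
} RunLevel;

/* Encode `n` coefficients into (run, level) pairs.
 * `out` must have room for at least `n` pairs.
 * Trailing zeros produce no pair; a real codec would emit an
 * end-of-block code instead (not modeled here).
 * Returns the number of pairs written. */
size_t rle_encode(const int *coef, size_t n, RunLevel *out)
{
    assert(coef != NULL && out != NULL);
    size_t pairs = 0;
    int run = 0;
    for (size_t i = 0; i < n; i++) {
        if (coef[i] == 0) {
            run++;                      /* extend the current run of zeros */
        } else {
            out[pairs].run = run;       /* zeros seen since the last nonzero */
            out[pairs].level = coef[i];
            pairs++;
            run = 0;
        }
    }
    return pairs;
}
```

For the input `{5, 0, 0, -3, 0, 1, 0, 0}` this yields three pairs: (0, 5), (2, -3) and (1, 1).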
  5. 5. The MPEG-2 Algorithm(2) <ul><li>An MPEG stream has a hierarchical image data structure whose levels are organized as follows, from outermost to innermost: </li></ul><ul><li>Video sequence </li><li>Group of pictures (GOP) </li><li>Picture (Frame) </li><li>Slice </li><li>Macroblock </li><li>Block </li></ul>Figure: MPEG video stream data structure
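To give a feel for the lower levels of this hierarchy, the following sketch counts the macroblocks and blocks in one picture. It assumes 16x16-pixel macroblocks and 4:2:0 chroma sampling (four luminance blocks plus two chrominance blocks of 8x8 per macroblock); the slides do not state the chroma format, and `FrameLayout` and `frame_layout` are hypothetical names.

```c
#include <assert.h>

enum {
    MB_SIZE = 16,            /* macroblock edge in luminance pixels */
    BLOCKS_PER_MB_420 = 6    /* 4 Y + 1 Cb + 1 Cr blocks (4:2:0 assumed) */
};

typedef struct {
    int mb_cols;      /* macroblocks per row */
    int mb_rows;      /* macroblock rows per picture */
    int macroblocks;  /* macroblocks per picture */
    int blocks;       /* 8x8 blocks per picture */
} FrameLayout;

/* Compute the macroblock/block layout of a picture, rounding
 * partial macroblocks at the right and bottom edges up. */
FrameLayout frame_layout(int width, int height)
{
    assert(width > 0 && height > 0);
    FrameLayout f;
    f.mb_cols = (width + MB_SIZE - 1) / MB_SIZE;
    f.mb_rows = (height + MB_SIZE - 1) / MB_SIZE;
    f.macroblocks = f.mb_cols * f.mb_rows;
    f.blocks = f.macroblocks * BLOCKS_PER_MB_420;
    return f;
}
```

Under these assumptions a 720x576 picture has 45 x 36 = 1620 macroblocks and 9720 blocks.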
  6. 6. The MPEG-2 Encoder <ul><li>The main stages of the MPEG-2 encoder are </li></ul><ul><li>Motion estimation (ME) </li><li>Forward DCT </li><li>Quantization (Q) </li><li>Variable length coding (VLC) </li><li>Rate control </li><li>Mode decision </li></ul>Fig 2: Block diagram of the MPEG-2 encoder
  7. 7. The MPEG-2 Encoder(2) <ul><ul><li>Only the following two stages are parallelized: </li></ul></ul><ul><ul><ul><li>Motion estimation (ME), and </li></ul></ul></ul><ul><ul><ul><li>Forward DCT </li></ul></ul></ul><ul><ul><li>These two stages account for most of the processor's execution time. </li></ul></ul><ul><ul><ul><ul><li>Figure: Breakdown of execution time in MPEG-2 encoding </li></ul></ul></ul></ul>
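To show why motion estimation tends to dominate execution time, here is a minimal sketch of full-search block matching with the sum of absolute differences (SAD) as the cost. This is a common textbook formulation, not necessarily the search used in the project's encoder; `sad16`, `full_search` and `MotionVector` are hypothetical names, and the frames are assumed to be 8-bit luminance planes stored row by row.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

enum { BLK = 16 };   /* macroblock edge in pixels */

typedef struct {
    int dx, dy;      /* displacement of the best match in the reference */
    unsigned sad;    /* cost of that match */
} MotionVector;

/* SAD between the block at (cx, cy) in `cur` and the block at (rx, ry) in `ref`. */
static unsigned sad16(const unsigned char *cur, const unsigned char *ref,
                      int stride, int cx, int cy, int rx, int ry)
{
    unsigned sum = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sum += (unsigned)abs((int)cur[(cy + y) * stride + cx + x] -
                                 (int)ref[(ry + y) * stride + rx + x]);
    return sum;
}

/* Try every displacement within +/- `range` pixels and keep the cheapest.
 * Candidates that fall outside the reference frame are skipped. */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int cx, int cy, int range)
{
    assert(cx >= 0 && cy >= 0 && cx + BLK <= width && cy + BLK <= height);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + BLK > width || ry + BLK > height)
                continue;
            unsigned s = sad16(cur, ref, width, cx, cy, rx, ry);
            if (s < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = s;
            }
        }
    }
    return best;
}
```

Each macroblock costs up to (2 * range + 1)^2 SAD evaluations of 256 pixel differences each, and the searches for different macroblocks are independent, which is what makes this stage a natural candidate for parallelization.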
  8. 8. The CerberO Architecture <ul><ul><li>Problems with other multiprocessor systems include </li></ul></ul><ul><ul><ul><li>Multiprocessor interrupt management </li></ul></ul></ul><ul><ul><ul><li>Processor identification </li></ul></ul></ul><ul><ul><ul><li>Cache coherency </li></ul></ul></ul><ul><ul><ul><li>Memory synchronization mechanisms </li></ul></ul></ul><ul><ul><li>The CerberO architecture addresses these limitations </li></ul></ul>
  9. 9. The CerberO Architecture(2) Figure: The CerberO Architecture. CerberO is created by connecting multiple MicroBlaze cores to a shared instruction memory and a shared data memory through a single On-chip Peripheral Bus (OPB).
  10. 10. The CerberO Architecture(3) <ul><li>CerberO uses the external memory of the development board as unified instruction storage. </li><li>Each MicroBlaze stores its private data in its own private memory, connected through the Local Memory Bus (LMB). </li><li>A crossbar module permits point-to-point communication between all the processors, allowing fast passing of small data. </li><li>The Synchronization Engine (SE) acts as a centralized hardware manager for locks and barriers. </li><li>When a processor requests a lock that is already held by another MicroBlaze, or when it is waiting on a barrier, it spins in a busy-waiting loop. </li></ul>Figure: The CerberO Architecture
  11. 11. The CerberO Architecture(4) <ul><li>The SE receives commands from the MicroBlaze processors and communicates with an arbiter that gives them access to three different memories. </li><li>The first memory is a Content Addressable Memory (CAM) that stores lock or barrier identifiers (i.e. addresses in shared memory). </li><li>The second memory, a BRAM, stores the ID of the processor that currently holds a lock or that initialized a barrier. </li><li>The third memory, another BRAM, is used to count processor arrivals. </li></ul>Figure: The CerberO Architecture
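The three memories described on this slide can be pictured with a small software model. The sketch below is only an illustration of the bookkeeping, not the hardware design: the table size, the function names (`se_try_lock`, `se_unlock`, `se_barrier_arrive`), and the policy of freeing an entry on unlock or barrier completion are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { SE_ENTRIES = 8, SE_NONE = -1 };

typedef struct {
    uintptr_t id[SE_ENTRIES];       /* CAM: lock or barrier identifier (shared address) */
    int       used[SE_ENTRIES];     /* 1 if the CAM entry is valid */
    int       owner[SE_ENTRIES];    /* BRAM 1: lock holder or barrier initializer */
    int       arrivals[SE_ENTRIES]; /* BRAM 2: processors arrived at a barrier */
} SyncEngine;

void se_init(SyncEngine *se)
{
    for (int i = 0; i < SE_ENTRIES; i++) {
        se->id[i] = 0;
        se->used[i] = 0;
        se->owner[i] = SE_NONE;
        se->arrivals[i] = 0;
    }
}

/* Find the entry for `id`; optionally claim a free entry if none exists. */
static int se_lookup(SyncEngine *se, uintptr_t id, int allocate)
{
    int free_slot = SE_NONE;
    for (int i = 0; i < SE_ENTRIES; i++) {
        if (se->used[i] && se->id[i] == id)
            return i;
        if (!se->used[i] && free_slot == SE_NONE)
            free_slot = i;
    }
    if (allocate && free_slot != SE_NONE) {
        se->used[free_slot] = 1;
        se->id[free_slot] = id;
        se->owner[free_slot] = SE_NONE;
        se->arrivals[free_slot] = 0;
        return free_slot;
    }
    return SE_NONE;
}

/* Returns 1 if `cpu` now holds the lock, 0 if it must keep spinning. */
int se_try_lock(SyncEngine *se, uintptr_t id, int cpu)
{
    int slot = se_lookup(se, id, 1);
    assert(slot != SE_NONE);        /* full table is not handled in this sketch */
    if (se->owner[slot] != SE_NONE)
        return 0;
    se->owner[slot] = cpu;
    return 1;
}

void se_unlock(SyncEngine *se, uintptr_t id, int cpu)
{
    int slot = se_lookup(se, id, 0);
    assert(slot != SE_NONE && se->owner[slot] == cpu);
    se->owner[slot] = SE_NONE;
    se->used[slot] = 0;             /* assumed policy: free the CAM entry */
}

/* Returns 1 for the arrival that completes the barrier, 0 otherwise. */
int se_barrier_arrive(SyncEngine *se, uintptr_t id, int cpu, int expected)
{
    int slot = se_lookup(se, id, 1);
    assert(slot != SE_NONE && expected > 0);
    if (se->arrivals[slot] == 0)
        se->owner[slot] = cpu;      /* first arrival initializes the barrier */
    se->arrivals[slot]++;
    if (se->arrivals[slot] < expected)
        return 0;
    se->used[slot] = 0;             /* barrier complete: free the entry */
    se->owner[slot] = SE_NONE;
    return 1;
}
```

A processor that gets 0 back from `se_try_lock` corresponds to the busy-waiting case: it would retry until the holder calls `se_unlock`.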
  12. 12. Task Scheduling and Allocation <ul><ul><li>CerberO Nano Kernel (CNK): a thin software layer that schedules and dynamically allocates threads on the processors of the architecture. </li></ul></ul><ul><ul><li>CNK relies on two shared tables, which contain </li></ul></ul><ul><ul><ul><li>Ready tasks </li></ul></ul></ul><ul><ul><ul><li>Free processing elements </li></ul></ul></ul>
  13. 13. The Thread Execution Model <ul><li>At the boot phase, one MicroBlaze is selected (by CNK) to be the Master Processor, and the first thread is inserted into the Ready Table. </li><li>The Master Processor starts sending the addresses of the tasks to execute to the other processors over the CrossBar. </li><li>Then, any processor can add a new task to the ready task table and, if there are any free processors, it directly sends them the address of the thread to execute. </li></ul>Figure: The threading model
  14. 14. The Thread Execution Model(2) <ul><li>When the execution of a task ends, the processor checks whether there are any other ready tasks. </li><li>If there are none, it marks itself as free and waits for the address of a new thread to arrive over the CrossBar. </li></ul>Figure: The threading model
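The interplay between the ready task table and the free processor table can be sketched as a small single-threaded model. It is a simplification under stated assumptions: tasks are plain addresses, a spawned task goes straight to a free processor when one exists and is queued otherwise, and the real CNK would guard both tables with SE locks. `NanoKernel`, `cnk_init`, `cnk_spawn` and `cnk_finish` are hypothetical names, not the CNK API.

```c
#include <assert.h>

enum { NUM_CPUS = 4, MAX_READY = 16, NO_CPU = -1 };

typedef unsigned long TaskAddr;   /* address of a task; 0 means "no task" */

typedef struct {
    TaskAddr ready[MAX_READY];    /* ready task table (FIFO ring buffer) */
    int head;
    int count;
    int is_free[NUM_CPUS];        /* free processing element table */
} NanoKernel;

/* Boot: the master is busy running the first thread, the others are free. */
void cnk_init(NanoKernel *k, int master)
{
    assert(master >= 0 && master < NUM_CPUS);
    k->head = 0;
    k->count = 0;
    for (int i = 0; i < NUM_CPUS; i++)
        k->is_free[i] = (i != master);
}

/* Add a task. If a processor is free, the task address is sent to it
 * directly (modeled by returning its ID); otherwise the task is queued
 * in the ready table and NO_CPU is returned. */
int cnk_spawn(NanoKernel *k, TaskAddr task)
{
    assert(task != 0);
    for (int i = 0; i < NUM_CPUS; i++) {
        if (k->is_free[i]) {
            k->is_free[i] = 0;
            return i;
        }
    }
    assert(k->count < MAX_READY);
    k->ready[(k->head + k->count) % MAX_READY] = task;
    k->count++;
    return NO_CPU;
}

/* Called when `cpu` finishes a task: take the next ready task if any,
 * otherwise mark the processor free and return 0. */
TaskAddr cnk_finish(NanoKernel *k, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (k->count > 0) {
        TaskAddr next = k->ready[k->head];
        k->head = (k->head + 1) % MAX_READY;
        k->count--;
        return next;
    }
    k->is_free[cpu] = 1;
    return 0;
}
```

With four processors and processor 0 as master, the first three spawned tasks go to processors 1 to 3, and a fourth waits in the ready table until some processor calls `cnk_finish`.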
  15. 15.
  16. 16. Programming Model for MPEG-2 <ul><ul><li>1. Memory Model </li></ul></ul><ul><ul><ul><li>Processors directly access local memory </li><li>Special dynamic memory allocation for the shared address space </li><li>Shared memory is allocated using SM_alloc() </li></ul></ul></ul><ul><ul><li>2. Execution Model </li></ul></ul><ul><ul><ul><li>Forked functions run in parallel </li></ul></ul></ul><ul><ul><ul><ul><li>for (i = 0; i < P; i++) { </li><li>int pid = create_task_on_table(ptmotion_estimation, 16); </li><li>insert_parameter_on_task(pid, 1, (unsigned int)oldorg); </li><li> ..... </li><li>schedule_pid(pid); } </li></ul></ul></ul></ul><ul><ul><ul><li>Join waits for completion </li></ul></ul></ul>
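The fork loop above creates P tasks but the slide does not show how work is divided among them. One plausible scheme, given purely as an assumption, is to hand each task a contiguous range of macroblock rows, with any remainder spread over the first tasks. `RowRange` and `partition_rows` are hypothetical names and are not part of the CerberO API.

```c
#include <assert.h>

typedef struct {
    int first;   /* index of the first macroblock row for this task */
    int count;   /* number of rows assigned to this task */
} RowRange;

/* Split `total_rows` rows among `num_tasks` tasks as evenly as possible.
 * The first (total_rows % num_tasks) tasks each get one extra row. */
RowRange partition_rows(int total_rows, int num_tasks, int task_index)
{
    assert(total_rows >= 0 && num_tasks > 0);
    assert(task_index >= 0 && task_index < num_tasks);
    int base = total_rows / num_tasks;
    int extra = total_rows % num_tasks;
    RowRange r;
    r.count = base + (task_index < extra ? 1 : 0);
    r.first = task_index * base + (task_index < extra ? task_index : extra);
    return r;
}
```

For 10 rows and 4 tasks this gives ranges starting at rows 0, 3, 6 and 8 with 3, 3, 2 and 2 rows respectively. In the fork loop, values like `first` and `count` would be passed to each task as parameters.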
  17. 17. Programming Model for MPEG-2 (2) <ul><ul><li>3. Consistency Model </li></ul></ul><ul><ul><ul><li>Sequential ordering is ensured for synchronization variables </li><li>Local synchronization: </li></ul></ul></ul><ul><ul><ul><ul><li>lock((int)counter); </li><li>*counter = *counter + 1; </li><li>if (*counter == P) </li><li>{ </li><li>*counter = 0; </li><li>higlander = 1; </li><li>} </li><li>unlock((int)counter); </li></ul></ul></ul></ul><ul><ul><ul><li>Global synchronization: </li></ul></ul></ul><ul><ul><ul><ul><li>barrier((&(task_table->entry[pid_init]).barrier), NUM_PROCESSORS); </li></ul></ul></ul></ul>
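The local synchronization snippet follows a "last arriver" pattern: every task increments a shared counter under a lock, and only the task that brings it to P resets the counter and raises a flag. The sketch below models just that logic in a single thread so it can be run anywhere; the `lock`/`unlock` calls appear only as comments, and the function name `arrive` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Record one task's arrival on a shared counter.
 * Returns 1 only for the task whose arrival makes the count reach
 * `num_tasks` (the role of the flag in the slide's snippet), 0 otherwise.
 * The counter is reset so it can be reused for the next phase. */
int arrive(int *counter, int num_tasks)
{
    assert(counter != NULL && num_tasks > 0);
    int last = 0;
    /* On CerberO: lock((int)counter); */
    *counter = *counter + 1;
    if (*counter == num_tasks) {
        *counter = 0;
        last = 1;
    }
    /* On CerberO: unlock((int)counter); */
    return last;
}
```

Without the lock, two processors could read the same counter value and both miss the `== P` test, which is why the increment and the comparison sit inside one critical section.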
  18. 18. MPEG-2 Application Partition Model
  19. 19. Summary of the Work <ul><ul><li>First, we studied how the sequential JPEG code was converted into a parallel version for porting to the CerberO architecture. </li><li>We parallelized the MPEG-2 code, which was previously sequential, for porting to the CerberO architecture. </li><li>We used the Xilinx toolchain to check our code. </li></ul></ul>Future Works <ul><ul><li>We can try to port the code directly onto the board. </li><li>Afterwards, we can compare the performance of the sequential and parallel MPEG-2 versions. </li></ul></ul>
