VLIW and Superscalar


An Overview of VLIW Architecture...

Published in: Education

  1. VLIW: Very Long Instruction Word
     Name: Ra fi d a r a
     www.csted.blogspot.in
  2. • VLIW Overview
     • Instruction Level Parallelism (most relevant)
  3. The method for exploiting parallelism
     The key to higher performance in microprocessors is the ability to achieve a higher degree of fine-grained, instruction-level parallelism:
     • Pipelining: breaking a task down into substeps and executing them in different parts of the processor. This is the technique used in pipelined processors.
     • Replication: duplicating execution units so that each unit carries out the same operation on different data.
  4. VLIW
     • Both VLIW and superscalar processors employ pipelining and replication to achieve higher performance.
     • Both execute multiple independent operations in the same cycle.
     • However, the two architectures differ in how those operations are specified.
     • In a superscalar computer, the complexity of finding and grouping independent operations is handled at the hardware level.
     • In VLIW, it is handled at the software (compiler) level.
  5. Problems we meet
     • It is not easy to exploit parallel execution in real programs, which are written in a serial fashion.
     • Mainstream high-level languages (C and FORTRAN) allow only limited freedom to execute operations in parallel.
  6. Parallel processing
     Processing instructions in parallel requires three major tasks:
     1. Checking dependencies between instructions to determine which instructions can be grouped together for parallel execution.
     2. Assigning instructions to the functional units of the hardware.
     3. Determining when instructions are initiated, that is, which ones are placed together into a single word.
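
As a rough illustration of the first task, the sketch below checks register dependencies between two operations. The `Op` record, the register names, and the `depends_on` helper are hypothetical and not taken from any real compiler. The check is deliberately conservative: it treats any read-after-write, write-after-read, or write-after-write overlap as a dependency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dest = name(srcs...)."""
    name: str
    dest: str
    srcs: tuple


def depends_on(later: Op, earlier: Op) -> bool:
    """Return True if `later` cannot be grouped in the same word as `earlier`."""
    raw = earlier.dest in later.srcs   # read after write
    war = later.dest in earlier.srcs   # write after read
    waw = later.dest == earlier.dest   # write after write
    return raw or war or waw


a = Op("add", "r1", ("r2", "r3"))
b = Op("mul", "r4", ("r1", "r5"))  # reads r1, which `a` writes
c = Op("sub", "r6", ("r7", "r8"))  # shares no register with `a`

print(depends_on(b, a))  # True: b must wait for a
print(depends_on(c, a))  # False: a and c can be grouped together
```
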
  7. VLIW
     • The aim of VLIW is higher performance.
     • A conventional scalar processor executes instructions sequentially, one after another.
     • A VLIW computer is based on an architecture that implements Instruction Level Parallelism (ILP), meaning that a VLIW processor executes several operations in parallel.
     • A Very Long Instruction Word (VLIW) specifies multiple primitive operations that are grouped together.
     • These operations are dispatched to the functional units provided by the hardware, which read their operands from and write their results to a register file.
  8. VLIW
  9. Static Scheduling
     • Unlike superscalar architectures, in the VLIW architecture all scheduling is static: it is not done at runtime by the hardware but is handled by the compiler.
     • The compiler takes on the complex scheduling work that Instruction Level Parallelism requires and compiles the program into object code.
     • The object code is then fetched and executed directly by the VLIW hardware.
  10. Static Scheduling
      • It is this object code that is referred to as the Very Long Instruction Word (VLIW).
      • The compiler prearranges the object code so that the VLIW chip can quickly execute the instructions in parallel.
      • This frees the microprocessor from having to perform the complex and continual runtime analysis that superscalar RISC and CISC chips must do.
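
To make the idea of compile-time scheduling more concrete, here is a minimal sketch of a greedy, in-order scheduler that packs operations into instruction words. It is not the algorithm of any particular VLIW compiler. It assumes that every operation takes one cycle, that any slot can run any operation, and that dependencies are already known; the operation names and the `schedule` function are made up for illustration.

```python
def schedule(ops, deps, width):
    """Pack operations into instruction words (bundles) of at most `width` slots.

    ops   -- operation names in program order
    deps  -- dict mapping a name to the set of earlier names it depends on
    width -- number of slots per instruction word (assumed uniform)
    """
    bundles = []   # each bundle is a list of operation names
    placed = {}    # operation name -> index of the bundle holding it
    for op in ops:
        # An operation must come after every bundle holding one of its inputs.
        earliest = 0
        for d in deps.get(op, ()):
            earliest = max(earliest, placed[d] + 1)
        # Take the first bundle at or after `earliest` that has a free slot.
        i = earliest
        while i < len(bundles) and len(bundles[i]) >= width:
            i += 1
        while len(bundles) <= i:
            bundles.append([])
        bundles[i].append(op)
        placed[op] = i
    return bundles


ops = ["load_a", "load_b", "add", "load_c", "mul", "store"]
deps = {"add": {"load_a", "load_b"}, "mul": {"add", "load_c"}, "store": {"mul"}}

for word in schedule(ops, deps, width=2):
    print(word)
```

With two slots per word, the six operations fit into four words instead of six; all of this grouping is decided before the program runs, which is the point of static scheduling.
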
  11. VLIW vs Superscalar
      • Superscalar architectures, in contrast, use dynamic scheduling, which transfers all ILP complexity to the hardware.
      • This leads to greater hardware complexity than is seen in VLIW hardware.
      • VLIW chips don’t need most of the complex circuitry that superscalar chips must use to coordinate parallel execution at runtime.
  12. Tradeoffs
      • The VLIW compiler is specific to the hardware: it is an integral part of the VLIW system.
      • A poor VLIW compiler will hurt performance far more than a poor RISC or CISC compiler would.
  13. VLIW principles
      1. The compiler analyzes the dependences among all instructions in the sequential code and tries to extract as much parallelism as possible.
      2. Based on the analysis, the compiler re-codes the sequential code as VLIW instruction words.
      3. Finally, the only work left to the VLIW hardware is to fetch the VLIWs from cache, decode them, dispatch the independent primitive instructions to the corresponding functional units, and execute them.
  14. Generating VLIW instruction words
  15. 1. One VLIW instruction word contains a maximum of 8 primitive instructions.
      2. Each time, one VLIW instruction word is fetched from cache and decoded.
      3. After decoding, all primitive instructions in this VLIW word are issued to the functional units in parallel for execution.
      4. These primitive instructions are from the same VLIW word, so they are guaranteed to be independent.
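
The fetch, decode, and parallel-issue steps above can be modelled with a small simulator. This is only a sketch under stated assumptions: the tuple encoding `(op, dest, src1, src2)`, the register names, and the rule that all slots read their operands before any slot writes its result are illustrative choices, not a description of a specific VLIW chip. The limit of 8 slots follows the slide. Note that the model contains no dependency checking at all; it relies on the words having been built correctly beforehand.

```python
import operator

# Hypothetical functional units, keyed by operation name.
UNITS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
MAX_SLOTS = 8  # maximum primitive instructions per word, as on the slide


def execute_word(word, regs):
    """Issue every operation of one instruction word 'in parallel'."""
    if len(word) > MAX_SLOTS:
        raise ValueError("too many operations in one instruction word")
    # All slots read their operands from the register state before the word...
    results = [(dest, UNITS[op](regs[s1], regs[s2])) for op, dest, s1, s2 in word]
    # ...and the results are written back only afterwards.
    for dest, value in results:
        regs[dest] = value
    return regs


def run(program, regs):
    """Fetch and execute the instruction words one after another."""
    for word in program:
        execute_word(word, regs)
    return regs


regs = {"r1": 2, "r2": 3, "r3": 4, "r4": 0, "r5": 0, "r6": 0}
program = [
    [("add", "r4", "r1", "r2"), ("mul", "r5", "r2", "r3")],  # two independent ops
    [("sub", "r6", "r5", "r4")],                             # needs both results
]
run(program, regs)
print(regs["r4"], regs["r5"], regs["r6"])  # 5 12 7
```
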
  16. VLIW instructions explicitly specify several independent operations, so there is no need for decode and dispatch hardware that tries to reconstruct parallelism from a serial instruction stream. The processor does not need to check whether or not the operations can run in parallel.
  17. Conclusion
      1. A highly parallel VLIW implementation is much simpler and cheaper than its superscalar counterparts.
      2. The encoding of VLIW words implies parallelism among their primitive instructions, which results in reduced hardware complexity.
      3. The compiler must assemble multiple primitive instructions into a single VLIW so that multiple functional units are kept busy.
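
One way to see why the third point matters is to measure how many slots of each instruction word actually hold useful work. The sketch below pads short words with `nop` and reports slot utilization. The `pad_and_measure` helper and the example words are hypothetical, and filling empty slots with explicit `nop` entries is an assumption made for simplicity; real encodings may handle empty slots differently.

```python
def pad_and_measure(bundles, width):
    """Pad each instruction word to `width` slots and compute slot utilization."""
    padded = [b + ["nop"] * (width - len(b)) for b in bundles]
    used = sum(len(b) for b in bundles)
    utilization = used / (width * len(bundles)) if bundles else 0.0
    return padded, utilization


# Hypothetical compiler output for a 2-slot machine.
bundles = [["load_a", "load_b"], ["add", "load_c"], ["mul"], ["store"]]
padded, utilization = pad_and_measure(bundles, width=2)

for word in padded:
    print(word)
print(f"slot utilization: {utilization:.0%}")  # slot utilization: 75%
```

The lower this figure, the more functional units sit idle each cycle, which is why the quality of the compiler's grouping has such a direct effect on VLIW performance.
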
  18. Thanks! For more, please visit www.csted.blogspot.in