INSTRUCTION LEVEL
PARALLELISM
PRESENTED BY KAMRAN ASHRAF
13-NTU-4009
INTRODUCTION
 Instruction-level parallelism (ILP) is a
measure of how many operations in a
computer program can be performed
simultaneously, i.e. "in parallel".
WHAT IS A PARALLEL INSTRUCTION?
 Parallel instructions are instructions that do not depend on one another's
results, so they can be executed at the same time.
 Hierarchy
 Bit-level parallelism
• e.g. a 16-bit add on an 8-bit processor
 Instruction-level parallelism
 Loop-level parallelism
• for (i = 1; i <= 1000; i = i + 1)
    x[i] = x[i] + y[i];
 Thread-level parallelism
• e.g. multi-core computers
EXAMPLE
Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
 Operation 3 depends on the results "e" and "f" of operations 1 and 2, so "g"
cannot be calculated until both "e" and "f" have been computed.
 Operations 1 and 2, however, do not depend on any other operation, so they can be
computed simultaneously.
 If each operation completes in one unit of time, these three instructions can
finish in a total of two units of time, giving an ILP of 3/2 = 1.5: the program
runs 1.5 times faster than the three units of time it would need without ILP.
WHY ILP?
 One of the goals of compiler and processor designers is to exploit as much ILP as
possible.
 Ordinary programs are written to execute instructions in sequence, one after the
other, in the order written by the programmer.
 ILP allows the compiler and the processor to overlap the execution of multiple
instructions or even to change the order in which instructions are executed.
ILP TECHNIQUES
Micro-architectural techniques that use ILP include:
 Instruction pipelining
 Superscalar
 Out-of-order execution
 Register renaming
 Speculative execution
 Branch prediction
INSTRUCTION PIPELINE
 An instruction pipeline is a technique
used in the design of modern
microprocessors, microcontrollers and
CPUs to increase their instruction
throughput (the number of instructions
that can be executed in a unit of time).
PIPELINING
 The main idea is to divide the processing of a CPU instruction
into a series of independent steps ("microinstructions"), with
storage at the end of each step.
 This allows the CPU's control logic to issue instructions at the
rate of the slowest step, which is much faster than processing
each instruction as a single step.
EXAMPLE
 For example, the classic RISC pipeline is broken into five stages, with a set of
flip-flops between each stage, as follows:
 Instruction fetch
 Instruction decode & register fetch
 Execute
 Memory access
 Register write back
 In the usual pipeline diagram, the vertical axis is successive instructions and the
horizontal axis is time: within any one column (cycle), the earliest instruction is in
the WB stage while the latest is undergoing instruction fetch.
SUPERSCALAR
 A superscalar CPU architecture
implements ILP inside a single processor,
allowing higher CPU throughput at the
same clock rate.
WHY SUPERSCALAR?
 A superscalar processor executes more than one instruction during a clock
cycle.
 It simultaneously dispatches multiple instructions to multiple redundant
functional units built inside the processor.
 Each functional unit is not a separate CPU core but an execution resource
inside the CPU, such as an arithmetic logic unit (ALU), a floating point unit (FPU),
a bit shifter, or a multiplier.
EXAMPLE
 Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a
maximum of two instructions per cycle can be completed.
OUT-OF-ORDER EXECUTION
 Out-of-order execution (OoOE) is a technique
used in most high-performance microprocessors.
 The key concept is to let the processor
avoid a class of delays that occur when the data
needed to perform an operation are unavailable.
 Most modern CPU designs include support for
out-of-order execution.
STEPS
 An out-of-order processor breaks the processing of instructions into these steps:
 Instruction fetch.
 Instruction dispatch to an instruction queue (also called an instruction buffer).
 The instruction waits in the queue until its input operands are available.
 The instruction is issued to the appropriate functional unit and executed by that unit.
 The results are queued in a re-order buffer.
 Only after all older instructions have written their results to the register file is this
result written back to the register, so results still retire in program order.
OTHER ILP TECHNIQUES
 Register renaming is a technique that avoids unnecessary serialization of
program operations caused by the reuse of registers, in order to
enable out-of-order execution.
 Speculative execution allows complete instructions, or parts of instructions,
to be executed before it is known whether that execution is required.
 Branch prediction avoids the delays of waiting for control dependencies to be
resolved: it determines whether a conditional branch (jump) in the
instruction flow of a program is likely to be taken or not.
THANKS