PIPELINING IDEALISM

ANEESH R
Centre for Development of Advanced Computing (C-DAC)
INDIA
aneeshr2020@gmail.com

Pipelining idealism
• The motivation for a k-stage pipelined design is to achieve a k-fold increase in throughput.
• This k-fold increase in throughput represents the ideal case.
• Unavoidable deviations from this idealism in real pipelines make pipelined design more challenging.
• Bridging the gap between idealism and realism is the central challenge of pipelined design.
• The three points of pipelining idealism are:
• Uniform sub-computations: the computation to be performed can be evenly partitioned into uniform-latency sub-computations.
• Identical sub-computations: the same computation is to be performed repeatedly on a large number of input data sets.
• Independent sub-computations: all repetitions of the same computation are mutually independent.

Uniform sub-computations
• The computation to be pipelined can be evenly partitioned into k uniform-latency sub-computations.
• The original design can then be evenly partitioned into k balanced (i.e., equal-latency) pipeline stages.
• If the latency of the original computation, and hence the clocking period of the non-pipelined design, is T, then the clocking period of a k-stage pipelined design is exactly T/k.
• The k-fold increase in throughput is achieved through this k-fold increase in clocking rate.
• This idealized assumption may not hold in an actual pipeline design; it may not be possible to partition the computation into perfectly balanced stages.
• For example, the 400-ns latency of a non-pipelined computation might be partitioned into three stages with latencies of 125, 150, and 125 ns, respectively.
• The original latency has then not been evenly partitioned into three balanced stages.

Uniform sub-computations (cont…)
• The clocking period of a pipelined design is dictated by the stage with the longest latency.
• The stages with shorter latencies in effect incur some inefficiency or penalty.
• In the example, the first and third stages each carry an inefficiency of 25 ns.
• These inefficiencies constitute the internal fragmentation of pipeline stages.
• The total latency required to perform the same computation increases from T to Tf.
• The clocking period of the pipelined design is no longer T/k but Tf/k.
• Performing the three sub-computations now requires 450 ns instead of the original 400 ns.
• The clocking period is therefore not 133 ns (400/3 ns) but 150 ns, as the sketch below works out.
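A minimal numeric sketch of this example, assuming Python and the stage latencies quoted above (an illustration of the arithmetic, not part of the original slides):

```python
# Internal-fragmentation arithmetic for the 400 ns computation split into
# three unbalanced stages of 125, 150, and 125 ns.

stage_latencies = [125, 150, 125]        # ns
k = len(stage_latencies)
T = sum(stage_latencies)                 # original non-pipelined latency: 400 ns

clock_period = max(stage_latencies)      # dictated by the slowest stage: 150 ns
ideal_period = T / k                     # perfectly balanced case: ~133 ns

fragmentation = [clock_period - s for s in stage_latencies]   # [25, 0, 25] ns
Tf = clock_period * k                    # effective total latency: 450 ns

print(f"clock period: {clock_period} ns (ideal {ideal_period:.0f} ns)")
print(f"per-stage internal fragmentation: {fragmentation} ns")
print(f"total latency grows from {T} ns to Tf = {Tf} ns")
```
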
Uniform sub-computations (cont…)
• In actual designs, additional delay is introduced by the buffers placed between pipeline stages, and further delay is required to ensure proper clocking of the pipeline stages.
• In the example, an additional 22 ns is required to ensure proper clocking of the pipeline stages.
• This results in a cycle time of 172 ns for the three-stage pipelined design.
• The ideal cycle time for a three-stage pipelined design would have been 133 ns.
• The difference between 172 and 133 ns in the clocking period accounts for the shortfall from the idealized three-fold increase in throughput, as the sketch below shows.
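Extending the same sketch with the 22 ns buffering/clocking overhead makes the shortfall concrete (again a Python illustration under the slide's numbers, not part of the original deck):

```python
# Cycle time and achievable speedup once clocking/buffering overhead is included.

T = 400                       # ns, non-pipelined latency
k = 3                         # number of pipeline stages
slowest_stage = 150           # ns, longest stage latency
clock_overhead = 22           # ns, inter-stage buffer and clocking overhead

cycle_actual = slowest_stage + clock_overhead   # 172 ns
cycle_ideal = T / k                             # ~133 ns

speedup_actual = T / cycle_actual               # ~2.33x throughput increase
print(f"cycle time: {cycle_actual} ns vs ideal {cycle_ideal:.0f} ns")
print(f"throughput gain: {speedup_actual:.2f}x vs idealized {k}x")
```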

Uniform sub-computations (cont…)
• Uniform sub-computations basically assumes two things:
• There is no inefficiency introduced by partitioning the original computation into multiple sub-computations.
• There is no additional delay caused by the introduction of the inter-stage buffers and the clocking requirements.
• The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch.
• Partitioning a computation into balanced pipeline stages constitutes the first challenge of pipelined design.
• The goal is to achieve stages that are as balanced as possible, so as to minimize internal fragmentation.
• Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism.
• This deviation leads to the shortfall from the idealized k-fold increase in throughput of a k-stage pipelined design.

Identical sub-computations
• Many repetitions of the same computation are to be performed by the pipeline.
• The same computation is repeated on multiple sets of input data.
• Each repetition requires the same sequence of sub-computations provided by the pipeline stages.
• This is certainly true for the pipelined floating-point multiplier, because that pipeline performs only one function: floating-point multiplication.
• Many pairs of floating-point numbers are to be multiplied.
• Each pair of operands is sent through the same three pipeline stages.
• All the pipeline stages are used by every repetition of the computation.

Identical sub-computations (cont…)
• If a pipeline is designed to perform multiple functions, this assumption may not hold.
• For example, an arithmetic pipeline can be designed to perform both addition and multiplication.
• Not all of the pipeline stages may be required by each of the functions supported by the pipeline.
• A different subset of the pipeline stages may be required for performing each of the functions.
• Hence each computation may not require all of the pipeline stages.
• Some data sets will not need certain pipeline stages, which effectively sit idle during those computations.
• These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages.
• External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines; a hypothetical sketch follows.
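A hypothetical sketch of external fragmentation (the four stage names and the per-function stage subsets below are assumptions made purely for illustration; the slides do not specify them):

```python
# Hypothetical multifunction arithmetic pipeline: which stages sit idle
# (external fragmentation) when each supported function flows through it.

PIPELINE_STAGES = ["exponent-align", "significand-op", "normalize", "round"]

# Assumed stage subsets per function, purely for illustration:
STAGES_USED = {
    "add":      {"exponent-align", "significand-op", "normalize", "round"},
    "multiply": {"significand-op", "normalize", "round"},  # no alignment needed
}

def idle_stages(function):
    """Stages that are unused while `function` occupies the pipeline."""
    return [s for s in PIPELINE_STAGES if s not in STAGES_USED[function]]

for fn in STAGES_USED:
    print(f"{fn}: externally fragmented stages = {idle_stages(fn) or 'none'}")
```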

Identical sub-computations (cont…)
• Identical computations effectively assume that all pipeline stages are always utilized.
• This also implies that there are many sets of data to be processed.
• It takes k cycles for the first data set to reach the last stage of the pipeline; these cycles are referred to as the pipeline fill time.
• After the last data set has entered the first pipeline stage, an additional k cycles are needed to drain the pipeline.
• During the pipeline fill and drain times, not all of the stages will be busy.
• The assumption of processing many sets of input data implies that the pipeline fill and drain times constitute only a very small fraction of the total time, as the sketch below illustrates.
• The pipeline stages can then be considered, for all practical purposes, to be always busy.
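A small sketch of why fill and drain become negligible as the input stream grows (Python; the data-set counts are illustrative, not from the slides):

```python
# Fraction of cycles during which all k stages of a pipeline are busy
# when n data sets are streamed through back to back.

def busy_fraction(k, n):
    total_cycles = n + k - 1            # n issue cycles plus fill/drain overhead
    fully_busy = max(n - (k - 1), 0)    # cycles with every stage occupied
    return fully_busy / total_cycles

k = 3
for n in (10, 100, 10_000):
    print(f"n = {n:>6}: all stages busy {busy_fraction(k, n):.1%} of the time")
```
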
Independent sub-computations
• The repetitions of the computation, or simply the computations, to be processed by the pipeline are independent.
• All the computations that are concurrently resident in the pipeline stages are independent; they have no data or control dependences between any pair of computations.
• This permits the pipeline to operate in "streaming" mode, in which a later computation need not wait for the completion of an earlier computation because of a dependence between them.
• For our pipelined floating-point multiplier this assumption holds.
• If there are multiple pairs of operands to be multiplied, the multiplication of one pair does not depend on the result of another multiplication.
• These pairs can therefore be processed by the pipeline in streaming mode.

Independent sub-computations (cont…)
• For some pipelines this assumption may not hold:
• A later computation may require the result of an earlier computation.
• Both of these computations can be concurrently resident in the pipeline stages.
• If the later computation has entered the pipeline stage that needs the result while the earlier computation has not yet reached the pipeline stage that produces that result, the later computation must wait in that stage; this is referred to as a pipeline stall.
• If a computation is stalled in a pipeline stage, all subsequent computations may also have to be stalled.
• Pipeline stalls effectively introduce idling pipeline stages.
• This is essentially a dynamic form of external fragmentation and results in a reduction of pipeline throughput, as the sketch below illustrates.
• In designing pipelines that must process computations that are not necessarily independent, the goal is to produce a design that minimizes the amount of pipeline stalling.
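A rough sketch of how stalls erode the throughput gain (the stall-cycle counts below are hypothetical, chosen only to illustrate the effect of this dynamic external fragmentation):

```python
# Effective speedup of a k-stage pipeline over non-pipelined execution
# when dependences force stall cycles. Stall counts are hypothetical.

def effective_speedup(k, n, stall_cycles):
    unpipelined = n * k                        # each computation takes k cycles alone
    pipelined = n + k - 1 + stall_cycles       # streaming execution plus stalls
    return unpipelined / pipelined

k, n = 3, 1000
for stalls in (0, 200, 500):
    speedup = effective_speedup(k, n, stalls)
    print(f"{stalls:>4} stall cycles -> speedup {speedup:.2f}x (ideal {k}x)")
```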

• This topic is adapted from "Modern Processor Design" by Shen and Lipasti.
