PIPELINING IDEALISM

ANEESH R
Center For Development of Advanced Computing (C-DAC)
INDIA
aneeshr2020@gmail.com

Pipelining idealism
• The motivation for a k-stage pipelined design is to achieve a k-fold increase in throughput.
• This k-fold increase in throughput represents the ideal case.
• Unavoidable deviations from this idealism in real pipelines make pipelined design challenging.
• Dealing with this idealism-realism gap is the more challenging part of pipelined design.
• Pipelining idealism rests on three points:
• Uniform sub-computations: the computation to be performed is evenly partitioned into uniform-latency sub-computations.
• Identical sub-computations: the same computation is to be performed repeatedly on a large number of input data sets.
• Independent sub-computations: all the repetitions of the same computation are mutually independent.
Uniform sub-computations
• The computation to be pipelined can be evenly partitioned into k uniform-latency sub-computations.
• Equivalently, the original design can be evenly partitioned into k balanced pipeline stages (i.e., stages having the same latency).
• If the latency of the original computation, and hence the clocking period of the non-pipelined design, is T, then the clocking period of a k-stage pipelined design is exactly T/k.
• The k-fold increase in throughput is achieved through the k-fold increase in the clocking rate.
• This idealized assumption may not hold in an actual pipeline design.
• It may not be possible to partition the computation into perfectly balanced stages.
• Example: the 400 ns latency of a non-pipelined computation is partitioned into three stages with latencies of 125, 150, and 125 ns, respectively.
• The original latency has not been evenly partitioned into three balanced stages.

Uniform sub-computations (cont…)
• The clocking period of a pipelined design is dictated by the stage with the longest latency.
• The stages with shorter latencies in effect incur some inefficiency or penalty.
• In the example, the first and third stages each have an inefficiency of 25 ns (150 ns minus 125 ns).
• This inefficiency is the internal fragmentation of pipeline stages.
• As a result, the total latency required for performing the same computation increases from T to Tf.
• The clocking period of the pipelined design is no longer T/k but Tf/k.
• Performing the three sub-computations now requires 450 ns instead of the original 400 ns.
• The clocking period is not 133 ns (400/3 ns) but 150 ns.
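
The arithmetic on this slide can be checked with a short Python sketch. The function name `pipeline_clock` and its return layout are illustrative assumptions, not something from the slides; only the stage latencies (125, 150, 125 ns) come from the example.

```python
def pipeline_clock(stage_latencies_ns):
    """Return (clock period, total latency Tf, per-stage internal fragmentation).

    Hypothetical helper: the clock period is set by the slowest stage,
    and every stage is charged one full clock period.
    """
    clock = max(stage_latencies_ns)
    total = clock * len(stage_latencies_ns)           # Tf = k * clock period
    fragmentation = [clock - s for s in stage_latencies_ns]
    return clock, total, fragmentation


stages = [125, 150, 125]                 # ns, from the slide's example
clock, total, fragmentation = pipeline_clock(stages)
ideal_clock = sum(stages) / len(stages)  # T/k with T = 400 ns

print(clock)                   # 150
print(total)                   # 450
print(fragmentation)           # [25, 0, 25]
print(round(ideal_clock, 1))   # 133.3
```

The 25 ns entries are the internal fragmentation of the first and third stages; the middle stage has none because it sets the clock.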
Uniform sub-computations (cont…)
• In actual designs, additional delay is introduced by the buffers placed between pipeline stages, and further delay is required to ensure proper clocking of the pipeline stages.
• In the example, an additional 22 ns is required to ensure proper clocking of the pipeline stages.
• This results in a cycle time of 172 ns (150 ns + 22 ns) for the three-stage pipelined design.
• The ideal cycle time for a three-stage pipelined design would have been 133 ns.
• The difference between 172 and 133 ns in the clocking period accounts for the shortfall from the idealized three-fold increase in throughput.
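
To put a number on that shortfall, here is a minimal Python sketch. It assumes the non-pipelined design delivers one result every 400 ns and the pipelined design one result per cycle once full; the function name `pipelined_speedup` is hypothetical, and the resulting ratio is derived here rather than quoted from the slides.

```python
def pipelined_speedup(unpipelined_latency_ns, stage_latencies_ns, overhead_ns):
    """Return (cycle time, throughput ratio versus the non-pipelined design).

    Assumption: the cycle time is the slowest stage plus a fixed
    buffering/clocking overhead added to every stage.
    """
    cycle = max(stage_latencies_ns) + overhead_ns
    return cycle, unpipelined_latency_ns / cycle


cycle, ratio = pipelined_speedup(400, [125, 150, 125], 22)
print(cycle)             # 172
print(round(ratio, 2))   # 2.33
```

Under these assumptions the three-stage design gains roughly a factor of 2.33 in throughput rather than the ideal factor of 3.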

Uniform sub-computations (cont…)
• The uniform sub-computations point basically assumes two things:
• There is no inefficiency introduced by partitioning the original computation into multiple sub-computations.
• There is no additional delay caused by the introduction of the inter-stage buffers and the clocking requirements.
• The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch.
• The partitioning of a computation into balanced pipeline stages constitutes the first challenge of pipelined design.
• The goal is to achieve stages that are as balanced as possible, to minimize internal fragmentation.
• Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism.
• This deviation leads to the shortfall from the idealized k-fold increase in throughput in a k-stage pipelined design.

Identical sub-computations
• Many repetitions of the same computation are to be performed by the pipeline.
• The same computation is repeated on multiple sets of input data.
• Each repetition requires the same sequence of sub-computations provided by the pipeline stages.
• This is certainly true for the pipelined floating-point multiplier, because this pipeline performs only one function: floating-point multiplication.
• Many pairs of floating-point numbers are to be multiplied.
• Each pair of operands is sent through the same three pipeline stages.
• All the pipeline stages are used by every repetition of the computation.

Identical sub-computations (cont…)
• If a pipeline is designed to perform multiple functions, this assumption may not hold.
• For example, an arithmetic pipeline can be designed to perform both addition and multiplication.
• Not all the pipeline stages may be required by each of the functions supported by the pipeline.
• A different subset of pipeline stages may be required for performing each of the functions.
• Each computation may not require all the pipeline stages.
• Some data sets will not require some pipeline stages and will effectively be idling during those stages.
• These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages.
• External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines.
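
External fragmentation can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the four stage names, which stages each function uses, and the helper `external_fragmentation` are hypothetical and do not describe any real pipeline from the slides.

```python
# Hypothetical four-stage add/multiply pipeline (illustrative only).
STAGES = ["align", "add", "multiply", "normalize"]

# Hypothetical mapping from each function to the stages it actually needs.
USES = {
    "add": {"align", "add", "normalize"},
    "mul": {"multiply", "normalize"},
}


def external_fragmentation(ops):
    """Fraction of stage slots that do no useful work for the given operations.

    Assumption: every operation passes through all stages in order and
    occupies one slot per stage, even in stages it does not use.
    """
    total_slots = len(ops) * len(STAGES)
    used_slots = sum(len(USES[op]) for op in ops)
    return (total_slots - used_slots) / total_slots


print(external_fragmentation(["add", "mul", "add", "mul"]))  # 0.375
```

In this made-up mix, 6 of the 16 stage slots are idle, which is the overhead a multifunction design would try to reduce.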

Identical sub-computations (cont…)
• The identical sub-computations point effectively assumes that all pipeline stages are always utilized.
• It also implies that there are many sets of data to be processed.
• It takes k cycles for the first data set to reach the last stage of the pipeline.
• These cycles are referred to as the pipeline fill time.
• After the last data set has entered the first pipeline stage, an additional k cycles are needed to drain the pipeline.
• During pipeline fill and drain times, not all the stages will be busy.
• The implication of assuming many sets of input data is that the pipeline fill and drain times constitute a very small fraction of the total time.
• The pipeline stages can therefore be considered, for all practical purposes, to be always busy.
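
A rough Python sketch shows how quickly the fill and drain overhead fades as the number of data sets grows. It assumes one data set enters per cycle with no stalls, and it uses one common way of counting in which n data sets occupy a k-stage pipeline for n + k - 1 cycles in total; the function name `stage_utilization` is hypothetical.

```python
def stage_utilization(k, n):
    """Average fraction of cycles in which a given stage is busy.

    Assumptions: k stages, n data sets, one data set enters per cycle,
    no stalls. Each stage is busy for n of the n + k - 1 total cycles.
    """
    total_cycles = n + k - 1
    return n / total_cycles


for n in (3, 100, 10000):
    print(n, round(stage_utilization(3, n), 4))
```

For a three-stage pipeline, three data sets keep each stage busy only 60% of the time, while ten thousand data sets push utilization to within a small fraction of a percent of 100%.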
Independent sub-computations
• The repetitions of the computation, or simply the computations, to be processed by the pipeline are independent.
• All the computations that are concurrently resident in the pipeline stages are independent.
• There are no data or control dependences between any pair of the computations.
• This permits the pipeline to operate in "streaming" mode.
• A later computation need not wait for the completion of an earlier computation because of a dependence between them.
• For our pipelined floating-point multiplier this assumption holds.
• If there are multiple pairs of operands to be multiplied, the multiplication of one pair of operands does not depend on the result of another multiplication.
• These pairs can be processed by the pipeline in streaming mode.
Independent sub-computations (cont…)
• For some pipelines this point may not hold:
• A later computation may require the result of an earlier computation.
• Both of these computations can be concurrently resident in the pipeline stages.
• If the later computation has entered the pipeline stage that needs the result while the earlier computation has not yet reached the pipeline stage that produces the needed result, the later computation must wait in that pipeline stage.
• This waiting is referred to as a pipeline stall.
• If a computation is stalled in a pipeline stage, all subsequent computations may have to be stalled as well.
• Pipeline stalls effectively introduce idling pipeline stages.
• This is essentially a dynamic form of external fragmentation and results in a reduction of pipeline throughput.
• In designing pipelines that need to process computations that are not necessarily independent, the goal is to produce a pipeline design that minimizes the number of pipeline stalls.
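
The cost of stalls can be sketched with a toy in-order model in Python. This is a simplification under stated assumptions, not a description of any real pipeline: the function `schedule`, its parameters `produce_stage` and `need_stage`, and the dependence list are all hypothetical, and a stall is modeled as delaying the entry of the dependent computation, which gives the same total cycle count when every later computation is held up behind it.

```python
def schedule(k, deps, produce_stage, need_stage):
    """Return (entry cycles, total cycles, stall cycles) for an in-order pipeline.

    k             -- number of pipeline stages
    deps[i]       -- index of the earlier computation that i depends on, or None
    produce_stage -- stage (1-based) at whose end a result becomes available
    need_stage    -- stage (1-based) in which a dependent computation reads it
    """
    start = []  # cycle (0-based) in which each computation enters stage 1
    for i, dep in enumerate(deps):
        earliest = start[i - 1] + 1 if i > 0 else 0
        if dep is not None:
            # The producer occupies produce_stage in cycle start[dep] + produce_stage - 1,
            # so the consumer may occupy need_stage no earlier than the following cycle.
            earliest = max(earliest, start[dep] + produce_stage - need_stage + 1)
        start.append(earliest)
    total_cycles = start[-1] + k
    stall_cycles = total_cycles - (len(deps) + k - 1)
    return start, total_cycles, stall_cycles


# Four independent computations stream through a 3-stage pipeline.
print(schedule(3, [None, None, None, None], produce_stage=3, need_stage=1))
# ([0, 1, 2, 3], 6, 0)

# Computation 1 depends on 0, and computation 3 depends on 2.
print(schedule(3, [None, 0, None, 2], produce_stage=3, need_stage=1))
# ([0, 3, 4, 7], 10, 4)
```

In this toy model each dependence costs two stall cycles, so the same four computations take 10 cycles instead of 6.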

• This topic is adapted from “Modern Processor Design” by Shen and Lipasti.

ANEESH R
aneeshr2020@gmail.com

More Related Content

What's hot

PROBLEM SOLVING TECHNIQUES
PROBLEM SOLVING TECHNIQUESPROBLEM SOLVING TECHNIQUES
PROBLEM SOLVING TECHNIQUES
sudhanagarajan5
 
BINARY SEARCH TREE
BINARY SEARCH TREEBINARY SEARCH TREE
BINARY SEARCH TREE
ER Punit Jain
 
The Functional Programming Toolkit (NDC Oslo 2019)
The Functional Programming Toolkit (NDC Oslo 2019)The Functional Programming Toolkit (NDC Oslo 2019)
The Functional Programming Toolkit (NDC Oslo 2019)
Scott Wlaschin
 
Applications of Stack
Applications of StackApplications of Stack
Applications of Stack
Christalin Nelson
 
Chapter 8 ds
Chapter 8 dsChapter 8 ds
Chapter 8 ds
Hanif Durad
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
Archana Burujwale
 
Linked lists
Linked listsLinked lists
Sparse matrix and its representation data structure
Sparse matrix and its representation data structureSparse matrix and its representation data structure
Sparse matrix and its representation data structure
Vardhil Patel
 
Types of Tree in Data Structure in C++
Types of Tree in Data Structure in C++Types of Tree in Data Structure in C++
Types of Tree in Data Structure in C++
Himanshu Choudhary
 
Queue
QueueQueue
Queue
Raj Sarode
 
Stacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURESStacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURES
Sowmya Jyothi
 
(Binary tree)
(Binary tree)(Binary tree)
(Binary tree)
almario1988
 
Heaps
HeapsHeaps
Heaps
pratmash
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data Structure
Dharita Chokshi
 
AD3251-Data Structures Design-Notes-Tree.pdf
AD3251-Data Structures  Design-Notes-Tree.pdfAD3251-Data Structures  Design-Notes-Tree.pdf
AD3251-Data Structures Design-Notes-Tree.pdf
Ramco Institute of Technology, Rajapalayam, Tamilnadu, India
 
1.7 avl tree
1.7 avl tree 1.7 avl tree
1.7 avl tree
Krish_ver2
 
Priority Queue in Data Structure
Priority Queue in Data StructurePriority Queue in Data Structure
Priority Queue in Data Structure
Meghaj Mallick
 
binary tree.pptx
binary tree.pptxbinary tree.pptx
binary tree.pptx
DhanushSrinivasulu
 
Binary tree
Binary treeBinary tree
Binary tree
Rajendran
 
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
Simen Li
 

What's hot (20)

PROBLEM SOLVING TECHNIQUES
PROBLEM SOLVING TECHNIQUESPROBLEM SOLVING TECHNIQUES
PROBLEM SOLVING TECHNIQUES
 
BINARY SEARCH TREE
BINARY SEARCH TREEBINARY SEARCH TREE
BINARY SEARCH TREE
 
The Functional Programming Toolkit (NDC Oslo 2019)
The Functional Programming Toolkit (NDC Oslo 2019)The Functional Programming Toolkit (NDC Oslo 2019)
The Functional Programming Toolkit (NDC Oslo 2019)
 
Applications of Stack
Applications of StackApplications of Stack
Applications of Stack
 
Chapter 8 ds
Chapter 8 dsChapter 8 ds
Chapter 8 ds
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
 
Linked lists
Linked listsLinked lists
Linked lists
 
Sparse matrix and its representation data structure
Sparse matrix and its representation data structureSparse matrix and its representation data structure
Sparse matrix and its representation data structure
 
Types of Tree in Data Structure in C++
Types of Tree in Data Structure in C++Types of Tree in Data Structure in C++
Types of Tree in Data Structure in C++
 
Queue
QueueQueue
Queue
 
Stacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURESStacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURES
 
(Binary tree)
(Binary tree)(Binary tree)
(Binary tree)
 
Heaps
HeapsHeaps
Heaps
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data Structure
 
AD3251-Data Structures Design-Notes-Tree.pdf
AD3251-Data Structures  Design-Notes-Tree.pdfAD3251-Data Structures  Design-Notes-Tree.pdf
AD3251-Data Structures Design-Notes-Tree.pdf
 
1.7 avl tree
1.7 avl tree 1.7 avl tree
1.7 avl tree
 
Priority Queue in Data Structure
Priority Queue in Data StructurePriority Queue in Data Structure
Priority Queue in Data Structure
 
binary tree.pptx
binary tree.pptxbinary tree.pptx
binary tree.pptx
 
Binary tree
Binary treeBinary tree
Binary tree
 
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
[ZigBee 嵌入式系統] ZigBee Architecture 與 TI Z-Stack Firmware
 

Similar to Pipelineing idealisam

Arithmatic pipline
Arithmatic piplineArithmatic pipline
Arithmatic pipline
A. Shamel
 
pipelining-190913185902.pptx
pipelining-190913185902.pptxpipelining-190913185902.pptx
pipelining-190913185902.pptx
AshokRachapalli1
 
Pipelining powerpoint presentation
Pipelining powerpoint presentationPipelining powerpoint presentation
Pipelining powerpoint presentation
bhavanadonthi
 
BIL406-Chapter-7-Superscalar and Superpipeline processors.ppt
BIL406-Chapter-7-Superscalar and Superpipeline  processors.pptBIL406-Chapter-7-Superscalar and Superpipeline  processors.ppt
BIL406-Chapter-7-Superscalar and Superpipeline processors.ppt
Kadri20
 
arithmaticpipline-170310085040.pptx
arithmaticpipline-170310085040.pptxarithmaticpipline-170310085040.pptx
arithmaticpipline-170310085040.pptx
AshokRachapalli1
 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
Medicaps University
 
SOC Chip Basics
SOC Chip BasicsSOC Chip Basics
SOC Chip Basics
A B Shinde
 
3 Pipelining
3 Pipelining3 Pipelining
3 Pipelining
fika sweety
 
MA1.ppt
MA1.pptMA1.ppt
MA1.ppt
VivekC49
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
PrasantaKumarDash2
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Aiar. unit ii. transfer lines
Aiar. unit ii. transfer linesAiar. unit ii. transfer lines
Aiar. unit ii. transfer lines
Kunal mane
 
Low power
Low powerLow power
Low power
preeti banra
 
Pipelining
PipeliningPipelining
Pipelining
sarith divakar
 
10 static timing_analysis_1_concept_of_timing_analysis
10 static timing_analysis_1_concept_of_timing_analysis10 static timing_analysis_1_concept_of_timing_analysis
10 static timing_analysis_1_concept_of_timing_analysis
Usha Mehta
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBR
APNIC
 
Stormwater modeling 411_troilo
Stormwater modeling 411_troiloStormwater modeling 411_troilo
Manja ppt
Manja pptManja ppt
Manja ppt
Druva Gowda
 
RIPE 80: Buffers and Protocols
RIPE 80: Buffers and ProtocolsRIPE 80: Buffers and Protocols
RIPE 80: Buffers and Protocols
APNIC
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
Aneesh Raveendran
 

Similar to Pipelineing idealisam (20)

Arithmatic pipline
Arithmatic piplineArithmatic pipline
Arithmatic pipline
 
pipelining-190913185902.pptx
pipelining-190913185902.pptxpipelining-190913185902.pptx
pipelining-190913185902.pptx
 
Pipelining powerpoint presentation
Pipelining powerpoint presentationPipelining powerpoint presentation
Pipelining powerpoint presentation
 
BIL406-Chapter-7-Superscalar and Superpipeline processors.ppt
BIL406-Chapter-7-Superscalar and Superpipeline  processors.pptBIL406-Chapter-7-Superscalar and Superpipeline  processors.ppt
BIL406-Chapter-7-Superscalar and Superpipeline processors.ppt
 
arithmaticpipline-170310085040.pptx
arithmaticpipline-170310085040.pptxarithmaticpipline-170310085040.pptx
arithmaticpipline-170310085040.pptx
 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
 
SOC Chip Basics
SOC Chip BasicsSOC Chip Basics
SOC Chip Basics
 
3 Pipelining
3 Pipelining3 Pipelining
3 Pipelining
 
MA1.ppt
MA1.pptMA1.ppt
MA1.ppt
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Coa.ppt2
 
Aiar. unit ii. transfer lines
Aiar. unit ii. transfer linesAiar. unit ii. transfer lines
Aiar. unit ii. transfer lines
 
Low power
Low powerLow power
Low power
 
Pipelining
PipeliningPipelining
Pipelining
 
10 static timing_analysis_1_concept_of_timing_analysis
10 static timing_analysis_1_concept_of_timing_analysis10 static timing_analysis_1_concept_of_timing_analysis
10 static timing_analysis_1_concept_of_timing_analysis
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBR
 
Stormwater modeling 411_troilo
Stormwater modeling 411_troiloStormwater modeling 411_troilo
Stormwater modeling 411_troilo
 
Manja ppt
Manja pptManja ppt
Manja ppt
 
RIPE 80: Buffers and Protocols
RIPE 80: Buffers and ProtocolsRIPE 80: Buffers and Protocols
RIPE 80: Buffers and Protocols
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 

More from Aneesh Raveendran

Single_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_RaveendranSingle_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_Raveendran
Aneesh Raveendran
 
Universal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP coreUniversal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP core
Aneesh Raveendran
 
Branch prediction
Branch predictionBranch prediction
Branch prediction
Aneesh Raveendran
 
Reversible Logic Gate
Reversible Logic GateReversible Logic Gate
Reversible Logic Gate
Aneesh Raveendran
 
Unalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory accessUnalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory access
Aneesh Raveendran
 
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 MicroprocessorArchitecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
Aneesh Raveendran
 
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGADesign and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Aneesh Raveendran
 
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDLDesign of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Aneesh Raveendran
 

More from Aneesh Raveendran (8)

Single_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_RaveendranSingle_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_Raveendran
 
Universal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP coreUniversal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP core
 
Branch prediction
Branch predictionBranch prediction
Branch prediction
 
Reversible Logic Gate
Reversible Logic GateReversible Logic Gate
Reversible Logic Gate
 
Unalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory accessUnalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory access
 
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 MicroprocessorArchitecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
 
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGADesign and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
 
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDLDesign of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDL
 

Recently uploaded

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 

Recently uploaded (20)

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 

Pipelineing idealisam

  • 1. PIPELINING IDEALISM ANEESH R Center For Development of Advanced Computing (C-DAC) INDIA aneeshr2020@gmail.com ANEESH R
  • 2. Pipelining idealism • Motivation of a k-stage pipelined design is to achieve a k-folded increase in throughput. • The K-fold increase in throughput represents the ideal case. • Unavoidable deviations form the idealism in real pipeline make pipelined design more challenging . • Solution for idealism – realism gap in pipelining is more challenging. • Three points in pipelining idealism are :- • Uniform sub-computations : Computation to be performed is evenly partitioned into uniform latency computations. • Identical sub-computations : Same computation is to be performed repeatedly on a large number of input data sets • Independent sub-computations : All the repetitions of the same computations are mutually independent ANEESH R aneeshr2020@gmail.com
  • 3. Uniform sub-computations • The computation to be pipelined can be evenly partitioned into K-uniform latency subcomputations. • Original design can be evenly partitioned into K-balanced(i.e. having same latency) pipeline stages. • If the latency of the original computation and hence the clocking period of the non-pipelined design is “T”, then clocking period of a k-stage pipelined design is exactly “T/K”. • The k-folded increase in throughput is achieved due to the k-fold increase of the clocking rate. • • • This idealized concept may not be true in an actual pipeline design. It may not be possible to partition the computation into perfectly balanced stages. The latency of 400 ns of the non-pipelined computation is partitioned into three stages with latencies of 125, 150, and 125 ns, respectively. • The original latency has not been evenly partitioned into three balanced stages. ANEESH R aneeshr2020@gmail.com
  • 4. Uniform sub-computations (cont…)
      • The clocking period of a pipelined design is dictated by the stage with the longest latency.
      • The stages with shorter latencies in effect incur some inefficiency or penalty.
      • In the example, the first and third stages each have an inefficiency of 25 ns.
      • This inefficiency is the internal fragmentation of pipeline stages.
      • The total latency required for performing the same computation increases from T to Tf.
      • The clocking period of the pipelined design is no longer T/k but Tf/k.
      • Performing the three sub-computations now requires 450 ns instead of the original 400 ns.
      • The clocking period is not 133 ns (400/3 ns) but 150 ns.
  • 5. Uniform sub-computations (cont…)
      • In actual designs, the buffers inserted between pipeline stages introduce an additional delay, and a further delay is required to ensure proper clocking of the pipeline stages.
      • In the example, an additional 22 ns is required to ensure proper clocking of the pipeline stages.
      • This results in a cycle time of 172 ns for the three-stage pipelined design.
      • The ideal cycle time for a three-stage pipelined design would have been 133 ns.
      • The difference between 172 ns and 133 ns in the clocking period accounts for the shortfall from the idealized three-fold increase in throughput.
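The arithmetic of this example can be checked with a short Python sketch. The function and variable names below are illustrative only; the numbers (400 ns, stage latencies of 125/150/125 ns, 22 ns of clocking overhead) are the ones quoted on the slides. The sketch assumes the cycle time is simply the longest stage latency plus the clocking overhead.

```python
def pipelined_cycle_time(stage_latencies_ns, clocking_overhead_ns=0):
    """Cycle time is set by the slowest stage plus any buffering/clocking overhead."""
    return max(stage_latencies_ns) + clocking_overhead_ns


original_latency_ns = 400          # non-pipelined latency T
stages_ns = [125, 150, 125]        # unbalanced three-stage partition
overhead_ns = 22                   # extra delay for proper clocking

k = len(stages_ns)
ideal_cycle_ns = original_latency_ns / k                        # T/k, about 133 ns
balanced_cycle_ns = pipelined_cycle_time(stages_ns)             # 150 ns
actual_cycle_ns = pipelined_cycle_time(stages_ns, overhead_ns)  # 172 ns

# Internal fragmentation: time each stage sits idle within one clock period.
internal_fragmentation_ns = [balanced_cycle_ns - s for s in stages_ns]

# Total latency Tf once every stage is stretched to the longest one.
total_latency_tf_ns = k * balanced_cycle_ns

# Throughput gain over the non-pipelined design (ideal would be k = 3).
speedup = original_latency_ns / actual_cycle_ns

print(f"ideal cycle   : {ideal_cycle_ns:.1f} ns")
print(f"actual cycle  : {actual_cycle_ns} ns")
print(f"fragmentation : {internal_fragmentation_ns} ns")
print(f"Tf            : {total_latency_tf_ns} ns")
print(f"speedup       : {speedup:.2f}x (ideal {k}x)")
```

Under these assumptions the throughput gain is 400/172, roughly 2.3 rather than the ideal 3.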
  • 6. Uniform sub-computations (cont…)
      • The uniform sub-computations assumption implies two things:
          • No inefficiency is introduced by partitioning the original computation into multiple sub-computations.
          • No additional delay is caused by the inter-stage buffers and the clocking requirements.
      • The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch.
      • Partitioning a computation into balanced pipeline stages constitutes the first challenge of pipelined design.
      • The goal is to make the stages as balanced as possible to minimize internal fragmentation.
      • Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism.
      • This deviation leads to the shortfall from the idealized k-fold increase in throughput in a k-stage pipelined design.
  • 7. Identical sub-computations
      • Many repetitions of the same computation are to be performed by the pipeline.
      • The same computation is repeated on multiple sets of input data.
      • Each repetition requires the same sequence of sub-computations provided by the pipeline stages.
      • This is certainly true for the pipelined floating-point multiplier, because this pipeline performs only one function: floating-point multiplication.
      • Many pairs of floating-point numbers are to be multiplied.
      • Each pair of operands is sent through the same three pipeline stages.
      • All the pipeline stages are used by every repetition of the computation.
  • 8. Identical sub-computations (cont…)
      • If a pipeline is designed to perform multiple functions, this assumption may not hold.
      • For example, an arithmetic pipeline can be designed to perform both addition and multiplication.
      • Not all pipeline stages may be required by each of the functions the pipeline supports.
      • A different subset of pipeline stages may be required for each function.
      • Some data sets will therefore not require some pipeline stages and will effectively be idling during those stages.
      • These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages.
      • External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines.
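As a rough illustration of external fragmentation, the Python sketch below counts how many stage slots go unused for a given mix of operations. The three-stage pipeline and the stage subsets assigned to each operation are hypothetical, chosen only to show the bookkeeping; they are not taken from the slides. Fill and drain effects are ignored.

```python
NUM_STAGES = 3

# Hypothetical mapping from operation to the stages it actually uses.
# Multiplication uses every stage; addition is assumed to skip stage 1.
STAGES_USED = {
    "multiply": {0, 1, 2},
    "add": {0, 2},
}


def external_fragmentation(operations):
    """Fraction of stage slots left idle because an operation skips a stage.

    Every operation still travels through all NUM_STAGES stages, so it
    occupies NUM_STAGES slots but does useful work only in the stages it needs.
    """
    total_slots = len(operations) * NUM_STAGES
    used_slots = sum(len(STAGES_USED[op]) for op in operations)
    return (total_slots - used_slots) / total_slots


mix = ["multiply", "add", "add", "multiply"]
idle_fraction = external_fragmentation(mix)
print(f"idle stage slots: {idle_fraction:.1%}")
```

A pipeline that only ever multiplies has no external fragmentation in this model; the more often the shorter function is issued, the larger the idle fraction becomes.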
  • 9. Identical sub-computations (cont…)
      • The identical sub-computations assumption effectively means that all pipeline stages are always utilized.
      • It also implies that there are many sets of data to be processed.
      • It takes k cycles for the first data set to reach the last stage of the pipeline.
      • These cycles are referred to as the pipeline fill time.
      • After the last data set has entered the first pipeline stage, an additional k cycles are needed to drain the pipeline.
      • During pipeline fill and drain times, not all the stages will be busy.
      • The reason for assuming that many sets of input data are processed is that the pipeline fill and drain times then constitute a very small fraction of the total time.
      • The pipeline stages can then be considered, for all practical purposes, to be always busy.
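The effect of fill and drain time on utilization can be sketched in Python. This assumes a simple model that is not spelled out on the slides: one data set enters per cycle, there are no stalls, and every data set uses every stage, so n data sets finish in k + n - 1 cycles. The function names are illustrative.

```python
def total_cycles(k, n):
    """Cycles needed to push n data sets through a k-stage pipeline.

    Assumes one data set enters per cycle and nothing ever stalls:
    the first result appears after k cycles, then one more per cycle.
    """
    return k + n - 1


def stage_utilization(k, n):
    """Average fraction of cycles in which a stage is busy.

    Each of the k stages is busy for exactly n cycles out of total_cycles(k, n).
    """
    return n / total_cycles(k, n)


for n in (1, 10, 1000):
    print(f"k=3, n={n:>4}: utilization = {stage_utilization(3, n):.3f}")
```

With only a handful of data sets the fill and drain cycles dominate; as n grows, utilization approaches 1, which is the sense in which the stages are "always busy".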
  • 10. Independent sub-computations
      • The repetitions of the computation, or simply computations, to be processed by the pipeline are independent.
      • All the computations that are concurrently resident in the pipeline stages are independent.
      • There are no data or control dependences between any pair of the computations.
      • This permits the pipeline to operate in "streaming" mode.
      • A later computation need not wait for the completion of an earlier computation because of a dependence between them.
      • For our pipelined floating-point multiplier this assumption holds.
      • If there are multiple pairs of operands to be multiplied, the multiplication of one pair of operands does not depend on the result of another multiplication.
      • These pairs can be processed by the pipeline in streaming mode.
  • 11. Independent sub-computations (cont…)
      • For some pipelines this assumption may not hold.
      • A later computation may require the result of an earlier computation.
      • Both of these computations can be concurrently resident in the pipeline stages.
      • If the later computation has entered the pipeline stage that needs the result while the earlier computation has not yet reached the pipeline stage that produces it, the later computation must wait in that pipeline stage.
      • This waiting is referred to as a pipeline stall.
      • If a computation is stalled in a pipeline stage, all subsequent computations may have to be stalled as well.
      • Pipeline stalls effectively introduce idling pipeline stages.
      • This is essentially a dynamic form of external fragmentation and reduces pipeline throughput.
      • In designing pipelines that must process computations that are not necessarily independent, the goal is to minimize the number of pipeline stalls.
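A small cycle-by-cycle simulation in Python can make the cost of stalls concrete. This is a minimal sketch under assumptions that are not from the slides: the pipeline is strictly in-order with no buffering between stages, a result becomes available only after its producer has left the last stage, and a dependent computation waits in a single designated stage (`need_stage`). All names are hypothetical.

```python
def run_pipeline(num_stages, num_computations, depends_on, need_stage=0):
    """Simulate an in-order pipeline; return (total cycles, stall cycles).

    depends_on maps a computation index to the index of the earlier
    computation whose result it needs while in stage `need_stage`.
    """
    for consumer, producer in depends_on.items():
        if producer >= consumer:
            raise ValueError("a computation may only depend on an earlier one")

    stages = [None] * num_stages   # which computation occupies each stage
    done = set()                   # computations that have left the pipeline
    next_to_enter = 0
    cycles = 0
    stall_cycles = 0               # cycles a dependent computation spent waiting

    while len(done) < num_computations:
        cycles += 1
        # A new computation enters at the start of the cycle if stage 0 is free.
        if stages[0] is None and next_to_enter < num_computations:
            stages[0] = next_to_enter
            next_to_enter += 1

        ready = set(done)  # results available at the start of this cycle
        # Walk from the last stage backward so a stage is freed before
        # the computation behind it tries to advance into it.
        for s in range(num_stages - 1, -1, -1):
            c = stages[s]
            if c is None:
                continue
            producer = depends_on.get(c)
            if s == need_stage and producer is not None and producer not in ready:
                stall_cycles += 1      # operand not ready: wait in this stage
                continue
            if s == num_stages - 1:
                done.add(c)            # leaves the pipeline
                stages[s] = None
            elif stages[s + 1] is None:
                stages[s + 1] = c      # advance one stage
                stages[s] = None
            # otherwise the next stage is occupied and c is held in place

    return cycles, stall_cycles


independent = run_pipeline(3, 3, {})
dependent = run_pipeline(3, 3, {1: 0})   # computation 1 needs the result of 0
print("independent:", independent)
print("dependent  :", dependent)
```

In this model the three independent computations stream through in 5 cycles, while the single dependence adds 2 stall cycles and also delays the computation queued behind the stalled one.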
  • 13. This topic is adapted from "Modern Processor Design" by Shen and Lipasti.