2. Pipelining idealism
• The motivation for a k-stage pipelined design is to achieve a k-fold increase in throughput.
• The k-fold increase in throughput represents the ideal case.
• Unavoidable deviations from this idealism in real pipelines make pipelined design more challenging.
• Bridging the gap between idealism and realism is the key challenge of pipelined design.
• Pipelining idealism rests on three points:
• Uniform sub-computations: the computation to be performed can be evenly partitioned into uniform-latency sub-computations.
• Identical sub-computations: the same computation is to be performed repeatedly on a large number of input data sets.
• Independent sub-computations: all the repetitions of the same computation are mutually independent.
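The ideal case can be made concrete with a small throughput calculation; a minimal sketch in Python, assuming a perfectly balanced pipeline (the 400-ns latency matches the example used on the later slides):

```python
# Ideal k-stage pipelining: the clock period shrinks from T to T/k,
# so throughput (results per unit time) grows k-fold.

def ideal_throughput(T_ns, k):
    """Results per nanosecond for a perfectly balanced k-stage pipeline."""
    return 1.0 / (T_ns / k)

T = 400.0  # latency of the non-pipelined computation, in ns
base = ideal_throughput(T, 1)       # non-pipelined rate
pipelined = ideal_throughput(T, 3)  # three balanced stages
print(pipelined / base)             # ideally a factor of k = 3
```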
ANEESH R
aneeshr2020@gmail.com
3. Uniform sub-computations
• The computation to be pipelined can be evenly partitioned into k uniform-latency sub-computations.
• Equivalently, the original design can be evenly partitioned into k balanced (i.e., equal-latency) pipeline stages.
• If the latency of the original computation, and hence the clocking period of the non-pipelined design, is T, then the clocking period of a k-stage pipelined design is exactly T/k.
• The k-fold increase in throughput is achieved through the k-fold increase in clocking rate.
• This idealized assumption may not hold in an actual pipeline design: it may not be possible to partition the computation into perfectly balanced stages.
• For example, the 400-ns latency of a non-pipelined computation may be partitioned into three stages with latencies of 125, 150, and 125 ns, respectively.
• Here the original latency has not been evenly partitioned into three balanced stages.
4. Uniform sub-computations (cont…)
• The clocking period of a pipelined design is dictated by the stage with the longest latency.
• The stages with shorter latencies in effect incur some inefficiency or penalty.
• The first and third stages have an inefficiency of 25 ns each.
• This inefficiency is called internal fragmentation of pipeline stages.
• The total latency required for performing the same computation increases from T to Tf.
• The clocking period of the pipelined design is no longer T/k but Tf/k.
• Performing the three sub-computations now requires 450 ns instead of the original 400 ns.
• The clocking period is therefore not 133 ns (400 ns/3) but 150 ns.
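The arithmetic above can be checked with a short script; a sketch using the slides' 125/150/125-ns example:

```python
# Internal fragmentation: the clock period is set by the slowest stage,
# so the faster stages waste the difference on every cycle.

stages = [125, 150, 125]                      # stage latencies in ns
k = len(stages)
clock = max(stages)                           # 150 ns, not 400/3 ~ 133 ns
fragmentation = [clock - s for s in stages]   # [25, 0, 25] ns wasted per cycle
effective_latency = clock * k                 # 450 ns instead of 400 ns

print(clock, fragmentation, effective_latency)
```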
5. Uniform sub-computations (cont…)
• In actual designs, additional delay is introduced by the buffers between pipeline stages, and further delay is required to ensure proper clocking of the pipeline stages.
• Suppose an additional 22 ns is required to ensure proper clocking of the pipeline stages.
• This results in a cycle time of 172 ns for the three-stage pipelined design.
• The ideal cycle time for a three-stage pipelined design would have been 133 ns.
• The difference between 172 and 133 ns accounts for the shortfall from the idealized three-fold increase in throughput.
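Putting the numbers together shows how far the real design falls short of the ideal three-fold speedup; a sketch using the slides' figures:

```python
# Cycle time with clocking overhead: the longest stage (150 ns) plus the
# 22 ns needed for proper clocking gives 172 ns, versus the ideal T/k.

T = 400                        # non-pipelined latency, ns
k = 3
cycle = 150 + 22               # actual cycle time: 172 ns
ideal_cycle = T / k            # ~133.3 ns
actual_speedup = T / cycle     # ~2.33x instead of the ideal 3x
print(cycle, round(actual_speedup, 2))
```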
6. Uniform sub-computations (cont…)
• Uniform sub-computations basically assumes two things:
• No inefficiency is introduced by partitioning the original computation into multiple sub-computations.
• No additional delay is caused by the inter-stage buffers and the clocking requirements.
• The additional delay incurred for proper pipeline clocking can be minimized by employing latches similar to the Earle latch.
• Partitioning a computation into balanced pipeline stages constitutes the first challenge of pipelined design.
• The goal is to make the stages as balanced as possible so as to minimize internal fragmentation.
• Internal fragmentation is the primary cause of deviation from the first point of pipelining idealism.
• This deviation leads to the shortfall from the idealized k-fold increase in throughput in a k-stage pipelined design.
7. Identical sub-computations
• Many repetitions of the same computation are to be performed by the pipeline.
• The same computation is repeated on multiple sets of input data.
• Each repetition requires the same sequence of sub-computations provided by
the pipeline stages.
• This is certainly true for the pipelined floating-point multiplier, because that pipeline performs only one function: floating-point multiplication.
• Many pairs of floating-point numbers are to be multiplied.
• Each pair of operands is sent through the same three pipeline stages.
• All the pipeline stages are used by every repetition of the computation.
8. Identical sub-computations (cont…)
• If a pipeline is designed to perform multiple functions, this assumption may not hold.
• For example, an arithmetic pipeline can be designed to perform both addition and multiplication.
• Not all the pipeline stages may be required by each of the functions supported by the pipeline.
• A different subset of pipeline stages is required for performing each of the functions.
• Hence each computation may not require all the pipeline stages.
• Some data sets will not require some pipeline stages, which effectively sit idle during those stages.
• These unused or idling pipeline stages introduce another form of pipeline inefficiency, called external fragmentation of pipeline stages.
• External fragmentation is a form of pipelining overhead and should be minimized in multifunction pipelines.
9. Identical sub-computations (cont…)
• Identical sub-computations effectively assumes that all pipeline stages are always utilized.
• It also implies that there are many sets of data to be processed.
• It takes k cycles for the first data set to reach the last stage of the pipeline.
• These cycles are referred to as the pipeline fill time.
• After the last data set has entered the first pipeline stage, an additional k cycles are
needed to drain the pipeline.
• During pipeline fill and drain times, not all the stages will be busy.
• The implication of processing many sets of input data is that the pipeline fill and drain times constitute a very small fraction of the total time.
• The pipeline stages can be considered, for all practical purposes, to be always
busy.
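The fill/drain argument can be quantified; a sketch assuming the standard n + k - 1 total-cycle count for n data sets in a k-stage pipeline (n = 1000 is an illustrative value, not from the slides):

```python
# Utilization over a whole run: n data sets each need k stage-cycles of
# useful work, but the run occupies n + k - 1 cycles across k stages.

def utilization(n, k):
    """Fraction of stage-cycles doing useful work over the whole run."""
    total_cycles = n + k - 1
    return (n * k) / (total_cycles * k)

print(utilization(1000, 3))  # close to 1: fill/drain overhead is negligible
print(utilization(1, 3))     # 1/3: a lone data set leaves stages mostly idle
```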
10. Independent sub-computations
• The repetitions of the computation, or simply the computations, to be processed by the pipeline are independent.
• All the computations that are concurrently resident in the pipeline stages are independent.
• They have no data or control dependences between any pair of the computations.
• This permits the pipeline to operate in "streaming" mode: a later computation need not wait for the completion of an earlier computation due to a dependence between them.
• For our pipelined floating-point multiplier this assumption holds: if there are multiple pairs of operands to be multiplied, the multiplication of one pair does not depend on the result of another.
• These pairs can be processed by the pipeline in streaming mode.
11. Independent sub-computations (cont…)
• For some pipelines this point may not hold:
• A later computation may require the result of an earlier computation.
• Both of these computations can be concurrently resident in the pipeline stages.
• If the later computation has entered the stage that needs the result while the earlier computation has not yet reached the stage that produces it, the later computation must wait in that stage.
• This waiting is referred to as a pipeline stall.
• If a computation is stalled in a pipeline stage, all subsequent computations may have to be stalled as well.
• Pipeline stalls effectively introduce idling pipeline stages.
• This is essentially a dynamic form of external fragmentation and results in a reduction of pipeline throughput.
• In designing pipelines that must process computations that are not necessarily independent, the goal is to produce a design that minimizes pipeline stalls.
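The cost of stalls can be sketched with a simple cycle count; the stage count, dependence count, and stall penalty below are illustrative assumptions, not from the slides:

```python
# Streaming vs. stalled execution: each dependence that forces a wait
# adds its stall cycles on top of the ideal n + k - 1 streaming time.

def total_cycles(n, k, stall_penalty, num_deps):
    """Cycles to process n data sets in a k-stage pipeline with stalls."""
    return (n + k - 1) + stall_penalty * num_deps

streaming = total_cycles(100, 3, 0, 0)   # independent computations: 102
stalled = total_cycles(100, 3, 2, 10)    # ten 2-cycle stalls: 122
print(stalled - streaming)               # 20 cycles of throughput lost
```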