Parallel Prefix Adders Presentation

7,362 views
6,856 views

Published on

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
7,362
On SlideShare
0
From Embeds
0
Number of Embeds
9
Actions
Shares
0
Downloads
0
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Parallel Prefix Adders Presentation

  1. 1. Parallel prefixaddersKostas Vitoroulis, 2006.Presented to Dr. A. J. Al-Khalili.Concordia University.
  2. 2. Overview of presentation Parallel prefix operations Binary addition as a parallel prefix operation Prefix graphs Adder topologies Summary
  3. 3. Parallel Prefix OperationTerminology background: Prefix: The outcome of the operation depends on the initial inputs. Parallel: Involves the execution of an operation in parallel. This is done by segmentation into smaller pieces that are computed in parallel. Operation: Any arbitrary primitive operator “ ° ” that is associative is parallelizable  it is fast because the processing is accomplished in a parallel fashion.
  4. 4. Example: Associative operations are parallelizableConsider the logical OR operation: a + bThe operation is associative:a + b + c + d = ((( a + b ) + c) + d ) = (( a + b ) + ( c + d)) Serial implementation: Parallel implementation: a+b a a+b b (a+b)+(c+d) (a+b)+c c d c+d ((a+b)+c)+d
  5. 5. Mathematical Formulation: Prefix Sum  Operator: “ ° ”  this is the unary operator known as “scan” or “prefix sum”  Input is a vector: A = AnAn-1 … A1  Output is another vector: B = BnBn-1 … B1 where B1 = A1  Bn represents the B2 = A1 ° A2 operator being applied to … all terms of the vector. Bn = A1 ° A2 … ° An
  6. 6. Example of prefix sumConsider the vector: A = AnAn-1 … A1 where element Ai is an integerThe “*” unary operator, defined as: *A = BWith B = BnBn-1 … B1 B 1 = A1 B2 = A1 * A2 B 3 = A 1 * A 1 * A3 …and ‘ * ’ here is the integer addition operation.
  7. 7. Example of prefix sumCalculation of *A, where A = 6 5 4 3 2 1 yields: B = *A = 21 15 10 6 3 1Because the summation is associative the calculation can be done in parallel in thefollowing manner: Parallel implementation versus Serial implementation 6 5 4 3 2 1 6 5 4 3 2 1 + + + + + + B = (A + A +3A3 B6 =3B61+…A1 + A2) = A 2 = A11 = 12 +A + 1 = (A6 + A+) + + + =6 5 ((A4+A3) +(A2 +A1)) + += 21 B6 B5 B4 B3 B2 B1 B6 B5 B4 B3 B2 B1
  8. 8. Binary Addition This is the pen and paper addition of two 4-bit binary numbers x and y. c represents the generated carries. s represents the produced sum bits. c3 c2 c1 c0 x3 x2 x1 x0 A stage of the addition is the set of + x and y bits being used to produce y3 y2 y1 y0 the appropriate sum and carry bits. s4 s3 s2 s1 s0 For example the highlighted bits x2, y2 constitute stage 2 which generates carry c2 and sum s2 .Each stage i adds bits ai, bi, ci-1 and produces bits si, ciThe following hold: ai bi ci Comment: Formal definition: 0 0 0 The stage “kills” an incoming carry. “Kill” bit: ki = xi + yi 0 1 ci-1 The stage “propagates” an incoming carry “Propagate” bit: pi = xi ⊕ yi 1 0 ci-1 The stage “propagates” an incoming carry 1 1 1 The stage “generates” a carry out “Generate” bit: g i = xi • yi
  9. 9. Binary Addition ai bi ci Comment: Formal definition: 0 0 0 The stage “kills” an incoming carry. “Kill” bit: ki = xi + yi 0 1 ci-1 The stage “propagates” an incoming carry “Propagate” bit: pi = xi ⊕ yi 1 0 ci-1 The stage “propagates” an incoming carry 1 1 1 The stage “generates” a carry out “Generate” bit: g i = xi • yiThe carry ci generated by a stage i is given by the equation: ci = g i + pi ⋅ ci −1 = xi ⋅ yi + ( xi ⊕ yi ) ⋅ ci −1 ci = xi ⋅ yi + ( xi + yi ) ⋅ ci −1 = g i + ai ⋅ ci −1This equation can be simplified to:The “ai” term in the equation being the “alive” bit.The later form of the equation uses an OR gate instead of an XOR which is a more efficient gate when implementedin CMOS technology. Note that: ai = kiWhere ki is the “kill” bit defined in the table above.
  10. 10. Carry Look Ahead addersThe CLA adder has the following 3-stage structure: Pre-calculation of pi, gi for each stage Calculation of carry ci for each stage. Combine ci and pi of each stage to generate the sum bits si Final sum.
  11. 11. Carry Look Ahead adders The pre-calculation stage is implemented using the equations for pi, gi shown at a previous slide: x2y2 x1y1 x0y0 g2 p2 g1 p1 g0 p0 Alternatively using the “alive” bit: x2y2 x1y1 x0y0 g2 a2 g1 a1 g0 a0 Note the symmetry when we use the “propagate” or the “alive” bit… We can use them interchangeably in the equations!
  12. 12. Carry Look Ahead adders The carry calculation stage is implemented using the equations produced when unfolding the recursive equation: ci = g i + pi ⋅ ci −1 = g i + ai ⋅ ci −1 g2 p2 g1 p1 g0 p0 c0 = g 0 c1 = g1 + p1 ⋅ g 0 c2 = g 2 + p2 ⋅ c1 = g 2 + p2 ⋅ ( g1 + p1 ⋅ g 0 ) Carry generator block = g 2 + p2 ⋅ g1 + p2 ⋅ p1 ⋅ g 0 etc  c2 c1 c0
  13. 13. Carry Look Ahead adders The final sum calculation stage is implemented using the carry and propagate bits ci,pi: si = pi ⊕ ci −1 , with pi = xi ⊕ yi Note : si = g i + ai ⋅ ci −1 , with ai = xi + yi c2p3 c1p2 c0p1 cinp0 s3 s2 s1 s0 If the ‘alive’ bit ai is used the final sum stage becomes more complex as implied by the equations above.
  14. 14. Binary addition as a prefix sum problem.  We define a new operator: “ ° ”  Input is a vector of pairs of ‘propagate’ and ‘generate’ bits: ( g n , pn )( g n−1 , pn−1 )  ( g 0 , p0 )  Output is a new vectornofGn −1 , Pn −1 )  ( G0 , P0 ) ( Gn , P )( pairs:  Each pair of the output vector is calculated by the following definition: i , Pi ) = ( g i , pi )  (Gi −1 , Pi −1 ) (G Where : (G0 , P0 ) = ( g 0 , p0 ) ( g x , px )  ( g y , p y ) = ( g x + px ⋅ g y , px ⋅ p y ) with +,⋅ being the OR, AND operations
  15. 15. Binary addition as a prefix sum problem.  Properties of operator “ ° ”:  Associativity (hence parallelization)  Easy to prove based on the fact that the logical AND, OR operations are associative.  With the definition: (Gi , Pi ) = ( g i , pi )  (Gi −1 , Pi −1 ) Where (G1 , P ) = ( g1 , p1 ) 1 Gi becomes the carry signal at stage i of an adder. Illustration on next slide.  The operation is idempotent ( g x , px )  ( g x , px ) = ( g x + px ⋅ g x , px ⋅ px ) = ( g x , px )  Which implies (Gi: j , Pi: j ) = (Gi:n , Pi:n )  (Gm: j , Pm: j ) Where i ≥ j and m≥n
  16. 16. Binary Addition as a prefix sum problem. A stage i will generate a carry if gi=aibi a3 a2 a1 a0 and propagate a carry if + pi=XOR(ai,bi) b3 b2 b1 b0 Hence for stage i: ci=gi+pici-1 With : Where : (Gi , Pi ) = ( g i , pi )  (Gi −1 , Pi −1 ) (G0 , P0 ) = ( g 0 , p0 ) ( g x , px )  ( g y , p y ) = ( g x + px ⋅ g y , px ⋅ p y ) … The familiar We have : carry bit generating (G1 , P ) = ( g1 , p1 ) 1 equations for stage i (G2 , P2 ) = ( g 2 , p2 )  (G1 , P ) = ( g 2 + p2 ⋅ g1 , p2 ⋅ p1 ) 1 in a CLA adder. (G3 , P3 ) = ( g 3 , p3 )  (G2 , P2 ) = ( g 3 + p3 ⋅ ( g 2 + p2 ⋅ g1 ), p3 ⋅ p2 ⋅ p1 ) = ( g 3 + p3 ⋅ g 2 + p3 ⋅ p2 ⋅ g1 ), p3 ⋅ p2 ⋅ p1 ) etc 
  17. 17. Addition as a prefix sum problem.Conclusion:The equations of the well known CLA adder can be formulated as a parallelprefix problem by employing a special operator “ ° ”.This operator is associative hence it can be implemented in a parallelfashion.A Parallel Prefix Adder (PPA) is equivalent to the CLA adder… The twodiffer in the way their carry generation block is implemented.In subsequent slides we will see different topologies for the parallelgeneration of carries. Adders that use these topologies are called ParallelPrefix Adders.
  18. 18. Parallel Prefix Adders The parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage which is the most intensive one: Pre-calculation of Pi, Gi terms Straight forward as in the CLA adder Calculation of the carries. Prefix graphs can be used to This part is parallelizable to describe the reduce time. structure that performs this part. Simple adder to generate the sum Straight forward as in the CLA adder
  19. 19. Calculation of carries – PrefixGraphsThe components usually seen in a prefix graph are the following: processing component: buffer component: (g in1 , pin1 ) ( gin , pin ) ( g in2 , pin2 ) ( g out , pout ) ( g out , pout ) ( g out , pout ) ( g out , pout ) ( g out , pout ) = ( g in 1 + pin 1 ⋅ g in2 , pin1 ⋅ pin 2 ) ( g out , pout ) = ( g in , pin )
  20. 20. Prefix graphs for representation ofPrefix addition Example: serial adder carry generation represented by prefix graphs (p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1) c8 c7 c6 c5 c4 c3 c2 c1
  21. 21. Key architectures for carry calculation: 1960: J. Sklansky – conditional adder 1973: Kogge-Stone adder 1980: Ladner-Fisher adder 1982: Brent-Kung adder 1987: Han Carlson adder 1999: S. KnowlesOther parallel adder architectures: 1981: H. Ling adder 2001: Beaumont-Smith
  22. 22. 1960: J. Sklansky – conditional adder
  23. 23. 1960: J. Sklansky – conditional adder (p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1) c8 c7 c6 c5 c4 c3 c2 c1 The Sklansky adder has:  Minimal depth  High fan-out nodes
  24. 24. 1973: Kogge-Stone adder (p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1) c8 c7 c6 c5 c4 c3 c2 c1 The Kogge-Stone adder has:  Low depth  High node count (implies more area).  Minimal fan-out of 1 at each node (implies faster performance).
  25. 25. 1980: Ladner-Fischer adder (p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1) c8 c7 c6 c5 c4 c3 c2 c1 The Ladner-Fischer adder has:  Low depth  High fan-out nodes  This adder topology appears the same as the Schlanskly conditional sum adder. Ladner-Fischer formulated a parallel prefix network design space which included this minimal depth case. The actual adder they included as an application to their work had a structure that was slightly different than the above.
  26. 26. 1982: Brent-Kung adder (p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1) c8 c7 c6 c5 c4 c3 c2 c1 The Brent-Kung adder is the extreme boundary case of:  Maximum logic depth in PP adders (implies longer calculation time).  Minimum number of nodes (implies minimum area).
  27. 27. 1987: Han Carlson adder The Han-Carlson adder combines the Brent-Kung and Kogge-Stone structures into a hybrid structure.  Efficient  Suitable for VLSI implementation.
  28. 28. 1999: S. Knowles  Knowles proposed adders that trade off: Brent-Kung topology (Minimum fan-out)  Depth, interconnect, area.  These adders are bound by the Knowles Lander-Fischer topologies (Varied fan-out (minimum depth) at each level ) and Brent-Kung (minimum Ladner-Fischer fanout) topologies. topology (Minimum depth, high fanout)
  29. 29. An interesting taxonomy: Harris[2003] presented an interesting 3-D taxonomy of the adders presented so far. Each axis represents a characteristic of the adders: -Fanout -Logic depth -Wire connections He also proposed the following structure:
  30. 30. 1981: H. Ling adder Ling Adders are a different family of adders. They can still be formulated as prefix adders. Ling adders differ from the “traditional” PP adders in that:  They are based on a different set of equations.  The new set of equations introduces the following tradeoffs: Precalculation of Pi, Gi terms is based on more complex equations Calculation of the carries is based on simpler equations Final addition stage is more complex
  31. 31. Beaumont-Smith 7 6 5 4 3 2 1 0 G1:0=G1+P1G0 G2:1=G2+P2G1 G2:0=G2+P2G1+P2P1G0 G7:0=G7:4+P7:4G4:0 G3:2=G3+P3G2 6-5 5-4 3-2 2-1 1-0 P7:4=P7P5P4 7-6 G3:1=G3+P3G2+P3P2G1 7-5 6-4 3-1 2-0 G3:0=G3+P3G2+ 7-4 3-0G7:4=G7+P7G6+P7P6G5+P7P6P5G4 P3P2G1+P3P2P1G0G7:5=G7+P7G6+P7P6G5 G4:0=G4+P4G3+P4P3G2 +P4P3P2G1+P4P3P2P1GO 6-0 5-0 4-0 7-0 G5:4=G5+P5G4 G5:0=G5:4+P5:4G4:0 c8 c7 c6 c5 c4 c3 c2 c1 P5:4=P5P4 G6:5=G6+P6G5 G7:6=G7+P7G6 G6:0=G6+P6G5:4+P6P5:4G4:0 G6:4=G6+P6G5+P6P5G4 P5:4=P5P4
  32. 32. Summary (1/3) The parallel prefix formulation of binary addition is a very convenient way to formally describe an entire family of parallel binary adders.
  33. 33. Summary (2/3) A parallel prefix adder can be seen as a 3-stage process: Pre-calculation of Pi, Gi terms Calculation of the carries. Simple adder to generate the sum There exist various architectures for the carry calculation part. Trade-offs in these architectures involve the  area of the adder  its depth  the fan-out of the nodes  the overall wiring network.
  34. 34. Summary (3/3) Variations of parallel adders have been proposed. These variations are based on:  Modifying the carry generation equations and reformulating the prefix definition (Ling)  Restructuring the carry calculation trees based by optimizing for a specific technology (Beaumond- Smith)  Other optimizations.
  35. 35. References:Beaumont-Smith, Cheng-Chew Lim, “Parallel Prefix Adder Design”, IEEE, 2001Han, Carlson, “Fast Area-Efficient VLSI Adders, IEEE, 1987Dimitrakopoulos, Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders”, IEEE 2005Kogge, Stone, “A Parallel Algorithm for the Efficient solution of a General Class of Recurrence equations”, IEEE, 1973Simon Knowles, “A Family of adders”, IEEE, 2001Ladner, Fischer, “Parallel Prefix Computation”, ACM, 1980Brent, Kung, “A regular Layout for Parallel Adders”, IEEE, 1982H. Ling, “High-Speed Binary Adder”, IBM J. Res. And Dev., 1980J. Sklansky, “Conditional-Sum Addition Logic”, IRE transactions on computers, 1960D. Harris, “A Taxonomy of Parallel Prefix Networks”, IEEE, 2003

×