Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Augmented Arithmetic Operations Proposed for IEEE-754 2018

113 views

Published on

Algorithms for extending arithmetic precision through compensated summation or arithmetics like double-double rely on operations commonly called twoSum and twoProduct. The current draft of the IEEE 754 standard specifies these operations under the names augmentedAddition and augmentedMultiplication. These operations were included after three decades of experience because of a motivating new use: bitwise reproducible arithmetic. Standardizing the operations provides a hardware acceleration target that can provide at least a 33% speed improvements in reproducible dot product, placing reproducible dot product almost within a factor of two of common dot product. This paper provides history and motivation for standardizing these operations. We also define the operations, explain the rationale for all the specific choices, and provide parameterized test cases for new boundary behaviors.

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Augmented Arithmetic Operations Proposed for IEEE-754 2018

  1. 1. Augmented Arithmetic Operations Proposed for IEEE 754-2018 E. Jason Riedy Georgia Institute of Technology 25th IEEE Symposium on Computer Arithmetic, 26 June 2018
  2. 2. Outline History and Definitions Decisions Testing Summary Augmented Arith Ops — ARITH 25, 26 June 2018 1/18
  3. 3. History and Definitions
  4. 4. Quick Definitions twoSum Two summands x and y ⇒ sum s and error e fastTwoSum Two summands x and y with |x| ≥ |y| ⇒ sum s and error e twoProd Two multiplicands x and y ⇒ product p and error e All can overflow. Only twoProd can underflow with gradual underflow. Otherwise, relationships are exact for finite x and y. These are for binary floating point. Augmented Arith Ops — ARITH 25, 26 June 2018 2/18
  5. 5. Partial History • Møller (1965), Knuth (1965): twoSum for quasi double precision • Kahan (1965): fastTwoSum for compensated summation • Dekker (1971): fastTwoSum for expansions • Shewchuk (1997): Geometric predicates • Brigg (1998): quasi double precision implementation as double-double • Hida, Li, Bailey (2001): quad-double • Ogita, Rump, Oishi (2005...): Error-free transformations • IEEE 754-2008: Punted to the future • Ahrens, Nguyen, Demmel (2013...): ReproBLAS Augmented Arith Ops — ARITH 25, 26 June 2018 3/18
  6. 6. “Typical” twoSum void twoSum (const double x, const double y, double * head, double * tail) // Knuth, second volume of TAoCP { const double s = x + y; const double bb = s − x; const double err = (x − (s − bb)) + (y − bb); *head = s; *tail = err; } Six instructions, long dependency chain. Reducing to one or two instructions: More uses. Augmented Arith Ops — ARITH 25, 26 June 2018 4/18
  7. 7. twoProduct void twoProduct (const double a, const double b, double *head, double *tail) { double p = a * b; double t = fma (a, b, −p); *head = p; *tail = t; } Already two instructions. With the right instruction. Augmented Arith Ops — ARITH 25, 26 June 2018 5/18
  8. 8. twoSum, twoProd Unusual Exceptional Cases x y head tail signal ∞ + ∞ ⇒ ∞ NaN invalid −∞ + −∞ ⇒ −∞ NaN invalid x + y ⇒ ∞ NaN invalid, overflow, inexact (x + y overflows) −x + −y ⇒ −∞ NaN invalid, overflow, inexact (−x − y overflows) −0 + −0 ⇒ −0 +0 (none) x × y ⇒ ∞ −∞ overflow, inexact (x × y overflows) −x × y ⇒ −∞ ∞ overflow, inexact (−x × y overflows) −0 × 0 ⇒ −0 0 (none) Augmented Arith Ops — ARITH 25, 26 June 2018 6/18
  9. 9. Other Orders, Other Cases • Algebraic re-orderings produce different cases • Different signs • Different NaN locations • Similarly for twoProd and fastTwoSum. • Hence, double-double behaves strangely... • And reprodicibility becomes tricky. Augmented Arith Ops — ARITH 25, 26 June 2018 7/18
  10. 10. Reproducible Linear Algebra • Ahrens, Nguyen, Demmel: K-fold indexed sum • Base of the ReproBLAS, associative • Core operation: fastTwoSum! • But rounding must not depend on the output value. Details: • Demmel and Nguyen, “Fast Reproducible Floating-Point Summation,” ARITH21, 2013. • Ahrens, Nguyen, and Demmel, “Effcient Reproducible Floating Point Summation and BLAS,” UCB Tech Report, 2016. • bebop.cs.berkeley.edu/reproblas/ Note: There are other methods, e.g. ARM’s. Augmented Arith Ops — ARITH 25, 26 June 2018 8/18
  11. 11. Performance? Emulating augmentedAddition as two instructions improves double-double: Operation Skylake Haswell Addition latency −55% −45% throughput +36% +18% Multiplication latency −3% 0% throughput +11% +16% DDGEMM MFLOP/s: Operation Intel Skylake Intel Haswell “Typical” implementation 1732 (≈ 1/37 DP) 1199 (≈ 1/45 DP) Two-insn augmentedAddition 3344 (≈ 1/19 DP) 2283 (≈ 1/24 DP) Dukhan, Riedy, Vuduc. “Wanted: Floating-point add round-off error instruction,” PMAA 2016. ReproBLAS dot product: 33% rate improvement, only 2× slower than non-reproducible. Augmented Arith Ops — ARITH 25, 26 June 2018 9/18
  12. 12. Decisions
  13. 13. Decisions Made for Standardization • Time to standardize? • Was considered for 2008, but only use was extending precision. • Now more uses that can benefit. • Names? • twoSum, etc. already exist. • So augmented arithmetic operations. • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Exceptional behavior? • My first proposal was twoSum above. • Generally not wise... • Rounding... Augmented Arith Ops — ARITH 25, 26 June 2018 10/18
  14. 14. Standard Behavior augmentedAddition (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN ∞ ∞ ∞ ∞ (none) ∞ −∞ NaN NaN invalid −∞ ∞ NaN NaN invalid −∞ −∞ −∞ −∞ (none) x y ∞ ∞ overrflow, inexact (x + y overflows) −x −y −∞ −∞ overrflow, inexact (−x − y overflows) Augmented Arith Ops — ARITH 25, 26 June 2018 11/18
  15. 15. Standard Behavior augmentedAddition (II) x y head tail signal +0 +0 +0 +0 (none) +0 −0 +0 +0 (none) −0 +0 +0 +0 (none) −0 −0 −0 −0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018 12/18
  16. 16. Rounding • Reproducibility cannot round ties to nearest even. • Depends on the output value. • Leaves rounding ties away... • Already exists for binary. • Surveyed users. • Shewchuck: triangle could not use it. • Rump, et al.: Proofs need reworked. • So rounding ties toward zero. • roundTiesToZero from decimal • Only these operations. • sigh. Augmented Arith Ops — ARITH 25, 26 June 2018 13/18
  17. 17. Testing
  18. 18. Testing augmentedAddition • Assume p bits of precision, and augmentedAddition(x, y) ⇒ (h, t). • Typical cases are fine, only need special cases to test for roundTiesToZero. • Generate x, y such that y immediately follows x and causes a tie that rounds to a value easy to check. • Params: s sign, T significand, E exponent • x = (−1)1−s · T · 2E with T odd • y = (−1)1−s · 2E−p−1 • Results: • roundTiesToZero: (x, y) • Away or to even: (x + 2y, −y) Augmented Arith Ops — ARITH 25, 26 June 2018 14/18
  19. 19. Testing augmentedMultiplication: Rounding • Similarly need special cases to test for roundTiesToZero and underflow. • Generate x, y such that xy requires p + 1 bits and has 11 as the final bits. • xm × ym = (2p+1 + 4M + 3) · 2E with 0 ≤ M < 2p−1 and emin ≤ E + p − 1 < emax • Sample M • Factor the significand 2p+1 + 4M + 3 • Sample random exponents Exm and Eym with emin ≤ Exm + Eym < emax • But also need underflow... Augmented Arith Ops — ARITH 25, 26 June 2018 15/18
  20. 20. Testing augmentedMultiplication: Underflow • Testing underflow: Product needs the tail chopped off by underflow. • Let 0 ≤ k < p − 1 be the number of additional bits we want below the hard underflow threshold. • Sample: • 0 ≤ M < 2p−1 − 1 • 0 ≤ N < 2p−k−1 • 0 ≤ T < 2k xm ×ym = ( (2p−1 + M) · 2p head + N · 2k tail + 2T + 1 below underflow ) ·2emin−p−k • This case must signal inexact & underflow. Augmented Arith Ops — ARITH 25, 26 June 2018 16/18
  21. 21. Reference Code Not done yet, but... 1. Handle special cases. 2. If no ties (far case), twoSum. 3. If ties (near case), split and take care. • Tools from the next talk. Should be slow but clear. Anyone feel like adding this to the RISC-V Rocket core? Augmented Arith Ops — ARITH 25, 26 June 2018 17/18
  22. 22. Summary
  23. 23. Summary • Three “new” recommended ops in draft IEEE 754-2018 • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Specified to support extending precision (double double), reproducible linear algebra • Two instruction implementation can provide good performance benefits. • Quite easily testible • Future: Decimal? Hardware? Augmented Arith Ops — ARITH 25, 26 June 2018 18/18
  24. 24. fastTwoSum void fastTwoSum (const double x, const double y, double * head, double * tail) /* Assumes that |x| ≤|y| */ { const double s = x + y; const double bb = s − x; const double err = y − bb; *head = s; *tail = err; } Augmented Arith Ops — ARITH 25, 26 June 2018
  25. 25. Standard Behavior augmentedMultiplication (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN x y ∞ ∞ overflow, inexact (x × y overflows) −x y −∞ −∞ overflow, inexact (−x × y overflows) x −y −∞ −∞ overflow, inexact (x × −y overflows) −x −y ∞ ∞ overflow, inexact (−x × −y overflows) Augmented Arith Ops — ARITH 25, 26 June 2018
  26. 26. Standard Behavior augmentedMultiplication (II) x y head tail signal ∞ ∞ ∞ ∞ (none) ∞ −∞ −∞ −∞ (none) −∞ ∞ −∞ −∞ (none) −∞ −∞ ∞ ∞ (none) +0 +0 +0 +0 (none) +0 −0 −0 −0 (none) −0 +0 −0 −0 (none) −0 −0 +0 +0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018

×