Successfully reported this slideshow.
Upcoming SlideShare
×

# Augmented Arithmetic Operations Proposed for IEEE-754 2018

133 views

Published on

Algorithms for extending arithmetic precision through compensated summation or arithmetics like double-double rely on operations commonly called twoSum and twoProduct. The current draft of the IEEE 754 standard specifies these operations under the names augmentedAddition and augmentedMultiplication. These operations were included after three decades of experience because of a motivating new use: bitwise reproducible arithmetic. Standardizing the operations provides a hardware acceleration target that can provide at least a 33% speed improvements in reproducible dot product, placing reproducible dot product almost within a factor of two of common dot product. This paper provides history and motivation for standardizing these operations. We also define the operations, explain the rationale for all the specific choices, and provide parameterized test cases for new boundary behaviors.

Published in: Engineering
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

### Augmented Arithmetic Operations Proposed for IEEE-754 2018

1. 1. Augmented Arithmetic Operations Proposed for IEEE 754-2018 E. Jason Riedy Georgia Institute of Technology 25th IEEE Symposium on Computer Arithmetic, 26 June 2018
2. 2. Outline History and Deﬁnitions Decisions Testing Summary Augmented Arith Ops — ARITH 25, 26 June 2018 1/18
3. 3. History and Deﬁnitions
4. 4. Quick Deﬁnitions twoSum Two summands x and y ⇒ sum s and error e fastTwoSum Two summands x and y with |x| ≥ |y| ⇒ sum s and error e twoProd Two multiplicands x and y ⇒ product p and error e All can overﬂow. Only twoProd can underﬂow with gradual underﬂow. Otherwise, relationships are exact for ﬁnite x and y. These are for binary ﬂoating point. Augmented Arith Ops — ARITH 25, 26 June 2018 2/18
5. 5. Partial History • Møller (1965), Knuth (1965): twoSum for quasi double precision • Kahan (1965): fastTwoSum for compensated summation • Dekker (1971): fastTwoSum for expansions • Shewchuk (1997): Geometric predicates • Brigg (1998): quasi double precision implementation as double-double • Hida, Li, Bailey (2001): quad-double • Ogita, Rump, Oishi (2005...): Error-free transformations • IEEE 754-2008: Punted to the future • Ahrens, Nguyen, Demmel (2013...): ReproBLAS Augmented Arith Ops — ARITH 25, 26 June 2018 3/18
6. 6. “Typical” twoSum void twoSum (const double x, const double y, double * head, double * tail) // Knuth, second volume of TAoCP { const double s = x + y; const double bb = s − x; const double err = (x − (s − bb)) + (y − bb); *head = s; *tail = err; } Six instructions, long dependency chain. Reducing to one or two instructions: More uses. Augmented Arith Ops — ARITH 25, 26 June 2018 4/18
7. 7. twoProduct void twoProduct (const double a, const double b, double *head, double *tail) { double p = a * b; double t = fma (a, b, −p); *head = p; *tail = t; } Already two instructions. With the right instruction. Augmented Arith Ops — ARITH 25, 26 June 2018 5/18
8. 8. twoSum, twoProd Unusual Exceptional Cases x y head tail signal ∞ + ∞ ⇒ ∞ NaN invalid −∞ + −∞ ⇒ −∞ NaN invalid x + y ⇒ ∞ NaN invalid, overﬂow, inexact (x + y overﬂows) −x + −y ⇒ −∞ NaN invalid, overﬂow, inexact (−x − y overﬂows) −0 + −0 ⇒ −0 +0 (none) x × y ⇒ ∞ −∞ overﬂow, inexact (x × y overﬂows) −x × y ⇒ −∞ ∞ overﬂow, inexact (−x × y overﬂows) −0 × 0 ⇒ −0 0 (none) Augmented Arith Ops — ARITH 25, 26 June 2018 6/18
9. 9. Other Orders, Other Cases • Algebraic re-orderings produce different cases • Different signs • Different NaN locations • Similarly for twoProd and fastTwoSum. • Hence, double-double behaves strangely... • And reprodicibility becomes tricky. Augmented Arith Ops — ARITH 25, 26 June 2018 7/18
10. 10. Reproducible Linear Algebra • Ahrens, Nguyen, Demmel: K-fold indexed sum • Base of the ReproBLAS, associative • Core operation: fastTwoSum! • But rounding must not depend on the output value. Details: • Demmel and Nguyen, “Fast Reproducible Floating-Point Summation,” ARITH21, 2013. • Ahrens, Nguyen, and Demmel, “Effcient Reproducible Floating Point Summation and BLAS,” UCB Tech Report, 2016. • bebop.cs.berkeley.edu/reproblas/ Note: There are other methods, e.g. ARM’s. Augmented Arith Ops — ARITH 25, 26 June 2018 8/18
11. 11. Performance? Emulating augmentedAddition as two instructions improves double-double: Operation Skylake Haswell Addition latency −55% −45% throughput +36% +18% Multiplication latency −3% 0% throughput +11% +16% DDGEMM MFLOP/s: Operation Intel Skylake Intel Haswell “Typical” implementation 1732 (≈ 1/37 DP) 1199 (≈ 1/45 DP) Two-insn augmentedAddition 3344 (≈ 1/19 DP) 2283 (≈ 1/24 DP) Dukhan, Riedy, Vuduc. “Wanted: Floating-point add round-off error instruction,” PMAA 2016. ReproBLAS dot product: 33% rate improvement, only 2× slower than non-reproducible. Augmented Arith Ops — ARITH 25, 26 June 2018 9/18
12. 12. Decisions
13. 13. Decisions Made for Standardization • Time to standardize? • Was considered for 2008, but only use was extending precision. • Now more uses that can beneﬁt. • Names? • twoSum, etc. already exist. • So augmented arithmetic operations. • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Exceptional behavior? • My ﬁrst proposal was twoSum above. • Generally not wise... • Rounding... Augmented Arith Ops — ARITH 25, 26 June 2018 10/18
14. 14. Standard Behavior augmentedAddition (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN ∞ ∞ ∞ ∞ (none) ∞ −∞ NaN NaN invalid −∞ ∞ NaN NaN invalid −∞ −∞ −∞ −∞ (none) x y ∞ ∞ overrﬂow, inexact (x + y overﬂows) −x −y −∞ −∞ overrﬂow, inexact (−x − y overﬂows) Augmented Arith Ops — ARITH 25, 26 June 2018 11/18
15. 15. Standard Behavior augmentedAddition (II) x y head tail signal +0 +0 +0 +0 (none) +0 −0 +0 +0 (none) −0 +0 +0 +0 (none) −0 −0 −0 −0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018 12/18
16. 16. Rounding • Reproducibility cannot round ties to nearest even. • Depends on the output value. • Leaves rounding ties away... • Already exists for binary. • Surveyed users. • Shewchuck: triangle could not use it. • Rump, et al.: Proofs need reworked. • So rounding ties toward zero. • roundTiesToZero from decimal • Only these operations. • sigh. Augmented Arith Ops — ARITH 25, 26 June 2018 13/18
17. 17. Testing
18. 18. Testing augmentedAddition • Assume p bits of precision, and augmentedAddition(x, y) ⇒ (h, t). • Typical cases are ﬁne, only need special cases to test for roundTiesToZero. • Generate x, y such that y immediately follows x and causes a tie that rounds to a value easy to check. • Params: s sign, T signiﬁcand, E exponent • x = (−1)1−s · T · 2E with T odd • y = (−1)1−s · 2E−p−1 • Results: • roundTiesToZero: (x, y) • Away or to even: (x + 2y, −y) Augmented Arith Ops — ARITH 25, 26 June 2018 14/18
19. 19. Testing augmentedMultiplication: Rounding • Similarly need special cases to test for roundTiesToZero and underﬂow. • Generate x, y such that xy requires p + 1 bits and has 11 as the ﬁnal bits. • xm × ym = (2p+1 + 4M + 3) · 2E with 0 ≤ M < 2p−1 and emin ≤ E + p − 1 < emax • Sample M • Factor the signiﬁcand 2p+1 + 4M + 3 • Sample random exponents Exm and Eym with emin ≤ Exm + Eym < emax • But also need underﬂow... Augmented Arith Ops — ARITH 25, 26 June 2018 15/18
20. 20. Testing augmentedMultiplication: Underﬂow • Testing underﬂow: Product needs the tail chopped off by underﬂow. • Let 0 ≤ k < p − 1 be the number of additional bits we want below the hard underﬂow threshold. • Sample: • 0 ≤ M < 2p−1 − 1 • 0 ≤ N < 2p−k−1 • 0 ≤ T < 2k xm ×ym = ( (2p−1 + M) · 2p head + N · 2k tail + 2T + 1 below underﬂow ) ·2emin−p−k • This case must signal inexact & underﬂow. Augmented Arith Ops — ARITH 25, 26 June 2018 16/18
21. 21. Reference Code Not done yet, but... 1. Handle special cases. 2. If no ties (far case), twoSum. 3. If ties (near case), split and take care. • Tools from the next talk. Should be slow but clear. Anyone feel like adding this to the RISC-V Rocket core? Augmented Arith Ops — ARITH 25, 26 June 2018 17/18
22. 22. Summary
23. 23. Summary • Three “new” recommended ops in draft IEEE 754-2018 • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Speciﬁed to support extending precision (double double), reproducible linear algebra • Two instruction implementation can provide good performance beneﬁts. • Quite easily testible • Future: Decimal? Hardware? Augmented Arith Ops — ARITH 25, 26 June 2018 18/18
24. 24. fastTwoSum void fastTwoSum (const double x, const double y, double * head, double * tail) /* Assumes that |x| ≤|y| */ { const double s = x + y; const double bb = s − x; const double err = y − bb; *head = s; *tail = err; } Augmented Arith Ops — ARITH 25, 26 June 2018
25. 25. Standard Behavior augmentedMultiplication (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN x y ∞ ∞ overﬂow, inexact (x × y overﬂows) −x y −∞ −∞ overﬂow, inexact (−x × y overﬂows) x −y −∞ −∞ overﬂow, inexact (x × −y overﬂows) −x −y ∞ ∞ overﬂow, inexact (−x × −y overﬂows) Augmented Arith Ops — ARITH 25, 26 June 2018
26. 26. Standard Behavior augmentedMultiplication (II) x y head tail signal ∞ ∞ ∞ ∞ (none) ∞ −∞ −∞ −∞ (none) −∞ ∞ −∞ −∞ (none) −∞ −∞ ∞ ∞ (none) +0 +0 +0 +0 (none) +0 −0 −0 −0 (none) −0 +0 −0 −0 (none) −0 −0 +0 +0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018