SlideShare a Scribd company logo
1 of 46
Download to read offline
LLNL-PRES-748385
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory
under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Universal Coding of the Reals:
Alternatives to IEEE Floating Point
Conference for Next Generation Arithmetic 2018
Peter Lindstrom
Scott Lloyd, Jeff Hittinger
March 28, 2018
LLNL-PRES-748385
2
Data movement will dictate performance and power usage at
exascale
Mega
Giga
Tera
Peta
Exa
1995 2000 2005 2010 2015 2020
O(106)
increase
O(103)
increase
LLNL-PRES-748385
3
§ IEEE assumes “one exponent size fits all”
— Insufficient precision for some computations, e.g. catastrophic cancellation
— Insufficient dynamic range for others, e.g. under- or overflow in exponentials
— For these reasons, double precision is preferred in scientific computing
§ Many bits of a double are contaminated by error
— Error from round-off, truncation, iteration, model, observation, …
— Pushing error around and computing on it is wasteful
§ To significantly reduce data movement, we must make every bit count!
— Next generation floating types must be re-designed from ground up
Uneconomical floating types and excessive precision are
contributing to data movement
LLNL-PRES-748385
4
POSITS [Gustafson 2017] are a new float
representation that improves on IEEE
IEEE 754 floating point Posits: UNUM version 3.0
Bit Length {16, 32, 64, 128} fixed length but arbitrary
Sign sign-magnitude (−0 ≠ +0) two’s complement (−0 = +0)
Exponent fixed length (biased binary) variable length (Golomb-Rice)
Fraction Map linear (φ(x) = 1 + x) linear (φ(x) = 1 + x)
Infinities {−∞, +∞} ±∞ (single point at infinity)
NaNs many (9 quadrillion) one
Underflow gradual (subnormals) gradual (natural)
Overflow 1 / FLT_TRUE_MIN = ∞ (oops!) never (exception: 1 / 0 = ±∞)
Base {2, 10} 22m
∈ {2, 4, 16, 256, …}
LLNL-PRES-748385
5
§ Both represent real value as y = (-1)s 2e (1 + f)
§ IEEE: fixed-length encoding of exponent
§ POSITS: variable-length encoding of exponent
— Alternative interpretation [Gustafson 2017]: leading one-bits encode “regime”
— More fraction bits are available when exponent is small
In terms of encoding scheme, IEEE and POSITS differ in one
important way
IEEE float IEEE double POSIT(7)
e =	D(1 x)	=	1 +	x
1-bit exponent “sign”
7-bit exponent magnitude x
e =	D(1 000 x)	=	1	+	x e =	D(1 0	m)	=	x
e	=	D(1 001 x) =	129	+	x e =	D(1 10	m)	=	128	+	x
e	=	D(1 010 x)	=	257	+	x e =	D(1 110	m)	=	256	+	x
… …
e =	D(1 111 x)	=	897	+	x e =	D(1 11111110	m)	=	896	+	x
LLNL-PRES-748385
6
§ Decompose integer z ≥ 1 as z = 2e + r
— e ≥ 0 is exponent
— r = z mod 2e is e-bit residual
§ Elias proposed three “prefix-free” variable-length codes for integers
— Elias γ code is equivalent to POSIT(0) [Gustafson & Yonemoto 2017]
— Elias δ code is equivalent to URR [Hamada 1983]
Related work: Universal coding of positive integers [Elias 1975]
Elias γ code Elias δ code Elias ω code
Exponent unary γ ω (recursive)
Code(1) 0 0 0
Code(2e + r) 1e 0 r 1 γ(e) r 1 ω(e) r
Code(17 = 24 + 1) 1111 0 0001 1 11 0 00 0001 1 1 1 0 0 00 0001
LLNL-PRES-748385
7
§ From z ∈ ℕ (positive integers) to y ∈ ℝ (reals)
— Rationals: 1 ≤ z < y < z + 1 encoded by appending fraction bits (as in IEEE, POSITS)
— Subunitaries: 0 < y < 1 encoded via two’s complement exponent (as in POSITS)
— Negatives: y < 0 encoded via two’s complement binary representation (as in POSITS)
— 2-bit prefix tells where we are on the real line (as in POSITS)
We extend Elias codes from positive integers to reals
+1 +2 +3 +4+1 +2 +3 +40 +1 +2 +3 +44 3 2 1 0 +1 +2 +3 +4
00 011110
4 3 2 1 0 +1 +2 +3 +4
LLNL-PRES-748385
8
POSITS, Elias recursively refine intervals (0, ±1), (±1, ±∞)
Example: base-2 posits (aka. Elias γ)
+1 0 111 1
000±110
+1 0 1011 10
0000±1100
0
11
+
2
0
01
+
1
2
2
1
01
1
2
1
11
+1 0 10 011 10 0
00000±11000
0
110
+
2
0
01
0
+
1
2
2
1
01
0
1
2
1
110
+
3
2
0 10 1
0111
+4
0 01 1
+ 3
4
0001
+
1
4
4
1001
3
2
1 01 1
3
4
1 10 1
1
4
1111
+1 0 10 0011 10 00
000000±110000
0
110
0
+
2
0
01
00
+
1
2
2
1
01
00
1
2
1
110
0
+
3
2
0 10 10
01110
+4
0 01 10
+ 3
4
00010
+
1
4
4
10010
3
2
1 01 10
3
4
1 10 10
1
4
11110
+
5
4
0 10 01
7
8
1 10 01
00001
+
1
8
8
10001
01101
+3
00101
+ 5
8
7
4
10101
3
8
11101
+
7
4
01011
01111
+8
0 01 11
+7
8
00011
+
3
8
3
10011
5
4
1 01 11
5
8
11011
1
8
11111
sign bit
exponent sign
exponent value
fraction value
LLNL-PRES-748385
9
§ NUMREP is a modular C++ framework for defining number systems
— Exponent coding scheme (unary, binary, Golomb-Rice, gamma, omega, …)
— Fraction map (linear, reciprocal, exponential, rational, quadratic, logarithmic, …)
• Conjugate fraction maps for sub- & superunitaries enable reciprocal closure
— Handling of under- & overflow
— Rounding rules (to nearest, toward {−∞, 0, +∞})
§ NUMREP Unifies IEEE, POSITS, Elias, URR, LNS, …, under a single schema
— Uses auxiliary arithmetic type to perform numerical computations (e.g. IEEE, MPFR)
— Uses operator overloading to mimic intrinsic floating types
— Supports 8, 16, 32, 64-bit types
Beyond POSITS and Elias: NUMREP templated framework
This talk focuses on NUMREP’s exponent coding schemes and fraction maps
LLNL-PRES-748385
10
Many useful NUMREPS are given by simple interpolation and
extrapolation rules (plus closure under negation)
+1 0 10 011 10 0
00000±11000
0
110
0
01
0
1
01
0
1
110
0 10 1
0111
0 01 1
0001
1001
1 01 1
1 10 1
1111
+fmax
fmax
+fmin
fmin
extrapolation on (fmax, ∞)
interpolation on (fmin, fmax)
reciprocation on (0, fmin)
r
e
c
I
p
r
negation o negation
c
a
t
I
o
n
LLNL-PRES-748385
11
§ Extrapolation rule: 1 ≤ fmax(p) < fmax(p + 1) < ∞
§ Reciprocation rule: fmin(p) = 1 / fmax(p)
Extrapolation gives next fmax between previous fmax and ±∞
when adding one more bit of precision
type fmax(p + 1) sequence
Unary 1 + fmax(p) 1, 2, 3, 4, 5, …
Elias γ 2 × fmax(p) 1, 2, 4, 8, 16, …
POSITS (base b) b × fmax(p) 1, b, b2, b3, b4, …
Elias δ fmax(p)2 1, 2, 4, 16, 256, …
Elias ω 2fmax(p) 1, 2, 4, 16, 65536, …
fmax(1) 0 1
±11
0
11
fm
ax(2)
0111
fmax(3)
01111
fmax(4)
LLNL-PRES-748385
12
§ For POSITS, Elias {γ, δ, ω}:
§ Examples
— g(2, 4) = 3 arithmetic mean
— g(4, 16) = 2g(2, 4) = 23 = 8 geometric mean
— g(16, 65536) = 2g(4, 16) = 22g(2,4)
= 223
= 28 = 256 “hypergeometric” mean (for ω only)
§ For LNS:
Interpolation rule generates new number g(a, b) between two
finite positive numbers a < b
𝑔 𝑎, 𝑏 = &
𝑎 + 𝑏
2
if	𝑏 ≤ 2𝑎
2-(/0 1,	/0 2)
otherwise
𝑔 𝑎, 𝑏 = 𝑎𝑏
LLNL-PRES-748385
13
§ Templated on generator that implements two rules
— Interpolation function, g(a, b)
— Extrapolation recurrence, ai+1 = f(ai) = g(ai, ∞)
§ Rules + bisection implement encoding, decoding, rounding, fraction map
— Recursively bisect (−∞, +∞) using applicable interpolation or extrapolation rule
• g(−∞, +∞) = 0
• g(0, ±∞) = ±1
— Exploit symmetry of negation, reciprocation
— Encode: bracket number and recursively test if less (0) or greater (1) than bisection point
— Decode: narrow interval toward lower (0) or upper (1) bound; return final lower bound
— Round: round number to lower bound if less than next bisection point, otherwise to upper
— Map: given by interpolation rule
§ NOTE: Not very performant, but great for experimentation & validation
— Our 14-page CoNGA paper implemented in 200 lines of C++ code (vs. ~1,000 lines for NUMREP)
FLEX (FLexible EXponent): NUMREP on a shoestring
LLNL-PRES-748385
14
FLEX’s POSIT implementation is 8 lines of code
// posit generator with m-bit exponent and base 2^(2^m)
template <int m = 0, typename real = flex_real>
class Posit {
public:
real operator()(real x) const { return real(base) * x; }
real operator()(real x, real y) const { return real(2) * x >= y ? (x + y) / real(2) : std::sqrt(x * y); }
protected:
static const int base = 1 << (1 << m);
};
LLNL-PRES-748385
15
Elias delta and the Kessel Run:
What is the distance to Proxima Centauri?
0 01111111 01001100111101100101011
0 10 01001100111101100101010111000
IEEE
ELIAS δ
error = 1.2e−08
error = 4.8e−10
1.3 parsecs
2 bits
0 10000001 00001111011111101001000
0 11100 00001111011111101001000100
IEEE
ELIAS δ
error = 5.6e−08
error = 9.0e−11
4.2 lightyears
5 bits
0 10110110 00011101001010100010101
0 1111111010111 000111010010101001
IEEE
ELIAS δ
error = 1.5e−08
error = 1.2e−06
4.0 × 1016 meters
13 bits
0 11111111 00000000000000000000000
0 11111111100101010 10101000110001
IEEE
ELIAS δ
error = ∞
error = 1.4e−05
2.4 × 1051 Planck lengths
17 bits
LLNL-PRES-748385
16
Variable-length exponents lead to tapered precision: 6-9 more
bits of precision than IEEE for numbers near one
0
4
8
12
16
20
24
28
32
-256 -192 -128 -64 0 64 128 192 256
precision
exponent
float binary(8) gamma posit(1) posit(2) delta omega
subnormals
29 bits
LLNL-PRES-748385
17
Tapered precision (close-up): tension between precision and
dynamic range
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-48 -44 -40 -36 -32 -28 -24 -20 -16 -12 -8 -4 0 4 8 12 16 20 24 28 32 36 40 44 48
precision
exponent
float binary(8) gamma posit(1) posit(2) delta omega
LLNL-PRES-748385
18
§ Universality property (POSITS, Elias {γ, δ, ω}, but not IEEE)
1. Each integer is represented by a prefix code
• No codeword is a prefix of any other codeword
• Codeword length is self-describing—remaining available bits represent fraction
2. Integer representation has monotonic length
• Larger integers have longer codewords
3. Codeword length of integer y is O(lg y)
• Codeword length is proportional to the number of significant bits
§ Asymptotic optimality property (Elias {δ, ω})
— Relative cost of coding exponent (leading-one position) vanishes in the limit
Elias delta and omega are asymptotically optimal universal
representations
LLNL-PRES-748385
19
Elias delta and omega are asymptotically optimal: exponent
overhead approaches zero in the limit
1.0
1.5
2.0
2.5
3.0
3.5
4.0
0 1 2 4 8 16 32 64 128 256
relativecodingcost
lg(y)
binary(8) gamma posit(1) posit(2) delta omega
LLNL-PRES-748385
20
§ So far we have assumed that fraction (significand), f ∈ [0, 1), is mapped linearly
— y = (-1)s 2e (1 + f)
— We may replace (1 + f) with transfer function φ(f)
§ Fraction map, φ : [0, 1) ⟶ [1, 2), maps fraction bits, f, to value, φ(f)
— Subunitary map φ–: |y| < 1
— Superunitary map φ+: |y| ≥ 1
Fraction maps dictate how fraction bits map to values
map φ(f) notes
Linear 1 + f introduces implicit leading-one bit
Linear reciprocal 2 / (2 – f) conjugate with linear map ⇒ reciprocal closure
Exponential 2f LNS (logarithmic number system): y = 2e 2f = 2e+f
Linear rational (1 + p f) / (1 – q f) p = 2	− 1, q = (1 – p) / 2
LLNL-PRES-748385
21
1.0
1.5
2.0
0.0 0.5 1.0
valueφ(f)
fraction f
linear reciprocal exponential rational
§ Conjugacy: φ–(f) φ+(1 – f) = 2 0 ≤ f ≤ 1
— E.g. 2 / (2 – f) × (1 + (1 – f)) = 2
§ Cn continuity: φ(n)(1) / φ(n)(0) = 2
Conjugate pair (φ–, φ+) guarantees reciprocal closure
LLNL-PRES-748385
22
§ Arithmetic closure and relative error (vs. analytic solution) 𝑧 = 𝑥 + 𝑦
— Addition and multiplication of 16-bit operands
§ Addition of long floating-point sequences (vs. analytic solution) s = ∑ 𝑎A
	
A
— Using naïve and best-of-breed addition algorithms
§ Matrix inversion (vs. analytic solution) 𝐴 = 𝐻DE
— LU-based inversion of Hilbert and Vandermonde matrices
§ Symmetric matrix eigenvalues (vs. analytic solution) Λ = 𝑄𝐴𝑄H
— QR-based dense eigenvalue solver
§ Euler2D PDE solver (vs. IEEE quad precision) 𝜕J 𝑢 + ∇ M 𝑓 𝑢 = 0
— Complex physics involving interacting shock waves
We present extensive results that span a spectrum of simple
to increasingly complex numerical computations
LLNL-PRES-748385
23
Closure rate (C) and mean relative error (E) for 16-bit addition
suggest POSIT(0) (aka. Elias gamma) performs best
negative error no error positive error infinite/NaN
LLNL-PRES-748385
24
Optimal multiplicative closure of LNS does not imply superior
relative errors
negative error no error positive error infinite/NaN
LLNL-PRES-748385
25
Simple summation algorithms benefit from tapered precision:
Sum of Gaussian PDF and its first derivative
FLT_EPSILON
1E-09
1E-08
1E-07
1E-06
1E-05
1E-04
1E-03
1E-02
1E-01
sequential increasing insertion cascade Kahan
absoluteerror float posit(2) delta
LLNL-PRES-748385
26
New types excel in 64-bit Hilbert matrix inversion (Eigen3)
1E-18
1E-17
1E-16
1E-15
1E-14
1E-13
1E-12
1E-11
1E-10
1E-09
1E-08
1E-07
1E-06
1E-05
1E-04
2 3 4 5 6 7 8 9 10
matrixinverseRMSerror
matrix rows
IEEE gamma posit(1) posit(2) delta omega
LLNL-PRES-748385
27
Elias outperforms IEEE by O(103) in 64-bit eigenvalue accuracy
1E-19
1E-18
1E-17
1E-16
1E-15
1E-14
1E-13
1E-12
1E-11
4 16 64 256 1024
eigenvalueRMSerror
matrix rows
IEEE gamma posit(1) posit(2) delta omega
LLNL-PRES-748385
28
Euler2D shock propagation illustrates benefits of new types
LLNL-PRES-748385
29
IEEE consistently is the least accurate floating-point
representation in numerical calculations (64-bit types)
1E-19
1E-18
1E-17
1E-16
1E-15
1E-14
1E-13
1E-12
1E-11
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
densityRMSerror
time
IEEE binary(11) LNS(11) gamma posit(1)
posit(2) posit(3) delta(0) delta(1) omega(3)
LLNL-PRES-748385
30
Nonlinear fraction maps sometimes improve accuracy
substantially (32-bit types)
1E-09
1E-08
1E-07
1E-06
1E-05
1E-04
1E-03
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
densityRMSerror
time
IEEE posit(1) posreclin(1) poslinrec(1) posexp(1)
LLNL-PRES-748385
31
§ Many numerical computations involve continuous fields
— E.g. partial differential equations over ℝ2, ℝ3
— Nearby scalars on Cartesian grids have similar values
• In statistical speak, fields exhibit autocorrelation
• Such redundancy is not exploited by any of the representations discussed so far
Beyond IEEE and posit representations of ℝ:
Compressed representations of ℝn
0
1
2
3
0 64 128 192 256 320 384
functionvalue
grid point
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0 8 16 24 32 40 48 56 64
storage
precision
entropy uncompressed
LLNL-PRES-748385
32
(64-bit) floating-point data does not compress well losslessly
1.11 1.07 1.05 1.04 1.03 1.02 1.01
[Lindstrom'06]
[Alted'09]
[Burtscher'07]
[Burtscher'16]
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
fpzip BLOSC FPC gzip szip SPDP bzip2
compressionratio(uncomp./comp.size)
LLNL-PRES-748385
33
§ Large improvements in compression are
possible by allowing even small errors
— Simulation often computes on meaningless bits
• Round-off, truncation, iteration, model errors abound
• Last few floating-point bits are effectively random noise
§ Still, lossy compression often makes scientists nervous
— Even though lossy data reduction is ubiquitous
• Decimation in space and/or time (e.g. store every 100 time steps)
• Averaging (hourly vs. daily vs. monthly averages)
• Truncation to single precision (e.g. for history files)
— State-of-the-art compressors support error tolerances
Lossy compression enables greater reduction, but is often met
with skepticism by scientists
0.27 bits/value
240x compression
LLNL-PRES-748385
34
Can lossy-compressed in-memory storage of numerical
simulation state be tolerated?
Simulation Outer Loop
Update elementsUpdate nodes
Update
elements
Compute time
constraints
Decompress
block
Element
kernel(s)
Compress
block
Our Approach
Decompress
all state
Update nodes
Update
elements
Compute time
constraints
Compress
all state
LLNL-PRES-748385
35
High-order Eulerian hydrodynamics
• QoI: Rayleigh-Taylor mixing layer thickness
• 10,000 time steps
• At 4x compression, relative error < 0.2%
Laser-plasma multi-physics
• QoI: backscattered laser energy
• At 4x compression, relative error < 0.1%
Lagrangian shock hydrodynamics
• QoI: radial shock position
• 25 state variables compressed over 2,100 time steps
• At 4x compression, relative error < 0.06%
Using lossy FPZIP to store simulation state compressed, we
have shown that 4x lossy compression can be tolerated
20 bits/value uncompressed16 bits/value
Lossy compression of state is viable, but streaming compression increases data movement—need inline compression
LLNL-PRES-748385
36
§ Inspired by ideas from h/w texture compression
— d-dimensional array divided into blocks of 4d values
— Each block is independently (de)compressed
• e.g. to a user-specified number of bits or quality
— Fixed-size blocks Þ random read/write access
— (De)compression is done inline, on demand
— Write-back cache of uncompressed blocks limits data loss
§ Compressed arrays via C++ operator overloading
— Can be dropped into existing code by changing type declarations
— double a[n] Û std::vector<double> a(n) Û zfp::array<double> a(n, precision)
ZFP is an inline compressor for floating-point arrays
LLNL-PRES-748385
37
ZFP’S C++ compressed arrays can replace STL vectors and C
arrays with minimal code changes
// example using STL vectors
std::vector<double> u(nx * ny, 0.0);
u[x0 + nx*y0] = 1;
for (double t = 0; t < tfinal; t += dt) {
std::vector<double> du(nx * ny, 0.0);
for (int y = 1; y < ny - 1; y++)
for (int x = 1; x < nx - 1; x++) {
double uxx = (u[(x-1)+nx*y] - 2*u[x+nx*y] + u[(x+1)+nx*y]) / dxx;
double uyy = (u[x+nx*(y-1)] - 2*u[x+nx*y] + u[x+nx*(y+1)]) / dyy;
du[x + nx*y] = k * dt * (uxx + uyy);
}
for (int i = 0; i < u.size(); i++)
u[i] += du[i];
}
// example using ZFP arrays
zfp::array2<double> u(nx, ny, bits_per_value);
u(x0, y0) = 1;
for (double t = 0; t < tfinal; t += dt) {
zfp::array2<double> du(nx, ny, bits_per_value);
for (int y = 1; y < ny - 1; y++)
for (int x = 1; x < nx - 1; x++) {
double uxx = (u(x-1, y) - 2*u(x, y) + u(x+1, y)) / dxx;
double uyy = (u(x, y-1) - 2*u(x, y) + u(x, y+1)) / dyy;
du(x, y) = k * dt * (uxx + uyy);
}
for (int i = 0; i < u.size(); i++)
u[i] += du[i];
}
required changes
optional changes for improved readability
LLNL-PRES-748385
38
The ZFP compressor is comprised of three distinct components
3
Raw floating-
point array
Block floating-
point transform
Orthogonal
block transform
Embedded
coding
Compressed
bit stream
§ Align values in a 4d block to a
common largest exponent
§ Transmit exponent verbatim
§ Lifted, separable transform using
integer adds and shifts
§ Similar to but faster and more
effective than JPEG DCT
§ Encode one bit plane at a time
from MSB using group testing
§ Each bit increases quality—can
truncate stream anywhere
LLNL-PRES-748385
39
6-bit ZFP gives one more digit of accuracy than 32-bit IEEE in
2nd, 3rd derivative computations (Laplacian and highlight lines)
IEEE half (16 bits) POSIT(1) (16 bits) IEEE float (32 bits)
ZFP (6 bits) IEEE double (64 bits) IEEE double (64 bits)
LLNL-PRES-748385
40
ZFP improves accuracy in finite difference computations using
less precision than IEEE
LLNL-PRES-748385
41
ZFP variable-rate arrays adapt storage spatially and allocate
bits to regions where they are most needed
16 bits/value8 bits/value 32 bits/value 64 bits/value
LLNL-PRES-748385
42
ZFP adaptive arrays improve accuracy in PDE solution over IEEE
by 6 orders of magnitude using less storage
LLNL-PRES-748385
43
ZFP adaptive arrays improve accuracy in PDE solution over IEEE
by 6 orders of magnitude using less storage
1E-15
1E-14
1E-13
1E-12
1E-11
1E-10
1E-09
1E-08
1E-07
1E-06
1E-05
1E-04
1E-03
1E-02
0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190
RMSerror
time
IEEE (32-bit) posit1 (32-bit) zfp (32-bit) arc (28-bit)
LLNL-PRES-748385
44
§ Emerging alternatives to IEEE floats show promising improvements in accuracy
— Tapered precision allows for better balance between range and precision
— New types have many desirable properties
• No IEEE warts: subnormals, excess of NaNs, signed zeros, overflow, exponent size dichotomy, …
• Features: nesting, monotonicity, reciprocal closure, universality, asymptotic optimality, …
§ NUMREP framework allows for experimentation with new number types in apps
— Modular design supports plug-and-play of independent concepts
— Nonlinear maps are expensive but worth investigating
— POSITS are often among the best performing types; IEEE among the worst
§ NextGen scalar representations still leave considerable room for improvement
— ZFP fixed-rate compressed arrays consistently outperform IEEE and POSITS
— ZFP variable-rate arrays optimize bit distribution and and make very bit count
Conclusions: Novel scalar representations are driving research
into new vector/matrix/tensor representations
LLNL-PRES-748385
45
ZFP is available as open source on GitHub:
https://github.com/LLNL/zfp
Universal Coding of the Reals: Alternatives to IEEE Floating Point

More Related Content

What's hot

Reed Solomon encoder and decoder \ ريد سلمون
Reed Solomon encoder and decoder \ ريد سلمونReed Solomon encoder and decoder \ ريد سلمون
Reed Solomon encoder and decoder \ ريد سلمونMuhammed Abdulmahdi
 
The Mathematics of RSA Encryption
The Mathematics of RSA EncryptionThe Mathematics of RSA Encryption
The Mathematics of RSA EncryptionNathan F. Dunn
 
Computational Linguistics week 5
Computational Linguistics  week 5Computational Linguistics  week 5
Computational Linguistics week 5Mark Chang
 
Encoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabEncoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabSneheshDutta
 
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE ShivangiSingh241
 
AP PGECET Instrumentation 2016 question paper
AP PGECET Instrumentation 2016 question paperAP PGECET Instrumentation 2016 question paper
AP PGECET Instrumentation 2016 question paperEneutron
 
Deep Learning & Tensor flow: An Intro
Deep Learning & Tensor flow: An IntroDeep Learning & Tensor flow: An Intro
Deep Learning & Tensor flow: An IntroSiby Jose Plathottam
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
 
Reed solomon Encoder and Decoder
Reed solomon Encoder and DecoderReed solomon Encoder and Decoder
Reed solomon Encoder and DecoderAmeer H Ali
 
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTS
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTSA SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTS
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTScsandit
 
Computer architecture
Computer architectureComputer architecture
Computer architecturevishnu973656
 
Csr2011 june15 12_00_davydow
Csr2011 june15 12_00_davydowCsr2011 june15 12_00_davydow
Csr2011 june15 12_00_davydowCSR2011
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learningsimaokasonse
 
Introduction to Elliptic Curve Cryptography
Introduction to Elliptic Curve CryptographyIntroduction to Elliptic Curve Cryptography
Introduction to Elliptic Curve CryptographyDavid Evans
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofArunanand Ta
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOWMark Chang
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsMark Chang
 

What's hot (20)

Reed Solomon encoder and decoder \ ريد سلمون
Reed Solomon encoder and decoder \ ريد سلمونReed Solomon encoder and decoder \ ريد سلمون
Reed Solomon encoder and decoder \ ريد سلمون
 
The Mathematics of RSA Encryption
The Mathematics of RSA EncryptionThe Mathematics of RSA Encryption
The Mathematics of RSA Encryption
 
Computational Linguistics week 5
Computational Linguistics  week 5Computational Linguistics  week 5
Computational Linguistics week 5
 
Encoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabEncoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlab
 
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
 
AP PGECET Instrumentation 2016 question paper
AP PGECET Instrumentation 2016 question paperAP PGECET Instrumentation 2016 question paper
AP PGECET Instrumentation 2016 question paper
 
Deep Learning & Tensor flow: An Intro
Deep Learning & Tensor flow: An IntroDeep Learning & Tensor flow: An Intro
Deep Learning & Tensor flow: An Intro
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
IntrRSCode
IntrRSCodeIntrRSCode
IntrRSCode
 
Reed solomon Encoder and Decoder
Reed solomon Encoder and DecoderReed solomon Encoder and Decoder
Reed solomon Encoder and Decoder
 
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTS
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTSA SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTS
A SURVEY ON ELLIPTIC CURVE DIGITAL SIGNATURE ALGORITHM AND ITS VARIANTS
 
Computer architecture
Computer architectureComputer architecture
Computer architecture
 
Section9 stochastic
Section9 stochasticSection9 stochastic
Section9 stochastic
 
Csr2011 june15 12_00_davydow
Csr2011 june15 12_00_davydowCsr2011 june15 12_00_davydow
Csr2011 june15 12_00_davydow
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learning
 
Introduction to Elliptic Curve Cryptography
Introduction to Elliptic Curve CryptographyIntroduction to Elliptic Curve Cryptography
Introduction to Elliptic Curve Cryptography
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge Proof
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOW
 
Data linklayer
Data linklayerData linklayer
Data linklayer
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANs
 

Similar to Universal Coding of the Reals: Alternatives to IEEE Floating Point

chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptx
chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptxchapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptx
chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptxSurendra Loya
 
Chapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersChapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersMohammad Bashartullah
 
Unit 1 PDF.pptx
Unit 1 PDF.pptxUnit 1 PDF.pptx
Unit 1 PDF.pptxChandraV13
 
Digital Electronics Notes
Digital Electronics Notes Digital Electronics Notes
Digital Electronics Notes Srikrishna Thota
 
ADE UNIT-III (Digital Fundamentals).pptx
ADE UNIT-III (Digital Fundamentals).pptxADE UNIT-III (Digital Fundamentals).pptx
ADE UNIT-III (Digital Fundamentals).pptxKUMARS641064
 
Chapter 1 Digital Systems and Binary Numbers.ppt
Chapter 1 Digital Systems and Binary Numbers.pptChapter 1 Digital Systems and Binary Numbers.ppt
Chapter 1 Digital Systems and Binary Numbers.pptAparnaDas827261
 
f33-ft-computing-lec09-correct.ppt
f33-ft-computing-lec09-correct.pptf33-ft-computing-lec09-correct.ppt
f33-ft-computing-lec09-correct.pptMaddulaCharishma
 
Number_Systems_and_Boolean_Algebra.ppt
Number_Systems_and_Boolean_Algebra.pptNumber_Systems_and_Boolean_Algebra.ppt
Number_Systems_and_Boolean_Algebra.pptVEERA BOOPATHY E
 
digital logic circuits, digital component floting and fixed point
digital logic circuits, digital component floting and fixed pointdigital logic circuits, digital component floting and fixed point
digital logic circuits, digital component floting and fixed pointRai University
 
Chapter 1 digital design.pptx
Chapter 1 digital design.pptxChapter 1 digital design.pptx
Chapter 1 digital design.pptxAliaaTarek5
 
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Rai University
 
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...Deltares
 
Number system and codes
Number system and codesNumber system and codes
Number system and codesAbhiraj Bohra
 
Computer organiztion3
Computer organiztion3Computer organiztion3
Computer organiztion3Umang Gupta
 
EC8553 Discrete time signal processing
EC8553 Discrete time signal processing EC8553 Discrete time signal processing
EC8553 Discrete time signal processing ssuser2797e4
 

Similar to Universal Coding of the Reals: Alternatives to IEEE Floating Point (20)

chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptx
chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptxchapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptx
chapter1digitalsystemsandbinarynumbers-151021072016-lva1-app6891.pptx
 
Digital logic design part1
Digital logic design part1Digital logic design part1
Digital logic design part1
 
Chapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersChapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbers
 
Unit 1 PDF.pptx
Unit 1 PDF.pptxUnit 1 PDF.pptx
Unit 1 PDF.pptx
 
Digital Electronics Notes
Digital Electronics Notes Digital Electronics Notes
Digital Electronics Notes
 
ADE UNIT-III (Digital Fundamentals).pptx
ADE UNIT-III (Digital Fundamentals).pptxADE UNIT-III (Digital Fundamentals).pptx
ADE UNIT-III (Digital Fundamentals).pptx
 
Chapter 1 Digital Systems and Binary Numbers.ppt
Chapter 1 Digital Systems and Binary Numbers.pptChapter 1 Digital Systems and Binary Numbers.ppt
Chapter 1 Digital Systems and Binary Numbers.ppt
 
f33-ft-computing-lec09-correct.ppt
f33-ft-computing-lec09-correct.pptf33-ft-computing-lec09-correct.ppt
f33-ft-computing-lec09-correct.ppt
 
Number_Systems_and_Boolean_Algebra.ppt
Number_Systems_and_Boolean_Algebra.pptNumber_Systems_and_Boolean_Algebra.ppt
Number_Systems_and_Boolean_Algebra.ppt
 
digital logic circuits, digital component floting and fixed point
digital logic circuits, digital component floting and fixed pointdigital logic circuits, digital component floting and fixed point
digital logic circuits, digital component floting and fixed point
 
Unit-1.pptx
Unit-1.pptxUnit-1.pptx
Unit-1.pptx
 
Chapter 1 digital design.pptx
Chapter 1 digital design.pptxChapter 1 digital design.pptx
Chapter 1 digital design.pptx
 
Class10
Class10Class10
Class10
 
06 floating point
06 floating point06 floating point
06 floating point
 
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
 
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...
02 DSD-NL 2016 - Simona Gebruikersmiddag - Floating point onnauwkeurigheid en...
 
Number system and codes
Number system and codesNumber system and codes
Number system and codes
 
DLD-Introduction.pptx
DLD-Introduction.pptxDLD-Introduction.pptx
DLD-Introduction.pptx
 
Computer organiztion3
Computer organiztion3Computer organiztion3
Computer organiztion3
 
EC8553 Discrete time signal processing
EC8553 Discrete time signal processing EC8553 Discrete time signal processing
EC8553 Discrete time signal processing
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfAnubhavMangla3
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistandanishmna97
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxjbellis
 

Recently uploaded (20)

Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 

Universal Coding of the Reals: Alternatives to IEEE Floating Point

  • 1. LLNL-PRES-748385 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC Universal Coding of the Reals: Alternatives to IEEE Floating Point Conference for Next Generation Arithmetic 2018 Peter Lindstrom Scott Lloyd, Jeff Hittinger March 28, 2018
  • 2. LLNL-PRES-748385 2 Data movement will dictate performance and power usage at exascale Mega Giga Tera Peta Exa 1995 2000 2005 2010 2015 2020 O(106) increase O(103) increase
  • 3. LLNL-PRES-748385 3 § IEEE assumes “one exponent size fits all” — Insufficient precision for some computations, e.g. catastrophic cancellation — Insufficient dynamic range for others, e.g. under- or overflow in exponentials — For these reasons, double precision is preferred in scientific computing § Many bits of a double are contaminated by error — Error from round-off, truncation, iteration, model, observation, … — Pushing error around and computing on it is wasteful § To significantly reduce data movement, we must make every bit count! — Next generation floating types must be re-designed from ground up Uneconomical floating types and excessive precision are contributing to data movement
  • 4. LLNL-PRES-748385 4 POSITS [Gustafson 2017] are a new float representation that improves on IEEE IEEE 754 floating point Posits: UNUM version 3.0 Bit Length {16, 32, 64, 128} fixed length but arbitrary Sign sign-magnitude (−0 ≠ +0) two’s complement (−0 = +0) Exponent fixed length (biased binary) variable length (Golomb-Rice) Fraction Map linear (φ(x) = 1 + x) linear (φ(x) = 1 + x) Infinities {−∞, +∞} ±∞ (single point at infinity) NaNs many (9 quadrillion) one Underflow gradual (subnormals) gradual (natural) Overflow 1 / FLT_TRUE_MIN = ∞ (oops!) never (exception: 1 / 0 = ±∞) Base {2, 10} 22m ∈ {2, 4, 16, 256, …}
  • 5. LLNL-PRES-748385 5 § Both represent real value as y = (-1)s 2e (1 + f) § IEEE: fixed-length encoding of exponent § POSITS: variable-length encoding of exponent — Alternative interpretation [Gustafson 2017]: leading one-bits encode “regime” — More fraction bits are available when exponent is small In terms of encoding scheme, IEEE and POSITS differ in one important way IEEE float IEEE double POSIT(7) e = D(1 x) = 1 + x 1-bit exponent “sign” 7-bit exponent magnitude x e = D(1 000 x) = 1 + x e = D(1 0 m) = x e = D(1 001 x) = 129 + x e = D(1 10 m) = 128 + x e = D(1 010 x) = 257 + x e = D(1 110 m) = 256 + x … … e = D(1 111 x) = 897 + x e = D(1 11111110 m) = 896 + x
  • 6. LLNL-PRES-748385 6 § Decompose integer z ≥ 1 as z = 2e + r — e ≥ 0 is exponent — r = z mod 2e is e-bit residual § Elias proposed three “prefix-free” variable-length codes for integers — Elias γ code is equivalent to POSIT(0) [Gustafson & Yonemoto 2017] — Elias δ code is equivalent to URR [Hamada 1983] Related work: Universal coding of positive integers [Elias 1975] Elias γ code Elias δ code Elias ω code Exponent unary γ ω (recursive) Code(1) 0 0 0 Code(2e + r) 1e 0 r 1 γ(e) r 1 ω(e) r Code(17 = 24 + 1) 1111 0 0001 1 11 0 00 0001 1 1 1 0 0 00 0001
  • 7. LLNL-PRES-748385 7 § From z ∈ ℕ (positive integers) to y ∈ ℝ (reals) — Rationals: 1 ≤ z < y < z + 1 encoded by appending fraction bits (as in IEEE, POSITS) — Subunitaries: 0 < y < 1 encoded via two’s complement exponent (as in POSITS) — Negatives: y < 0 encoded via two’s complement binary representation (as in POSITS) — 2-bit prefix tells where we are on the real line (as in POSITS) We extend Elias codes from positive integers to reals +1 +2 +3 +4+1 +2 +3 +40 +1 +2 +3 +44 3 2 1 0 +1 +2 +3 +4 00 011110 4 3 2 1 0 +1 +2 +3 +4
  • 8. LLNL-PRES-748385 8 POSITS, Elias recursively refine intervals (0, ±1), (±1, ±∞) Example: base-2 posits (aka. Elias γ) +1 0 111 1 000±110 +1 0 1011 10 0000±1100 0 11 + 2 0 01 + 1 2 2 1 01 1 2 1 11 +1 0 10 011 10 0 00000±11000 0 110 + 2 0 01 0 + 1 2 2 1 01 0 1 2 1 110 + 3 2 0 10 1 0111 +4 0 01 1 + 3 4 0001 + 1 4 4 1001 3 2 1 01 1 3 4 1 10 1 1 4 1111 +1 0 10 0011 10 00 000000±110000 0 110 0 + 2 0 01 00 + 1 2 2 1 01 00 1 2 1 110 0 + 3 2 0 10 10 01110 +4 0 01 10 + 3 4 00010 + 1 4 4 10010 3 2 1 01 10 3 4 1 10 10 1 4 11110 + 5 4 0 10 01 7 8 1 10 01 00001 + 1 8 8 10001 01101 +3 00101 + 5 8 7 4 10101 3 8 11101 + 7 4 01011 01111 +8 0 01 11 +7 8 00011 + 3 8 3 10011 5 4 1 01 11 5 8 11011 1 8 11111 sign bit exponent sign exponent value fraction value
  • 9. LLNL-PRES-748385 9 § NUMREP is a modular C++ framework for defining number systems — Exponent coding scheme (unary, binary, Golomb-Rice, gamma, omega, …) — Fraction map (linear, reciprocal, exponential, rational, quadratic, logarithmic, …) • Conjugate fraction maps for sub- & superunitaries enable reciprocal closure — Handling of under- & overflow — Rounding rules (to nearest, toward {−∞, 0, +∞}) § NUMREP Unifies IEEE, POSITS, Elias, URR, LNS, …, under a single schema — Uses auxiliary arithmetic type to perform numerical computations (e.g. IEEE, MPFR) — Uses operator overloading to mimic intrinsic floating types — Supports 8, 16, 32, 64-bit types Beyond POSITS and Elias: NUMREP templated framework This talk focuses on NUMREP’s exponent coding schemes and fraction maps
  • 10. LLNL-PRES-748385 10 Many useful NUMREPS are given by simple interpolation and extrapolation rules (plus closure under negation) +1 0 10 011 10 0 00000±11000 0 110 0 01 0 1 01 0 1 110 0 10 1 0111 0 01 1 0001 1001 1 01 1 1 10 1 1111 +fmax fmax +fmin fmin extrapolation on (fmax, ∞) interpolation on (fmin, fmax) reciprocation on (0, fmin) r e c I p r negation o negation c a t I o n
  • 11. LLNL-PRES-748385 11 § Extrapolation rule: 1 ≤ fmax(p) < fmax(p + 1) < ∞ § Reciprocation rule: fmin(p) = 1 / fmax(p) Extrapolation gives next fmax between previous fmax and ±∞ when adding one more bit of precision type fmax(p + 1) sequence Unary 1 + fmax(p) 1, 2, 3, 4, 5, … Elias γ 2 × fmax(p) 1, 2, 4, 8, 16, … POSITS (base b) b × fmax(p) 1, b, b2, b3, b4, … Elias δ fmax(p)2 1, 2, 4, 16, 256, … Elias ω 2fmax(p) 1, 2, 4, 16, 65536, … fmax(1) 0 1 ±11 0 11 fm ax(2) 0111 fmax(3) 01111 fmax(4)
  • 12. LLNL-PRES-748385 12 § For POSITS, Elias {γ, δ, ω}: § Examples — g(2, 4) = 3 arithmetic mean — g(4, 16) = 2g(2, 4) = 23 = 8 geometric mean — g(16, 65536) = 2g(4, 16) = 22g(2,4) = 223 = 28 = 256 “hypergeometric” mean (for ω only) § For LNS: Interpolation rule generates new number g(a, b) between two finite positive numbers a < b 𝑔 𝑎, 𝑏 = & 𝑎 + 𝑏 2 if 𝑏 ≤ 2𝑎 2-(/0 1, /0 2) otherwise 𝑔 𝑎, 𝑏 = 𝑎𝑏
  • 13. LLNL-PRES-748385 13 § Templated on generator that implements two rules — Interpolation function, g(a, b) — Extrapolation recurrence, ai+1 = f(ai) = g(ai, ∞) § Rules + bisection implement encoding, decoding, rounding, fraction map — Recursively bisect (−∞, +∞) using applicable interpolation or extrapolation rule • g(−∞, +∞) = 0 • g(0, ±∞) = ±1 — Exploit symmetry of negation, reciprocation — Encode: bracket number and recursively test if less (0) or greater (1) than bisection point — Decode: narrow interval toward lower (0) or upper (1) bound; return final lower bound — Round: round number to lower bound if less than next bisection point, otherwise to upper — Map: given by interpolation rule § NOTE: Not very performant, but great for experimentation & validation — Our 14-page CoNGA paper implemented in 200 lines of C++ code (vs. ~1,000 lines for NUMREP) FLEX (FLexible EXponent): NUMREP on a shoestring
  • 14. LLNL-PRES-748385 14 FLEX’s POSIT implementation is 8 lines of code // posit generator with m-bit exponent and base 2^(2^m) template <int m = 0, typename real = flex_real> class Posit { public: real operator()(real x) const { return real(base) * x; } real operator()(real x, real y) const { return real(2) * x >= y ? (x + y) / real(2) : std::sqrt(x * y); } protected: static const int base = 1 << (1 << m); };
  • 15. LLNL-PRES-748385 15 Elias delta and the Kessel Run: What is the distance to Proxima Centauri? 0 01111111 01001100111101100101011 0 10 01001100111101100101010111000 IEEE ELIAS δ error = 1.2e−08 error = 4.8e−10 1.3 parsecs 2 bits 0 10000001 00001111011111101001000 0 11100 00001111011111101001000100 IEEE ELIAS δ error = 5.6e−08 error = 9.0e−11 4.2 lightyears 5 bits 0 10110110 00011101001010100010101 0 1111111010111 000111010010101001 IEEE ELIAS δ error = 1.5e−08 error = 1.2e−06 4.0 × 1016 meters 13 bits 0 11111111 00000000000000000000000 0 11111111100101010 10101000110001 IEEE ELIAS δ error = ∞ error = 1.4e−05 2.4 × 1051 Planck lengths 17 bits
  • 16. LLNL-PRES-748385 16 Variable-length exponents lead to tapered precision: 6-9 more bits of precision than IEEE for numbers near one 0 4 8 12 16 20 24 28 32 -256 -192 -128 -64 0 64 128 192 256 precision exponent float binary(8) gamma posit(1) posit(2) delta omega subnormals 29 bits
  • 17. LLNL-PRES-748385 17 Tapered precision (close-up): tension between precision and dynamic range 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 -48 -44 -40 -36 -32 -28 -24 -20 -16 -12 -8 -4 0 4 8 12 16 20 24 28 32 36 40 44 48 precision exponent float binary(8) gamma posit(1) posit(2) delta omega
  • 18. LLNL-PRES-748385 18 § Universality property (POSITS, Elias {γ, δ, ω}, but not IEEE) 1. Each integer is represented by a prefix code • No codeword is a prefix of any other codeword • Codeword length is self-describing—remaining available bits represent fraction 2. Integer representation has monotonic length • Larger integers have longer codewords 3. Codeword length of integer y is O(lg y) • Codeword length is proportional to the number of significant bits § Asymptotic optimality property (Elias {δ, ω}) — Relative cost of coding exponent (leading-one position) vanishes in the limit Elias delta and omega are asymptotically optimal universal representations
  • 19. LLNL-PRES-748385 19 Elias delta and omega are asymptotically optimal: exponent overhead approaches zero in the limit 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0 1 2 4 8 16 32 64 128 256 relativecodingcost lg(y) binary(8) gamma posit(1) posit(2) delta omega
  • 20. LLNL-PRES-748385 20 § So far we have assumed that fraction (significand), f ∈ [0, 1), is mapped linearly — y = (-1)s 2e (1 + f) — We may replace (1 + f) with transfer function φ(f) § Fraction map, φ : [0, 1) ⟶ [1, 2), maps fraction bits, f, to value, φ(f) — Subunitary map φ–: |y| < 1 — Superunitary map φ+: |y| ≥ 1 Fraction maps dictate how fraction bits map to values map φ(f) notes Linear 1 + f introduces implicit leading-one bit Linear reciprocal 2 / (2 – f) conjugate with linear map ⇒ reciprocal closure Exponential 2f LNS (logarithmic number system): y = 2e 2f = 2e+f Linear rational (1 + p f) / (1 – q f) p = 2 − 1, q = (1 – p) / 2
  • 21. LLNL-PRES-748385 21 1.0 1.5 2.0 0.0 0.5 1.0 valueφ(f) fraction f linear reciprocal exponential rational § Conjugacy: φ–(f) φ+(1 – f) = 2 0 ≤ f ≤ 1 — E.g. 2 / (2 – f) × (1 + (1 – f)) = 2 § Cn continuity: φ(n)(1) / φ(n)(0) = 2 Conjugate pair (φ–, φ+) guarantees reciprocal closure
  • 22. LLNL-PRES-748385 22 § Arithmetic closure and relative error (vs. analytic solution) 𝑧 = 𝑥 + 𝑦 — Addition and multiplication of 16-bit operands § Addition of long floating-point sequences (vs. analytic solution) s = ∑ 𝑎A A — Using naïve and best-of-breed addition algorithms § Matrix inversion (vs. analytic solution) 𝐴 = 𝐻DE — LU-based inversion of Hilbert and Vandermonde matrices § Symmetric matrix eigenvalues (vs. analytic solution) Λ = 𝑄𝐴𝑄H — QR-based dense eigenvalue solver § Euler2D PDE solver (vs. IEEE quad precision) 𝜕J 𝑢 + ∇ M 𝑓 𝑢 = 0 — Complex physics involving interacting shock waves We present extensive results that span a spectrum of simple to increasingly complex numerical computations
  • 23. LLNL-PRES-748385 23 Closure rate (C) and mean relative error (E) for 16-bit addition suggest POSIT(0) (aka. Elias gamma) performs best negative error no error positive error infinite/NaN
  • 24. LLNL-PRES-748385 24 Optimal multiplicative closure of LNS does not imply superior relative errors negative error no error positive error infinite/NaN
  • 25. LLNL-PRES-748385 25 Simple summation algorithms benefit from tapered precision: Sum of Gaussian PDF and its first derivative FLT_EPSILON 1E-09 1E-08 1E-07 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 sequential increasing insertion cascade Kahan absoluteerror float posit(2) delta
  • 26. LLNL-PRES-748385 26 New types excel in 64-bit Hilbert matrix inversion (Eigen3) 1E-18 1E-17 1E-16 1E-15 1E-14 1E-13 1E-12 1E-11 1E-10 1E-09 1E-08 1E-07 1E-06 1E-05 1E-04 2 3 4 5 6 7 8 9 10 matrixinverseRMSerror matrix rows IEEE gamma posit(1) posit(2) delta omega
  • 27. LLNL-PRES-748385 27 Elias outperforms IEEE by O(103) in 64-bit eigenvalue accuracy 1E-19 1E-18 1E-17 1E-16 1E-15 1E-14 1E-13 1E-12 1E-11 4 16 64 256 1024 eigenvalueRMSerror matrix rows IEEE gamma posit(1) posit(2) delta omega
  • 28. LLNL-PRES-748385 28 Euler2D shock propagation illustrates benefits of new types
  • 29. LLNL-PRES-748385 29 IEEE consistently is the least accurate floating-point representation in numerical calculations (64-bit types) 1E-19 1E-18 1E-17 1E-16 1E-15 1E-14 1E-13 1E-12 1E-11 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 densityRMSerror time IEEE binary(11) LNS(11) gamma posit(1) posit(2) posit(3) delta(0) delta(1) omega(3)
  • 30. LLNL-PRES-748385 30 Nonlinear fraction maps sometimes improve accuracy substantially (32-bit types) 1E-09 1E-08 1E-07 1E-06 1E-05 1E-04 1E-03 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 densityRMSerror time IEEE posit(1) posreclin(1) poslinrec(1) posexp(1)
  • 31. LLNL-PRES-748385 31 § Many numerical computations involve continuous fields — E.g. partial differential equations over ℝ2, ℝ3 — Nearby scalars on Cartesian grids have similar values • In statistical speak, fields exhibit autocorrelation • Such redundancy is not exploited by any of the representations discussed so far Beyond IEEE and posit representations of ℝ: Compressed representations of ℝn 0 1 2 3 0 64 128 192 256 320 384 functionvalue grid point 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0 8 16 24 32 40 48 56 64 storage precision entropy uncompressed
  • 32. LLNL-PRES-748385 32 (64-bit) floating-point data does not compress well losslessly 1.11 1.07 1.05 1.04 1.03 1.02 1.01 [Lindstrom'06] [Alted'09] [Burtscher'07] [Burtscher'16] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 fpzip BLOSC FPC gzip szip SPDP bzip2 compressionratio(uncomp./comp.size)
  • 33. LLNL-PRES-748385 33 § Large improvements in compression are possible by allowing even small errors — Simulation often computes on meaningless bits • Round-off, truncation, iteration, model errors abound • Last few floating-point bits are effectively random noise § Still, lossy compression often makes scientists nervous — Even though lossy data reduction is ubiquitous • Decimation in space and/or time (e.g. store every 100 time steps) • Averaging (hourly vs. daily vs. monthly averages) • Truncation to single precision (e.g. for history files) — State-of-the-art compressors support error tolerances Lossy compression enables greater reduction, but is often met with skepticism by scientists 0.27 bits/value 240x compression
  • 34. LLNL-PRES-748385 34 Can lossy-compressed in-memory storage of numerical simulation state be tolerated? Simulation Outer Loop Update elementsUpdate nodes Update elements Compute time constraints Decompress block Element kernel(s) Compress block Our Approach Decompress all state Update nodes Update elements Compute time constraints Compress all state
  • 35. LLNL-PRES-748385 35 High-order Eulerian hydrodynamics • QoI: Rayleigh-Taylor mixing layer thickness • 10,000 time steps • At 4x compression, relative error < 0.2% Laser-plasma multi-physics • QoI: backscattered laser energy • At 4x compression, relative error < 0.1% Lagrangian shock hydrodynamics • QoI: radial shock position • 25 state variables compressed over 2,100 time steps • At 4x compression, relative error < 0.06% Using lossy FPZIP to store simulation state compressed, we have shown that 4x lossy compression can be tolerated 20 bits/value uncompressed16 bits/value Lossy compression of state is viable, but streaming compression increases data movement—need inline compression
  • 36. LLNL-PRES-748385 36 § Inspired by ideas from h/w texture compression — d-dimensional array divided into blocks of 4d values — Each block is independently (de)compressed • e.g. to a user-specified number of bits or quality — Fixed-size blocks Þ random read/write access — (De)compression is done inline, on demand — Write-back cache of uncompressed blocks limits data loss § Compressed arrays via C++ operator overloading — Can be dropped into existing code by changing type declarations — double a[n] Û std::vector<double> a(n) Û zfp::array<double> a(n, precision) ZFP is an inline compressor for floating-point arrays
  • 37. LLNL-PRES-748385 37 ZFP’S C++ compressed arrays can replace STL vectors and C arrays with minimal code changes // example using STL vectors std::vector<double> u(nx * ny, 0.0); u[x0 + nx*y0] = 1; for (double t = 0; t < tfinal; t += dt) { std::vector<double> du(nx * ny, 0.0); for (int y = 1; y < ny - 1; y++) for (int x = 1; x < nx - 1; x++) { double uxx = (u[(x-1)+nx*y] - 2*u[x+nx*y] + u[(x+1)+nx*y]) / dxx; double uyy = (u[x+nx*(y-1)] - 2*u[x+nx*y] + u[x+nx*(y+1)]) / dyy; du[x + nx*y] = k * dt * (uxx + uyy); } for (int i = 0; i < u.size(); i++) u[i] += du[i]; } // example using ZFP arrays zfp::array2<double> u(nx, ny, bits_per_value); u(x0, y0) = 1; for (double t = 0; t < tfinal; t += dt) { zfp::array2<double> du(nx, ny, bits_per_value); for (int y = 1; y < ny - 1; y++) for (int x = 1; x < nx - 1; x++) { double uxx = (u(x-1, y) - 2*u(x, y) + u(x+1, y)) / dxx; double uyy = (u(x, y-1) - 2*u(x, y) + u(x, y+1)) / dyy; du(x, y) = k * dt * (uxx + uyy); } for (int i = 0; i < u.size(); i++) u[i] += du[i]; } required changes optional changes for improved readability
  • 38. LLNL-PRES-748385 38 The ZFP compressor is comprised of three distinct components 3 Raw floating- point array Block floating- point transform Orthogonal block transform Embedded coding Compressed bit stream § Align values in a 4d block to a common largest exponent § Transmit exponent verbatim § Lifted, separable transform using integer adds and shifts § Similar to but faster and more effective than JPEG DCT § Encode one bit plane at a time from MSB using group testing § Each bit increases quality—can truncate stream anywhere
  • 39. LLNL-PRES-748385 39 6-bit ZFP gives one more digit of accuracy than 32-bit IEEE in 2nd, 3rd derivative computations (Laplacian and highlight lines) IEEE half (16 bits) POSIT(1) (16 bits) IEEE float (32 bits) ZFP (6 bits) IEEE double (64 bits) IEEE double (64 bits)
  • 40. LLNL-PRES-748385 40 ZFP improves accuracy in finite difference computations using less precision than IEEE
  • 41. LLNL-PRES-748385 41 ZFP variable-rate arrays adapt storage spatially and allocate bits to regions where they are most needed 16 bits/value8 bits/value 32 bits/value 64 bits/value
  • 42. LLNL-PRES-748385 42 ZFP adaptive arrays improve accuracy in PDE solution over IEEE by 6 orders of magnitude using less storage
  • 43. LLNL-PRES-748385 43 ZFP adaptive arrays improve accuracy in PDE solution over IEEE by 6 orders of magnitude using less storage 1E-15 1E-14 1E-13 1E-12 1E-11 1E-10 1E-09 1E-08 1E-07 1E-06 1E-05 1E-04 1E-03 1E-02 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 RMSerror time IEEE (32-bit) posit1 (32-bit) zfp (32-bit) arc (28-bit)
  • 44. LLNL-PRES-748385 44 § Emerging alternatives to IEEE floats show promising improvements in accuracy — Tapered precision allows for better balance between range and precision — New types have many desirable properties • No IEEE warts: subnormals, excess of NaNs, signed zeros, overflow, exponent size dichotomy, … • Features: nesting, monotonicity, reciprocal closure, universality, asymptotic optimality, … § NUMREP framework allows for experimentation with new number types in apps — Modular design supports plug-and-play of independent concepts — Nonlinear maps are expensive but worth investigating — POSITS are often among the best performing types; IEEE among the worst § NextGen scalar representations still leave considerable room for improvement — ZFP fixed-rate compressed arrays consistently outperform IEEE and POSITS — ZFP variable-rate arrays optimize bit distribution and and make very bit count Conclusions: Novel scalar representations are driving research into new vector/matrix/tensor representations
  • 45. LLNL-PRES-748385 45 ZFP is available as open source on GitHub: https://github.com/LLNL/zfp