Upcoming SlideShare
×

# Class03

393 views
200 views

Published on

Published in: Education
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
393
On SlideShare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
5
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Class03

1. 1. 15-213 “The course that gives CMU its Zip!” Floating Point Sept 6, 2006 Topics  IEEE Floating Point Standard  Rounding  Floating Point Operations  Mathematical propertiesclass03.ppt 15-213, F’06
2. 2. Floating Point Puzzles  For each of the following C expressions, either: Argue that it is true for all argument values Explain why not true • x == (int)(float) x • x == (int)(double) x int x = …; • f == (float)(double) f float f = …; • d == (float) d double d = …; • f == -(-f); • 2/3 == 2/3.0 Assume neither d nor f is NaN • d < 0.0 ⇒ ((d*2) < 0.0) • d > f ⇒ -f > -d • d * d >= 0.0 • (d+f)-d == f–2– 15-213, F’06
3. 3. IEEE Floating PointIEEE Standard 754  Established in 1985 as uniform standard for floating point arithmetic  Before that, many idiosyncratic formats  Supported by all major CPUsDriven by Numerical Concerns  Nice standards for rounding, overflow, underflow  Hard to make go fast  Numerical analysts predominated over hardware types in defining standard–3– 15-213, F’06
4. 4. Fractional Binary Numbers 2i 2i–1 4 ••• 2 1 bi bi–1 ••• b2 b1 b0 . b–1 b–2 b–3 ••• b–j 1/2 1/4 ••• 1/8 2–jRepresentation  Bits to right of “binary point” represent fractional powers of 2 i  Represents rational number: ∑ bk ⋅2 k k =− j–4– 15-213, F’06
5. 5. Frac. Binary Number ExamplesValue Representation 5-3/4 101.112 2-7/8 10.1112 63/64 0.1111112Observations  Divide by 2 by shifting right  Multiply by 2 by shifting left  Numbers of form 0.111111…2 just below 1.0 1/2 + 1/4 + 1/8 + … + 1/2 i + … → 1.0 Use notation 1.0 – ε–5– 15-213, F’06
6. 6. Representable NumbersLimitation  Can only exactly represent numbers of the form x/2k  Other numbers have repeating bit representationsValue Representation 1/3 0.0101010101[01]…2 1/5 0.001100110011[0011]…2 1/10 0.0001100110011[0011]…2–6– 15-213, F’06
7. 7. Floating Point RepresentationNumerical Form  –1s M 2E Sign bit s determines whether number is negative or positive Significand M normally a fractional value in range [1.0,2.0). Exponent E weights value by power of twoEncoding s exp frac  MSB is sign bit  exp field encodes E  frac field encodes M–7– 15-213, F’06
8. 8. Floating Point PrecisionsEncoding s exp frac  MSB is sign bit  exp field encodes E  frac field encodes MSizes  Single precision: 8 exp bits, 23 frac bits 32 bits total  Double precision: 11 exp bits, 52 frac bits 64 bits total  Extended precision: 15 exp bits, 63 frac bits Only found in Intel-compatible machines Stored in 80 bits » 1 bit wasted–8– 15-213, F’06
9. 9. “Normalized” Numeric ValuesCondition  exp ≠ 000…0 and exp ≠ 111…1Exponent coded as biased value E = Exp – Bias Exp : unsigned value denoted by exp Bias : Bias value » Single precision: 127 ( Exp: 1…254, E: -126…127) » Double precision: 1023 ( Exp: 1…2046, E: -1022…1023) » in general: Bias = 2 e-1 - 1, where e is number of exponent bitsSignificand coded with implied leading 1 M = 1.xxx…x2  xxx…x: bits of frac Minimum when 000…0 (M = 1.0) Maximum when 111…1 (M = 2.0 – ε) Get extra leading bit for “free”–9– 15-213, F’06
10. 10. Normalized Encoding Example Value Float F = 15213.0;  1521310 = 111011011011012 = 1.11011011011012 X 213 Significand M = 1.11011011011012 frac = 110110110110100000000002 Exponent E = 13 Bias = 127 Exp = 140 = 100011002 Floating Point Representation: Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000 140: 100 0110 0– 10 – 15213: 1110 1101 1011 01 15-213, F’06
11. 11. Denormalized Values Condition  exp = 000…0 Value  Exponent value E = –Bias + 1  Significand value M = 0.xxx…x2  xxx…x: bits of frac Cases  exp = 000…0, frac = 000…0  Represents value 0  Note that have distinct values +0 and –0  exp = 000…0, frac ≠ 000…0  Numbers very close to 0.0  Lose precision as get smaller– 11 –  “Gradual underflow” 15-213, F’06
12. 12. Special Values Condition  exp = 111…1 Cases  exp = 111…1, frac = 000…0  Represents value ∞ (infinity)  Operation that overflows  Both positive and negative  E.g., 1.0/0.0 = −1.0/−0.0 = + ∞, 1.0/−0.0 = − ∞  exp = 111…1, frac ≠ 000…0  Not-a-Number (NaN)  Represents case when no numeric value can be determined  E.g., sqrt(–1), ∞ − ∞, ∞ ∗ 0– 12 – 15-213, F’06
13. 13. Summary of Floating Point Real Number Encodings −∞ -Normalized -Denorm +Denorm +Normalized +∞ NaN NaN −0 +0– 13 – 15-213, F’06
14. 14. Tiny Floating Point Example 8-bit Floating Point Representation  the sign bit is in the most significant bit.  the next four bits are the exponent, with a bias of 7.  the last three bits are the frac  Same General Form as IEEE Format  normalized, denormalized  representation of 0, NaN, infinity 7 6 3 2 0 s exp frac– 14 – 15-213, F’06
15. 15. Values Related to the Exponent Exp exp E 2E 0 0000 -6 1/64 (denorms) 1 0001 -6 1/64 2 0010 -5 1/32 3 0011 -4 1/16 4 0100 -3 1/8 5 0101 -2 1/4 6 0110 -1 1/2 7 0111 0 1 8 1000 +1 2 9 1001 +2 4 10 1010 +3 8 11 1011 +4 16 12 1100 +5 32 13 1101 +6 64 14 1110 +7 128 15 1111 n/a (inf, NaN)– 15 – 15-213, F’06
16. 16. Dynamic Range s exp frac E Value 0 0000 000 -6 0 0 0000 001 -6 1/8*1/64 = 1/512 closest to zeroDenormalized 0 0000 010 -6 2/8*1/64 = 2/512 numbers … 0 0000 110 -6 6/8*1/64 = 6/512 0 0000 111 -6 7/8*1/64 = 7/512 largest denorm 0 0001 000 -6 8/8*1/64 = 8/512 smallest norm 0 0001 001 -6 9/8*1/64 = 9/512 … 0 0110 110 -1 14/8*1/2 = 14/16 0 0110 111 -1 15/8*1/2 = 15/16 closest to 1 below Normalized 0 0111 000 0 8/8*1 = 1 numbers closest to 1 above 0 0111 001 0 9/8*1 = 9/8 0 0111 010 0 10/8*1 = 10/8 … 0 1110 110 7 14/8*128 = 224 0 1110 111 7 15/8*128 = 240 largest norm 0 1111 000 n/a inf – 16 – 15-213, F’06
17. 17. Distribution of Values 6-bit IEEE-like format  e = 3 exponent bits  f = 2 fraction bits  Bias is 3 Notice how the distribution gets denser toward zero. -15 -10 -5 0 5 10 15 Denormalized Normalized Infinity– 17 – 15-213, F’06
18. 18. Distribution of Values (close-up view) 6-bit IEEE-like format  e = 3 exponent bits  f = 2 fraction bits  Bias is 3 -1 -0.5 0 0.5 1 Denormalized Normalized Infinity– 18 – 15-213, F’06
19. 19. Interesting Numbers Description exp frac Numeric Value Zero 00…00 00…00 0.0 Smallest Pos. Denorm. 00…00 00…01 2– {23,52} X 2– {126,1022}  Single ≈ 1.4 X 10–45  Double ≈ 4.9 X 10–324 Largest Denormalized 00…00 11…11 (1.0 – ε) X 2– {126,1022}  Single ≈ 1.18 X 10–38  Double ≈ 2.2 X 10–308 Smallest Pos. Normalized 00…01 00…00 1.0 X 2– {126,1022}  Just larger than largest denormalized One 01…11 00…00 1.0 Largest Normalized 11…10 11…11 (2.0 – ε) X 2{127,1023}  Single ≈ 3.4 X 1038  Double ≈ 1.8 X 10308– 19 – 15-213, F’06
20. 20. Special Properties of Encoding FP Zero Same as Integer Zero  All bits = 0 Can (Almost) Use Unsigned Integer Comparison  Must first compare sign bits  Must consider -0 = 0  NaNs problematic  Will be greater than any other values  What should comparison yield?  Otherwise OK  Denorm vs. normalized  Normalized vs. infinity– 20 – 15-213, F’06
21. 21. Floating Point Operations Conceptual View  First compute exact result  Make it fit into desired precision Possibly overflow if exponent too large Possibly round to fit into frac Rounding Modes (illustrate with \$ rounding) \$1.40 \$1.60 \$1.50 \$2.50 –\$1.50  Zero \$1 \$1 \$1 \$2 –\$1  Round down (-∞) \$1 \$1 \$1 \$2 –\$2  Round up (+∞) \$2 \$2 \$2 \$3 –\$1  Nearest Even (default) \$1 \$2 \$2 \$2–\$2Note:1. Round down: rounded result is close to but no greater than true result.2. Round up: rounded result is close to but no less than true result.– 21 – 15-213, F’06
22. 22. Closer Look at Round-To- Even Default Rounding M ode  Hard to get any other kind without dropping into assembly  All others are statistically biased Sum of set of positive numbers will consistently be over- or under- estimated Applying to Other Decimal Places / Bit Positions  When exactly halfway between two possible values Round so that least significant digit is even  E.g., round to nearest hundredth 1.2349999 1.23 (Less than half way) 1.2350001 1.24 (Greater than half way) 1.2350000 1.24 (Half way—round up) 1.2450000 1.24 (Half way—round down)– 22 – 15-213, F’06
23. 23. Rounding Binary Numbers Binary Fractional Numbers  “Even” when least significant bit is 0  Half way when bits to right of rounding position = 100… 2 Examples  Round to nearest 1/4 (2 bits right of binary point) Value Binary Rounded Action Rounded Value 2 3/32 10.000112 10.002 (<1/2—down) 2 2 3/16 10.001102 10.012 (>1/2—up) 2 1/4 2 7/8 10.111002 11.002 (1/2—up) 3 2 5/8 10.101002 10.102 (1/2—down) 2 1/2– 23 – 15-213, F’06
24. 24. FP Multiplication Operands * (–1)s1 M1 2E1 (–1)s2 M2 2E2 Exact Result (–1)s M 2E  Sign s: s1 ^ s2  Significand M: M1 * M2  Exponent E: E1 + E2 Fixing  If M ≥ 2, shift M right, increment E  If E out of range, overflow  Round M to fit frac precision Implementation  Biggest chore is multiplying significands– 24 – 15-213, F’06
25. 25. FP Addition Operands (–1)s1 M1 2E1 E1–E2 (–1)s2 M2 2E2 (–1)s1 M1  Assume E1 > E2 + (–1)s2 M2 Exact Result (–1)s M 2E (–1)s M  Sign s, significand M:  Result of signed align & add  Exponent E: E1 Fixing  If M ≥ 2, shift M right, increment E  if M < 1, shift M left k positions, decrement E by k  Overflow if E out of range– 25 –  Round M to fit frac precision 15-213, F’06
26. 26. Mathematical Properties of FPAdd Compare to those of Abelian Group  Closed under addition? YES But may generate infinity or NaN  Commutative? YES  Associative? NO Overflow and inexactness of rounding  0 is additive identity? YES  Every element has additive inverse ALMOST Except for infinities & NaNs Monotonicity  a ≥ b ⇒ a+c ≥ b+c? ALMOST Except for infinities & NaNs– 26 – 15-213, F’06
27. 27. Math. Properties of FP Mult Compare to Commutative Ring  Closed under multiplication? YES But may generate infinity or NaN  Multiplication Commutative? YES  Multiplication is Associative? NO Possibility of overflow, inexactness of rounding  1 is multiplicative identity? YES  Multiplication distributes over addition? NO Possibility of overflow, inexactness of rounding Monotonicity  a ≥ b & c ≥ 0 ⇒ a *c ≥ b *c? ALMOST Except for infinities & NaNs– 27 – 15-213, F’06
28. 28. Creating Floating Point Number Steps 7 6 3 2 0  Normalize to have leading 1s exp frac  Round to fit within fraction  Postnormalize to deal with effects of rounding Case Study  Convert 8-bit unsigned numbers to tiny floating point format  Example Numbers 128 10000000 15 00001101 33 00010001 35 00010011 138 10001010– 28 – 63 00111111 15-213, F’06
29. 29. Normalize 7 6 3 2 0 s exp frac Requirement  Set binary point so that numbers of form 1.xxxxx  Adjust all to have leading one  Decrement exponent as shift left Value Binary Fraction Exponent 128 10000000 1.0000000 7 15 00001101 1.1010000 3 17 00010001 1.0001000 5 19 00010011 1.0011000 5 138 10001010 1.0001010 7 63 00111111 1.1111100 5– 29 – 15-213, F’06
30. 30. Rounding 1.BBGRXXX Guard bit: LSB of result Sticky bit: OR of remaining bitsRound bit: 1 st bit removed Round up conditions  Round = 1, Sticky = 1  > 0.5  Guard = 1, Round = 1, Sticky = 0  Round to even Value Fraction GRS Incr? Rounded 128 1.0000000 000 N 1.000 15 1.1010000 100 N 1.101 17 1.0001000 010 N 1.000 19 1.0011000 110 Y 1.010 138 1.0001010 111 Y 1.001 63 1.1111100 111 Y 10.000 – 30 – 15-213, F’06
31. 31. Postnormalize Issue  Rounding may have caused overflow  Handle by shifting right once & incrementing exponent Value Rounded Exp Adjusted Result 128 1.000 7 128 15 1.101 3 15 17 1.000 4 16 19 1.010 4 20 138 1.001 7 134 63 10.000 5 1.000/6 64– 31 – 15-213, F’06
32. 32. Floating Point in C C Guarantees Two Levels float single precision double double precision Conversions  Casting between int, float, and double changes numeric values  Double or float to int  Truncates fractional part  Like rounding toward zero  Not defined when out of range or NaN » Generally sets to TMin  int to double  Exact conversion, as long as int has ≤ 53 bit word size  int to float  Will round according to rounding mode– 32 – 15-213, F’06
33. 33. Curious Excel Behavior Number Subtract 16 Subtract .3 Subtract .01 Default Format 16.31 0.31 0.01 -1.2681E-15 Currency Format \$16.31 \$0.31 \$0.01 (\$0.00)  Spreadsheets use floating point for all computations  Some imprecision for decimal arithmetic  Can yield nonintuitive results to an accountant!– 33 – 15-213, F’06
34. 34. Summary IEEE Floating Point Has Clear M athematical Properties  Represents numbers of form M X 2 E  Can reason about operations independent of implementation  As if computed with perfect precision and then rounded  Not the same as real arithmetic  Violates associativity/distributivity  Makes life difficult for compilers & serious numerical applications programmers– 34 – 15-213, F’06