Floating Point Arithmetic
February 15, 2001
Topics
• IEEE Floating Point Standard
• Rounding
• Floating Point Operations
• Mathematical properties
• IA32 floating point
class10.ppt
15-213
“The course that gives CMU its Zip!”
CS 213 S’01
Floating Point Puzzles
• For each of the following C expressions, either:
– Argue that it is true for all argument values
– Explain why not true
• x == (int)(float) x
• x == (int)(double) x
• f == (float)(double) f
• d == (float) d
• f == -(-f);
• 2/3 == 2/3.0
• d < 0.0 ⇒ ((d*2) < 0.0)
• d > f ⇒ -f < -d
• d * d >= 0.0
• (d+f)-d == f
int x = …;
float f = …;
double d = …;
Assume neither
d nor f is NaN
IEEE Floating Point
IEEE Standard 754
• Established in 1985 as uniform standard for floating point arithmetic
– Before that, many idiosyncratic formats
• Supported by all major CPUs
Driven by Numerical Concerns
• Nice standards for rounding, overflow, underflow
• Hard to make go fast
– Numerical analysts predominated over hardware types in defining
standard
Fractional Binary Numbers
Representation
• Bits to right of “binary point” represent fractional powers of 2
• Represents rational number: $\sum_{k=-j}^{i} b_k \cdot 2^k$
Bits:    b_i  b_{i–1}  …  b_2  b_1  b_0  .  b_{–1}  b_{–2}  b_{–3}  …  b_{–j}
Weights: 2^i  2^{i–1}  …  4    2    1    .  1/2     1/4     1/8     …  2^{–j}
Fractional Binary Number Examples
Value      Representation
5 3/4      101.11₂
2 7/8      10.111₂
63/64      0.111111₂
Observation
• Divide by 2 by shifting right
• Numbers of form 0.111111…₂ just below 1.0
– Use notation 1.0 – ε
Limitation
• Can only exactly represent numbers of the form x/2^k
• Other numbers have repeating bit representations
Value      Representation
1/3        0.0101010101[01]…₂
1/5        0.001100110011[0011]…₂
1/10       0.0001100110011[0011]…₂
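The repeating patterns in the table can be generated mechanically by repeated doubling. The following is a minimal sketch, not part of the original slides; the helper names `frac_bits` and `print_fraction` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper (not from the slides): write the first n binary
 * digits to the right of the binary point of num/den into out.
 * Assumes 0 <= num < den, den small enough that doubling num cannot
 * overflow, and that out has room for n + 1 chars. */
void frac_bits(unsigned num, unsigned den, int n, char *out)
{
    for (int i = 0; i < n; i++) {
        num *= 2;                 /* move one binary place to the right */
        if (num >= den) {         /* next digit is 1 */
            out[i] = '1';
            num -= den;
        } else {                  /* next digit is 0 */
            out[i] = '0';
        }
    }
    out[n] = '\0';
}

/* Print one row in the style of the table above (at most 64 digits). */
void print_fraction(unsigned num, unsigned den, int n)
{
    char buf[65];
    if (n > 64)
        n = 64;
    frac_bits(num, den, n, buf);
    printf("%u/%u = 0.%s...\n", num, den, buf);
}
```

For 1/10 with 13 digits this yields 0001100110011, the same prefix as the table row. A fraction of the form x/2^k eventually produces only zeros, which is why it can be represented exactly.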
Floating Point Representation
Numerical Form
• (–1)^s × M × 2^E
– Sign bit s determines whether number is negative or positive
– Significand M normally a fractional value in range [1.0,2.0).
– Exponent E weights value by power of two
Encoding
• Fields, most significant first: s | exp | frac
• MSB is sign bit
• exp field encodes E
• frac field encodes M
Sizes
• Single precision: 8 exp bits, 23 frac bits
– 32 bits total
• Double precision: 11 exp bits, 52 frac bits
– 64 bits total
“Normalized” Numeric Values
Condition
• exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as biased value
E = Exp – Bias
– Exp : unsigned value denoted by exp
– Bias : Bias value
» Single precision: 127 (Exp: 1…254, E: -126…127)
» Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
» in general: Bias = 2^(m–1) – 1, where m is the number of exponent bits
Significand coded with implied leading 1
M = 1.xxx…x₂
– xxx…x: bits of frac
– Minimum when 000…0 (M = 1.0)
– Maximum when 111…1 (M = 2.0 – ε)
– Get extra leading bit for “free”
Normalized Encoding Example
Value
float F = 15213.0;
• 15213₁₀ = 11101101101101₂ = 1.1101101101101₂ × 2^13
Significand
M = 1.1101101101101₂
frac = 11011011011010000000000₂
Exponent
E = 13
Bias = 127
Exp = 140 = 10001100₂
Floating Point Representation (Class 02):
Hex: 4 6 6 D B 4 0 0
Binary: 0100 0110 0110 1101 1011 0100 0000 0000
140: 100 0110 0
15213: 1110 1101 1011 01
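The encoding above can be checked on a real machine by copying the bits of a float into an integer and masking out the three fields. This is a minimal sketch, assuming `float` is IEEE single precision; the names `float_bits`, `split_float`, and `struct float_fields` are hypothetical and do not come from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a single-precision value; field names follow the slides. */
struct float_fields {
    unsigned s;      /* sign bit */
    unsigned exp;    /* 8-bit biased exponent field */
    uint32_t frac;   /* 23-bit fraction field */
};

/* Copy the bit pattern of f into an integer.
 * memcpy is used so the code does not depend on pointer aliasing tricks. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split a float into s, exp, and frac (assumes a 32-bit IEEE float). */
struct float_fields split_float(float f)
{
    uint32_t u = float_bits(f);
    struct float_fields r;
    r.s    = u >> 31;             /* top bit */
    r.exp  = (u >> 23) & 0xFFu;   /* next 8 bits */
    r.frac = u & 0x7FFFFFu;       /* low 23 bits */
    return r;
}
```

Under that assumption, `float_bits(15213.0f)` is 0x466DB400 and `split_float(15213.0f).exp` is 140, so E = 140 – 127 = 13 as on the slide.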
Denormalized Values
Condition
• exp = 000…0
Value
• Exponent value E = –Bias + 1
• Significand value M = 0.xxx…x₂
– xxx…x: bits of frac
Cases
• exp = 000…0, frac = 000…0
– Represents value 0
– Note that have distinct values +0 and –0
• exp = 000…0, frac ≠ 000…0
– Numbers very close to 0.0
– Lose precision as get smaller
– “Gradual underflow”
Special Values
Condition
• exp = 111…1
Cases
• exp = 111…1, frac = 000…0
– Represents value ∞ (infinity)
– Operation that overflows
– Both positive and negative
– E.g., 1.0/0.0 = –1.0/–0.0 = +∞, 1.0/–0.0 = –∞
• exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents case when no numeric value can be determined
– E.g., sqrt(–1), ∞ – ∞
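The conditions on exp and frac for normalized, denormalized, and special values can be collected into one classifier. The following is a sketch for single precision only, assuming a 32-bit IEEE `float`; the enum and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the categories described on the slides. */
enum fp_kind {
    KIND_ZERO, KIND_DENORM, KIND_NORMALIZED, KIND_INFINITY, KIND_NAN
};

/* Classify a single-precision bit pattern by its exp and frac fields. */
enum fp_kind classify_bits(uint32_t u)
{
    uint32_t exp  = (u >> 23) & 0xFFu;
    uint32_t frac = u & 0x7FFFFFu;

    if (exp == 0)        /* exp = 000...0: zero or denormalized */
        return frac == 0 ? KIND_ZERO : KIND_DENORM;
    if (exp == 0xFFu)    /* exp = 111...1: infinity or NaN */
        return frac == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMALIZED;
}

/* Same test applied to a float value. */
enum fp_kind classify_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return classify_bits(u);
}
```

The sign bit is ignored on purpose, so +0 and –0 both classify as zero and both infinities classify as infinity.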
Summary of Floating Point
Real Number Encodings
Number line, left to right: NaN, –∞, –Normalized, –Denorm, –0, +0, +Denorm, +Normalized, +∞, NaN
Tiny floating point example
8-bit Floating Point Representation
• the sign bit is in the most significant bit.
• the next four bits are the exponent, with a bias of 7.
• the last three bits are the frac
• Same General Form as IEEE Format
• normalized, denormalized
• representation of 0, NaN, infinity
Bit layout: s = bit 7, exp = bits 6–3, frac = bits 2–0
Values related to the exponent
Exp   exp    E     2^E
0     0000   -6    1/64 (denorms)
1     0001   -6    1/64
2     0010   -5    1/32
3     0011   -4    1/16
4     0100   -3    1/8
5     0101   -2    1/4
6     0110   -1    1/2
7     0111    0    1
8     1000   +1    2
9     1001   +2    4
10    1010   +3    8
11    1011   +4    16
12    1100   +5    32
13    1101   +6    64
14    1110   +7    128
15    1111   n/a   (inf, NaN)
Dynamic Range
s exp  frac   E     Value
Denormalized numbers
0 0000 000    -6    0
0 0000 001    -6    1/8*1/64 = 1/512     (closest to zero)
0 0000 010    -6    2/8*1/64 = 2/512
…
0 0000 110    -6    6/8*1/64 = 6/512
0 0000 111    -6    7/8*1/64 = 7/512     (largest denorm)
Normalized numbers
0 0001 000    -6    8/8*1/64 = 8/512     (smallest norm)
0 0001 001    -6    9/8*1/64 = 9/512
…
0 0110 110    -1    14/8*1/2 = 14/16
0 0110 111    -1    15/8*1/2 = 15/16     (closest to 1 below)
0 0111 000     0    8/8*1 = 1
0 0111 001     0    9/8*1 = 9/8          (closest to 1 above)
0 0111 010     0    10/8*1 = 10/8
…
0 1110 110     7    14/8*128 = 224
0 1110 111     7    15/8*128 = 240       (largest norm)
0 1111 000    n/a   inf
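The rows of this table can be reproduced with a small decoder for the 8-bit format. This is a sketch under the format stated earlier (1 sign bit, 4 exp bits with bias 7, 3 frac bits); the names `tiny_decode` and `scale_pow2` are hypothetical, and infinity and NaN are reported through a flag rather than returned as values.

```c
#include <stdint.h>

/* Scale x by 2^e with exact repeated doubling or halving. */
static double scale_pow2(double x, int e)
{
    for (; e > 0; e--)
        x *= 2.0;
    for (; e < 0; e++)
        x /= 2.0;
    return x;
}

/* Decode one value of the 8-bit format.
 * *special is set to 0 for a numeric value, 1 for infinity, 2 for NaN;
 * for the special cases the return value is 0.0 and should be ignored. */
double tiny_decode(uint8_t bits, int *special)
{
    int s    = bits >> 7;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double sign = s ? -1.0 : 1.0;

    *special = 0;
    if (exp == 0xF) {             /* exp = 1111: infinity or NaN */
        *special = frac ? 2 : 1;
        return 0.0;
    }
    if (exp == 0)                 /* denormalized: E = 1 - Bias = -6, M = 0.frac */
        return sign * scale_pow2(frac / 8.0, -6);
    /* normalized: E = exp - Bias, M = 1.frac */
    return sign * scale_pow2(1.0 + frac / 8.0, exp - 7);
}
```

For example, the pattern 0 1110 111 (0x77) decodes to 240 and 0 0000 001 (0x01) decodes to 1/512, matching the largest normalized value and the value closest to zero in the table.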
Interesting Numbers
Description                exp     frac    Numeric Value
Zero                       00…00   00…00   0.0
Smallest Pos. Denorm.      00…00   00…01   2^–{23,52} × 2^–{126,1022}
• Single ≈ 1.4 × 10^–45
• Double ≈ 4.9 × 10^–324
Largest Denormalized       00…00   11…11   (1.0 – ε) × 2^–{126,1022}
• Single ≈ 1.18 × 10^–38
• Double ≈ 2.2 × 10^–308
Smallest Pos. Normalized   00…01   00…00   1.0 × 2^–{126,1022}
• Just larger than largest denormalized
One                        01…11   00…00   1.0
Largest Normalized         11…10   11…11   (2.0 – ε) × 2^{127,1023}
• Single ≈ 3.4 × 10^38
• Double ≈ 1.8 × 10^308
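The single-precision rows can be compared against the limits in `<float.h>`. This is a small sketch, assuming `float` is IEEE single precision; the macro names and `float_from_bits` are hypothetical, while `FLT_MIN` and `FLT_MAX` are the standard C constants for the smallest positive normalized and largest finite float.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Single-precision bit patterns for the rows of the table above. */
#define BITS_SMALLEST_DENORM 0x00000001u  /* exp 00...00, frac 00...01 */
#define BITS_LARGEST_DENORM  0x007FFFFFu  /* exp 00...00, frac 11...11 */
#define BITS_SMALLEST_NORM   0x00800000u  /* exp 00...01, frac 00...00 */
#define BITS_ONE             0x3F800000u  /* exp 01...11, frac 00...00 */
#define BITS_LARGEST_NORM    0x7F7FFFFFu  /* exp 11...10, frac 11...11 */

/* Build a float from a 32-bit pattern (assumes a 32-bit IEEE float). */
float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With these definitions, `float_from_bits(BITS_SMALLEST_NORM)` should compare equal to `FLT_MIN` and `float_from_bits(BITS_LARGEST_NORM)` to `FLT_MAX`.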
Special Properties of Encoding
FP Zero Same as Integer Zero
• All bits = 0
Can (Almost) Use Unsigned Integer Comparison
• Must first compare sign bits
• Must consider -0 = 0
• NaNs problematic
– Will be greater than any other values
– What should comparison yield?
• Otherwise OK
– Denorm vs. normalized
– Normalized vs. infinity
Floating Point Operations
Conceptual View
• First compute exact result
• Make it fit into desired precision
– Possibly overflow if exponent too large
– Possibly round to fit into frac
Rounding Modes (illustrate with $ rounding)
Mode                       $1.40   $1.60   $1.50   $2.50   –$1.50
• Zero                     $1.00   $1.00   $1.00   $2.00   –$1.00
• Round down (–∞)          $1.00   $1.00   $1.00   $2.00   –$2.00
• Round up (+∞)            $2.00   $2.00   $2.00   $3.00   –$1.00
• Nearest Even (default)   $1.00   $2.00   $2.00   $2.00   –$2.00
Note:
1. Round down: rounded result is close to but no greater than true result.
2. Round up: rounded result is close to but no less than true result.
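The four modes can be imitated with integer arithmetic on cents, which keeps the dollar example exact. This is an illustrative sketch with hypothetical function names; it models the rounding rules only and does not change the hardware rounding mode.

```c
/* Round an amount in cents to whole dollars in each of the four modes.
 * C integer division truncates toward zero, which gives the first mode. */
long round_toward_zero(long cents)
{
    return cents / 100;
}

long round_down(long cents)          /* toward minus infinity */
{
    long q = cents / 100;
    return (cents % 100 < 0) ? q - 1 : q;
}

long round_up(long cents)            /* toward plus infinity */
{
    long q = cents / 100;
    return (cents % 100 > 0) ? q + 1 : q;
}

long round_nearest_even(long cents)
{
    long lo  = round_down(cents);
    long rem = cents - lo * 100;     /* distance above lo, in 0..99 */
    if (rem > 50)
        return lo + 1;
    if (rem < 50)
        return lo;
    return (lo % 2 != 0) ? lo + 1 : lo;   /* tie: pick the even dollar */
}
```

Applied to 140, 160, 150, 250, and –150 cents, these functions reproduce the four rows of the table.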
A Closer Look at Round-To-Even
Default Rounding Mode
• Hard to get any other kind without dropping into assembly
• All others are statistically biased
– Sum of set of positive numbers will consistently be over- or underestimated
Applying to Other Decimal Places
• When exactly halfway between two possible values
– Round so that least significant digit is even
• E.g., round to nearest hundredth
1.2349999 → 1.23 (Less than half way)
1.2350001 → 1.24 (Greater than half way)
1.2350000 → 1.24 (Half way: round up)
1.2450000 → 1.24 (Half way: round down)
Rounding Binary Numbers
Binary Fractional Numbers
• “Even” when least significant bit is 0
• Half way when bits to right of rounding position = 100…₂
Examples
• Round to nearest 1/4 (2 bits right of binary point)
Value     Binary      Rounded   Action          Rounded Value
2 3/32    10.00011₂   10.00₂    (< 1/2: down)   2
2 3/16    10.00110₂   10.01₂    (> 1/2: up)     2 1/4
2 7/8     10.11100₂   11.00₂    (1/2: up)       3
2 5/8     10.10100₂   10.10₂    (1/2: down)     2 1/2
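The same rule can be written for bit strings held in an unsigned integer. In this sketch (the name `round_to_even` is hypothetical), the value is stored as a fixed-point integer and the low `k` bits are the ones being rounded away.

```c
/* Round v to the nearest multiple of 2^k and return the result in units
 * of 2^k, breaking ties toward an even result. Requires 1 <= k < 32. */
unsigned round_to_even(unsigned v, int k)
{
    unsigned q    = v >> k;                /* bits that are kept */
    unsigned rem  = v & ((1u << k) - 1u);  /* bits that are dropped */
    unsigned half = 1u << (k - 1);         /* the pattern 100...0 */

    if (rem > half || (rem == half && (q & 1u)))
        q++;                               /* above half, or a tie with odd q */
    return q;
}
```

To match the table, store each value in units of 1/32 and drop k = 3 bits so that the result is in units of 1/4: 10.11100₂ is 92, and `round_to_even(92, 3)` gives 12, that is 12/4 = 3.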
FP Multiplication
Operands
(–1)^s1 M1 2^E1
(–1)^s2 M2 2^E2
Exact Result
(–1)^s M 2^E
• Sign s: s1 ^ s2
• Significand M: M1 * M2
• Exponent E: E1 + E2
Fixing
• If M ≥ 2, shift M right, increment E
• If E out of range, overflow
• Round M to fit frac precision
Implementation
• Biggest chore is multiplying significands
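The steps above can be sketched on a toy representation that stores s, M, and E separately. This is not how hardware does it: the struct and function names are hypothetical, M is held in a double, and the overflow check and the final rounding of M to the frac width are left out.

```c
/* Toy value (-1)^s * M * 2^E with M in [1.0, 2.0). */
struct fp {
    int    s;   /* sign bit */
    double M;   /* significand */
    int    E;   /* exponent */
};

/* Multiply two toy values following the slide's steps. */
struct fp fp_mul(struct fp a, struct fp b)
{
    struct fp r;
    r.s = a.s ^ b.s;     /* sign: s1 ^ s2 */
    r.M = a.M * b.M;     /* significand: M1 * M2, lies in [1.0, 4.0) */
    r.E = a.E + b.E;     /* exponent: E1 + E2 */
    if (r.M >= 2.0) {    /* fixing: shift M right, increment E */
        r.M /= 2.0;
        r.E += 1;
    }
    return r;
}
```

For example, (1.5 × 2^1) × (–1.5 × 2^2) gives M = 2.25 before fixing and –1.125 × 2^4 = –18 after it.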
FP Addition
Operands
(–1)^s1 M1 2^E1
(–1)^s2 M2 2^E2
• Assume E1 > E2
Exact Result
(–1)^s M 2^E
• Sign s, significand M:
– Result of signed align & add
• Exponent E: E1
Fixing
• If M ≥ 2, shift M right, increment E
• if M < 1, shift M left k positions, decrement E by k
• Overflow if E out of range
• Round M to fit frac precision
Figure: (–1)^s2 M2 is shifted right by E1–E2 positions to line up with (–1)^s1 M1, and the two are added to give (–1)^s M.
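The align, add, and fix steps can be sketched on a toy representation that stores s, M, and E separately. As an assumption of this sketch, M is held in a double, so the overflow check and the final rounding to the frac width are omitted; the struct and function names are hypothetical.

```c
/* Toy value (-1)^s * M * 2^E with M in [1.0, 2.0), or M == 0 for zero. */
struct fp {
    int    s;   /* sign bit */
    double M;   /* significand */
    int    E;   /* exponent */
};

/* Add two toy values following the slide's steps. */
struct fp fp_add(struct fp a, struct fp b)
{
    if (a.E < b.E) {                 /* make a the operand with larger E */
        struct fp t = a;
        a = b;
        b = t;
    }

    double m1 = a.s ? -a.M : a.M;    /* apply signs */
    double m2 = b.s ? -b.M : b.M;
    for (int i = 0; i < a.E - b.E; i++)
        m2 /= 2.0;                   /* align: shift right by E1 - E2 */

    double m = m1 + m2;              /* signed add */
    struct fp r;
    r.s = m < 0.0;
    r.M = r.s ? -m : m;
    r.E = a.E;

    if (r.M == 0.0) {                /* exact cancellation */
        r.E = 0;
        return r;
    }
    if (r.M >= 2.0) {                /* shift M right, increment E */
        r.M /= 2.0;
        r.E += 1;
    }
    while (r.M < 1.0) {              /* shift M left, decrement E */
        r.M *= 2.0;
        r.E -= 1;
    }
    return r;
}
```

Adding 1.0 × 2^3 and –1.75 × 2^2 (that is, 8 – 7) leaves M = 0.125 after the add, which the left-shift loop normalizes to 1.0 × 2^0.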
Mathematical Properties of FP Add
Compare to those of Abelian Group
• Closed under addition? YES
– But may generate infinity or NaN
• Commutative? YES
• Associative? NO
– Overflow and inexactness of rounding
• 0 is additive identity? YES
• Every element has additive inverse ALMOST
– Except for infinities & NaNs
Monotonicity
• a ≥ b ⇒ a+c ≥ b+c? ALMOST
– Except for infinities & NaNs
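A small experiment shows the failure of associativity. This is a sketch with hypothetical function names; the values 3.14 and 1e20 are chosen for illustration, and the `volatile` temporaries are there to force each intermediate sum to be rounded to double.

```c
/* Compute (a + b) + c, rounding the inner sum to double first. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Compute a + (b + c), rounding the inner sum to double first. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 3.14, b = 1e20, c = –1e20, `sum_left` returns 0.0 because 3.14 is lost when it is added to 1e20, while `sum_right` returns 3.14 because the two large values cancel first.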
Algebraic Properties of FP Mult
Compare to Commutative Ring
• Closed under multiplication? YES
– But may generate infinity or NaN
• Multiplication Commutative? YES
• Multiplication is Associative? NO
– Possibility of overflow, inexactness of rounding
• 1 is multiplicative identity? YES
• Multiplication distributes over addition? NO
– Possibility of overflow, inexactness of rounding
Monotonicity
• a ≥ b & c ≥ 0 ⇒ a *c ≥ b *c? ALMOST
– Except for infinities & NaNs
Floating Point in C
C Guarantees Two Levels
float single precision
double double precision
Conversions
• Casting between int, float, and double changes numeric values
• Double or float to int
– Truncates fractional part
– Like rounding toward zero
– Not defined when out of range
» Generally saturates to TMin or TMax
• int to double
– Exact conversion, as long as int has ≤ 53 bit word size
• int to float
– Will round according to rounding mode
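The conversion rules can be observed with a few in-range values. This sketch assumes a 32-bit `int`, IEEE `float` and `double`, and the default round-to-nearest-even mode; the wrapper function names are hypothetical.

```c
/* double -> int: the fractional part is discarded (rounds toward zero).
 * Only call this with values that fit in an int. */
int to_int(double d)
{
    return (int)d;
}

/* int -> float: rounds when x needs more than 24 significant bits. */
float int_to_float(int x)
{
    return (float)x;
}

/* int -> double: exact for a 32-bit int. */
double int_to_double(int x)
{
    return (double)x;
}
```

For instance, `to_int(-1.99)` is –1, and `int_to_float(16777217)` (2^24 + 1) cannot be held exactly, so under the stated assumptions it rounds to 16777216.0f, while `int_to_double(16777217)` is exact.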
Answers to Floating Point Puzzles
• x == (int)(float) x No: 24 bit significand
• x == (int)(double) x Yes: 53 bit significand
• f == (float)(double) f Yes: increases precision
• d == (float) d No: loses precision
• f == -(-f); Yes: Just change sign bit
• 2/3 == 2/3.0 No: 2/3 == 0
• d < 0.0 ⇒ ((d*2) < 0.0) Yes!
• d > f ⇒ -f < -d Yes!
• d * d >= 0.0 Yes!
• (d+f)-d == f No: Not associative
int x = …;
float f = …;
double d = …;
Assume neither
d nor f is NaN
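Several of the "No" answers can be confirmed with concrete counterexamples. The predicates below are a sketch with hypothetical names, assuming a 32-bit `int` and IEEE `float` and `double`; the specific counterexample values are chosen for illustration and are not from the slides.

```c
/* x == (int)(float) x; only call with |x| well below 2^31. */
int p_int_float(int x)
{
    return x == (int)(float)x;
}

/* x == (int)(double) x */
int p_int_double(int x)
{
    return x == (int)(double)x;
}

/* f == (float)(double) f */
int p_float_double(float f)
{
    return f == (float)(double)f;
}

/* d == (float) d; only call with d inside the float range. */
int p_double_float(double d)
{
    return d == (float)d;
}

/* (d+f)-d == f; volatile forces the sum to be rounded to double. */
int p_add_sub(double d, float f)
{
    volatile double t = d + f;
    return (t - d) == f;
}
```

Counterexamples under these assumptions: `p_int_float(16777217)` is false because 2^24 + 1 needs 25 significant bits, `p_double_float(0.1)` is false because the float nearest 0.1 differs from the double nearest 0.1, and `p_add_sub(1e20, 1.0f)` is false because 1.0 is lost in the sum.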
IA32 Floating Point
History
• 8086: first processor to implement IEEE FP
– separate 8087 FPU (floating point unit)
• 486: merged FPU and Integer Unit onto one chip
Summary
• Hardware to add, multiply, and divide
• Floating point data registers
• Various control & status registers
Floating Point Formats
• single precision (C float): 32 bits
• double precision (C double): 64 bits
• extended precision (C long double): 80 bits
Figure: block diagram showing the instruction decoder and sequencer, the FPU, the Integer Unit, and the data bus.
FPU Data Register Stack
FPU register format (extended precision)
• s: bit 79, exp: bits 78–64, frac: bits 63–0
FPU register stack
• stack grows down
– wraps around from R0 -> R7
• FPU registers are typically referenced relative to top of stack
– st(0) is top of stack (Top)
– followed by st(1), st(2),…
• push: decrement Top, load
• pop: store, increment Top
Figure: absolute view (registers R7 down to R0) beside the stack view (st(0) through st(7)), with Top marking st(0); the stack grows down.
FPU instructions
Large number of floating point instructions and formats
• ~50 basic instruction types
• load, store, add, multiply
• sin, cos, tan, arctan, and log!
Sampling of instructions:
Instruction   Effect                       Description
fldz          push 0.0                     Load zero
flds S        push S                       Load single precision real
fmuls S       st(0) <- st(0)*S             Multiply
faddp         st(1) <- st(0)+st(1); pop    Add and pop
Floating Point Code Example
Compute Inner Product of Two Vectors
• Single precision arithmetic
• Scientific computing and
signal processing workhorse
float ipf (float x[],
           float y[],
           int n)
{
    int i;
    float result = 0.0;
    for (i = 0; i < n; i++) {
        result += x[i] * y[i];
    }
    return result;
}
    pushl %ebp                  # setup
    movl %esp,%ebp
    pushl %ebx
    movl 8(%ebp),%ebx           # %ebx=&x
    movl 12(%ebp),%ecx          # %ecx=&y
    movl 16(%ebp),%edx          # %edx=n
    fldz                        # push +0.0
    xorl %eax,%eax              # i=0
    cmpl %edx,%eax              # if i>=n done
    jge .L3
.L5:
    flds (%ebx,%eax,4)          # push x[i]
    fmuls (%ecx,%eax,4)         # st(0)*=y[i]
    faddp                       # st(1)+=st(0); pop
    incl %eax                   # i++
    cmpl %edx,%eax              # if i<n repeat
    jl .L5
.L3:
    movl -4(%ebp),%ebx          # finish
    leave
    ret                         # st(0) = result
Inner product stack trace
Step  Instruction            st(0)                        st(1)
1.    fldz                   0
2.    flds (%ebx,%eax,4)     x[0]                         0
3.    fmuls (%ecx,%eax,4)    x[0]*y[0]                    0
4.    faddp %st,%st(1)       0 + x[0]*y[0]
5.    flds (%ebx,%eax,4)     x[1]                         0 + x[0]*y[0]
6.    fmuls (%ecx,%eax,4)    x[1]*y[1]                    0 + x[0]*y[0]
7.    faddp %st,%st(1)       0 + x[0]*y[0] + x[1]*y[1]
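The trace can be replayed with a toy model of the register stack written in C. This is only a sketch: the struct and helper names are hypothetical, `double` stands in for the 80-bit extended format, and the model assumes that a push decrements Top (so the stack grows down and wraps from R0 to R7) and a pop increments it.

```c
/* Toy model of the 8-entry FPU register stack. */
struct fpu {
    double r[8];   /* R0..R7 */
    int    top;    /* index of the register currently named st(0) */
};

/* Address of st(i), counted relative to Top with wraparound. */
static double *st(struct fpu *u, int i)
{
    return &u->r[(u->top + i) & 7];
}

/* fldz / flds: push a value. */
static void push(struct fpu *u, double v)
{
    u->top = (u->top - 1) & 7;   /* grow down, wrapping R0 -> R7 */
    *st(u, 0) = v;
}

/* fmuls S: st(0) <- st(0) * S */
static void fmuls(struct fpu *u, float s)
{
    *st(u, 0) *= s;
}

/* faddp: st(1) <- st(0) + st(1); pop */
static void faddp(struct fpu *u)
{
    *st(u, 1) += *st(u, 0);
    u->top = (u->top + 1) & 7;
}

/* Inner product computed with the same instruction sequence as the loop. */
float ipf_stack(const float x[], const float y[], int n)
{
    struct fpu u = { {0}, 0 };
    push(&u, 0.0);               /* fldz */
    for (int i = 0; i < n; i++) {
        push(&u, x[i]);          /* flds  x[i] */
        fmuls(&u, y[i]);         /* fmuls y[i] */
        faddp(&u);               /* faddp */
    }
    return (float)*st(&u, 0);    /* result is left in st(0) */
}
```

For x = {1, 2} and y = {3, 4} the model ends with 11 in st(0), following the same seven steps as the table.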
Summary
IEEE Floating Point Has Clear Mathematical Properties
• Represents numbers of form M × 2^E
• Can reason about operations independent of implementation
– As if computed with perfect precision and then rounded
• Not the same as real arithmetic
– Violates associativity/distributivity
– Makes life difficult for compilers & serious numerical applications
programmers
IA32 Floating Point is a Mess
• Ill-conceived, pseudo-stack architecture
• Covered in notes

More Related Content

What's hot

Floating point representation
Floating point representation Floating point representation
Floating point representation SVisnuDharsini
 
Module 3 lp-simplex
Module 3 lp-simplexModule 3 lp-simplex
Module 3 lp-simplexraajeeradha
 
Simplex Method
Simplex MethodSimplex Method
Simplex MethodSachin MK
 
Simplex method
Simplex methodSimplex method
Simplex methodtatteya
 
Digital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsDigital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsImran Waris
 
Lesson 30: Duality In Linear Programming
Lesson 30: Duality In Linear ProgrammingLesson 30: Duality In Linear Programming
Lesson 30: Duality In Linear Programmingguest463822
 
LP Graphical Solution
LP Graphical SolutionLP Graphical Solution
LP Graphical Solutionunemployedmba
 
Operations Research Problem
Operations Research  ProblemOperations Research  Problem
Operations Research ProblemTaslima Mujawar
 
Introducing to number system
Introducing to number systemIntroducing to number system
Introducing to number systemtcc_joemarie
 
Data representation
Data representationData representation
Data representationChew Hoong
 
Logic Design - Chapter 1: Number Systems and Codes
Logic Design - Chapter 1: Number Systems and CodesLogic Design - Chapter 1: Number Systems and Codes
Logic Design - Chapter 1: Number Systems and CodesGouda Mando
 

What's hot (20)

Floating point representation
Floating point representation Floating point representation
Floating point representation
 
Lpp simplex method
Lpp simplex methodLpp simplex method
Lpp simplex method
 
Module 3 lp-simplex
Module 3 lp-simplexModule 3 lp-simplex
Module 3 lp-simplex
 
Simplex Method
Simplex MethodSimplex Method
Simplex Method
 
Simplex method
Simplex methodSimplex method
Simplex method
 
IEEE Floating Point
IEEE Floating PointIEEE Floating Point
IEEE Floating Point
 
Simplex two phase
Simplex two phaseSimplex two phase
Simplex two phase
 
Digital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsDigital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systems
 
Lesson 30: Duality In Linear Programming
Lesson 30: Duality In Linear ProgrammingLesson 30: Duality In Linear Programming
Lesson 30: Duality In Linear Programming
 
LP Graphical Solution
LP Graphical SolutionLP Graphical Solution
LP Graphical Solution
 
Z scores
Z scoresZ scores
Z scores
 
Two phase method lpp
Two phase method lppTwo phase method lpp
Two phase method lpp
 
Floating point Numbers
Floating point NumbersFloating point Numbers
Floating point Numbers
 
Operations Research Problem
Operations Research  ProblemOperations Research  Problem
Operations Research Problem
 
Number system
Number systemNumber system
Number system
 
Introducing to number system
Introducing to number systemIntroducing to number system
Introducing to number system
 
Number system converted
Number system convertedNumber system converted
Number system converted
 
Data representation
Data representationData representation
Data representation
 
30 Cryptography
30 Cryptography30 Cryptography
30 Cryptography
 
Logic Design - Chapter 1: Number Systems and Codes
Logic Design - Chapter 1: Number Systems and CodesLogic Design - Chapter 1: Number Systems and Codes
Logic Design - Chapter 1: Number Systems and Codes
 

Similar to IEEE Floating Point Standard and Operations

Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Rai University
 
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...Rai University
 
Floating_point_representation.pdf
Floating_point_representation.pdfFloating_point_representation.pdf
Floating_point_representation.pdfRameshK531901
 
Number system and codes
Number system and codesNumber system and codes
Number system and codesAbhiraj Bohra
 
Aviraj --floating point representation and arithmetic.pptx
Aviraj --floating point representation and arithmetic.pptxAviraj --floating point representation and arithmetic.pptx
Aviraj --floating point representation and arithmetic.pptxIronmanhmarvel
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.pptssuser52a19e
 
Review on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalReview on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalUtkirjonUbaydullaev1
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.pptRAJKUMARP63
 
Unit 1 PDF.pptx
Unit 1 PDF.pptxUnit 1 PDF.pptx
Unit 1 PDF.pptxChandraV13
 
digital-electronics.pptx
digital-electronics.pptxdigital-electronics.pptx
digital-electronics.pptxsulekhasaxena2
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15Shani729
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmeticargiaggi
 

Similar to IEEE Floating Point Standard and Operations (20)

Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
Bca 2nd sem-u-1.8 digital logic circuits, digital component floting and fixed...
 
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...
B.sc cs-ii-u-1.8 digital logic circuits, digital component floting and fixed ...
 
Floating Point Numbers
Floating Point NumbersFloating Point Numbers
Floating Point Numbers
 
Floating_point_representation.pdf
Floating_point_representation.pdfFloating_point_representation.pdf
Floating_point_representation.pdf
 
Number system and codes
Number system and codesNumber system and codes
Number system and codes
 
Aviraj --floating point representation and arithmetic.pptx
Aviraj --floating point representation and arithmetic.pptxAviraj --floating point representation and arithmetic.pptx
Aviraj --floating point representation and arithmetic.pptx
 
number system.ppt
number system.pptnumber system.ppt
number system.ppt
 
Data Representation
Data RepresentationData Representation
Data Representation
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
Review on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalReview on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and Hexadecimal
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
binary-numbers.ppt
binary-numbers.pptbinary-numbers.ppt
binary-numbers.ppt
 
Unit 1 PDF.pptx
Unit 1 PDF.pptxUnit 1 PDF.pptx
Unit 1 PDF.pptx
 
digital-electronics.pptx
digital-electronics.pptxdigital-electronics.pptx
digital-electronics.pptx
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmetic
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmetic
 
09 arithmetic 2
09 arithmetic 209 arithmetic 2
09 arithmetic 2
 
L 2.10
L 2.10L 2.10
L 2.10
 

Recently uploaded

Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 

Recently uploaded (20)

Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 

IEEE Floating Point Standard and Operations

  • 1. Floating Point Arithmetic February 15, 2001 Topics • IEEE Floating Point Standard • Rounding • Floating Point Operations • Mathematical properties • IA32 floating point class10.ppt 15-213 “The course that gives CMU its Zip!”
  • 2. CS 213 S’01 – 2 – class10.ppt Floating Point Puzzles • For each of the following C expressions, either: – Argue that it is true for all argument values – Explain why not true • x == (int)(float) x • x == (int)(double) x • f == (float)(double) f • d == (float) d • f == -(-f); • 2/3 == 2/3.0 • d < 0.0  ((d*2) < 0.0) • d > f  -f < -d • d * d >= 0.0 • (d+f)-d == f int x = …; float f = …; double d = …; Assume neither d nor f is NaN
  • 3. CS 213 S’01 – 3 – class10.ppt IEEE Floating Point IEEE Standard 754 • Established in 1985 as uniform standard for floating point arithmetic – Before that, many idiosyncratic formats • Supported by all major CPUs Driven by Numerical Concerns • Nice standards for rounding, overflow, underflow • Hard to make go fast – Numerical analysts predominated over hardware types in defining standard
  • 4. CS 213 S’01 – 4 – class10.ppt Fractional Binary Numbers Representation • Bits to right of “binary point” represent fractional powers of 2 • Represents rational number: bi bi–1 b2 b1 b0 b–1 b–2 b–3 b–j • • • • • • . 1 2 4 2i–1 2i • • • • • • 1/2 1/4 1/8 2–j bk 2k k j i 
  • 5. CS 213 S’01 – 5 – class10.ppt Fractional Binary Number Examples Value Representation 5-3/4 101.112 2-7/8 10.1112 63/64 0.1111112 Observation • Divide by 2 by shifting right • Numbers of form 0.111111…2 just below 1.0 – Use notation 1.0 –  Limitation • Can only exactly represent numbers of the form x/2k • Other numbers have repeating bit representations Value Representation 1/3 0.0101010101[01]…2 1/5 0.001100110011[0011]…2 1/10 0.0001100110011[0011]…2
  • 6. Floating Point Representation Numerical Form • (–1)^s × M × 2^E – Sign bit s determines whether number is negative or positive – Significand M normally a fractional value in range [1.0,2.0). – Exponent E weights value by power of two Encoding • MSB is sign bit • exp field encodes E • frac field encodes M • Field layout, most significant bit first: s | exp | frac Sizes • Single precision: 8 exp bits, 23 frac bits – 32 bits total • Double precision: 11 exp bits, 52 frac bits – 64 bits total
  • 7. “Normalized” Numeric Values Condition • exp ≠ 000…0 and exp ≠ 111…1 Exponent coded as biased value E = Exp – Bias – Exp : unsigned value denoted by exp – Bias : Bias value » Single precision: 127 (Exp: 1…254, E: -126…127) » Double precision: 1023 (Exp: 1…2046, E: -1022…1023) » In general: Bias = 2^(m–1) – 1, where m is the number of exponent bits Significand coded with implied leading 1: M = 1.xxx…x₂ – xxx…x: bits of frac – Minimum when 000…0 (M = 1.0) – Maximum when 111…1 (M = 2.0 – ε) – Get extra leading bit for “free”
  • 8. Normalized Encoding Example Value float F = 15213.0; • 15213₁₀ = 11101101101101₂ = 1.1101101101101₂ × 2^13 Significand M = 1.1101101101101₂ frac = 11011011011010000000000₂ Exponent E = 13 Bias = 127 Exp = 140 = 10001100₂ Floating Point Representation (Class 02):
        Hex:     4    6    6    D    B    4    0    0
        Binary:  0100 0110 0110 1101 1011 0100 0000 0000
        140:      100 0110 0
        15213:              110 1101 1011 01   (leading 1 of the significand is implied, not stored)
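The encoding on this slide can be checked mechanically. The following is a minimal sketch, assuming a platform where `float` is the 32-bit IEEE single-precision format; the names `float_bits`, `split_float`, and `float_fields` are hypothetical helpers introduced here, not part of the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE single-precision value: 1 sign bit, 8 exp bits, 23 frac bits.
   (Hypothetical helper type, for illustration only.) */
typedef struct {
    uint32_t sign;  /* 0 or 1 */
    uint32_t exp;   /* biased exponent field (Exp on the slides) */
    uint32_t frac;  /* 23 fraction bits, without the implied leading 1 */
} float_fields;

/* Copy the raw bit pattern out of a float; memcpy avoids aliasing problems.
   Assumes sizeof(float) == 4. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Split the bit pattern into the s, exp, and frac fields. */
float_fields split_float(float f) {
    uint32_t bits = float_bits(f);
    float_fields out;
    out.sign = bits >> 31;
    out.exp  = (bits >> 23) & 0xFFu;
    out.frac = bits & 0x7FFFFFu;
    return out;
}
```

Calling `float_bits(15213.0f)` should give the slide's hex pattern 466DB400, and `split_float(15213.0f)` should report an exp field of 140, so that E = 140 – 127 = 13.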
  • 9. Denormalized Values Condition • exp = 000…0 Value • Exponent value E = –Bias + 1 • Significand value M = 0.xxx…x₂ – xxx…x: bits of frac Cases • exp = 000…0, frac = 000…0 – Represents value 0 – Note that there are distinct values +0 and –0 • exp = 000…0, frac ≠ 000…0 – Numbers very close to 0.0 – Lose precision as they get smaller – “Gradual underflow”
  • 10. Special Values Condition • exp = 111…1 Cases • exp = 111…1, frac = 000…0 – Represents value ∞ (infinity) – Operation that overflows – Both positive and negative – E.g., 1.0/0.0 = –1.0/–0.0 = +∞, 1.0/–0.0 = –∞ • exp = 111…1, frac ≠ 000…0 – Not-a-Number (NaN) – Represents case when no numeric value can be determined – E.g., sqrt(–1), ∞ – ∞
  • 11. Summary of Floating Point Real Number Encodings • Number line, left to right: NaN | –∞ | –Normalized | –Denorm | –0 | +0 | +Denorm | +Normalized | +∞ | NaN
  • 12. Tiny floating point example 8-bit Floating Point Representation • the sign bit is in the most significant bit. • the next four bits are the exponent, with a bias of 7. • the last three bits are the frac • Layout: bit 7 = s, bits 6–3 = exp, bits 2–0 = frac Same General Form as IEEE Format • normalized, denormalized • representation of 0, NaN, infinity
  • 13. Values related to the exponent
        Exp   exp    E     2^E
        0     0000   -6    1/64   (denorms)
        1     0001   -6    1/64
        2     0010   -5    1/32
        3     0011   -4    1/16
        4     0100   -3    1/8
        5     0101   -2    1/4
        6     0110   -1    1/2
        7     0111    0    1
        8     1000   +1    2
        9     1001   +2    4
        10    1010   +3    8
        11    1011   +4    16
        12    1100   +5    32
        13    1101   +6    64
        14    1110   +7    128
        15    1111   n/a   (inf, NaN)
  • 14. Dynamic Range
        s  exp   frac   E    Value
      Denormalized numbers
        0  0000  000    -6   0
        0  0000  001    -6   1/8*1/64 = 1/512     (closest to zero)
        0  0000  010    -6   2/8*1/64 = 2/512
        …
        0  0000  110    -6   6/8*1/64 = 6/512
        0  0000  111    -6   7/8*1/64 = 7/512     (largest denorm)
      Normalized numbers
        0  0001  000    -6   8/8*1/64 = 8/512     (smallest norm)
        0  0001  001    -6   9/8*1/64 = 9/512
        …
        0  0110  110    -1   14/8*1/2 = 14/16
        0  0110  111    -1   15/8*1/2 = 15/16     (closest to 1 below)
        0  0111  000     0   8/8*1 = 1
        0  0111  001     0   9/8*1 = 9/8          (closest to 1 above)
        0  0111  010     0   10/8*1 = 10/8
        …
        0  1110  110     7   14/8*128 = 224
        0  1110  111     7   15/8*128 = 240       (largest norm)
        0  1111  000    n/a  inf
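The table above follows directly from the decoding rules for denormalized, normalized, and special values. As a rough sketch of those rules for the 8-bit format (1 sign bit, 4 exp bits with bias 7, 3 frac bits), the hypothetical function `tiny_to_double` below converts a bit pattern to a `double`; the name and the choice of `double` as the result type are assumptions made for illustration.

```c
#include <math.h>    /* INFINITY and NAN macros only; no libm functions are called */
#include <stdint.h>

/* Decode the slides' 8-bit format: s (bit 7), exp (bits 6-3, bias 7), frac (bits 2-0). */
double tiny_to_double(uint8_t bits) {
    int sign = (bits >> 7) & 1;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double value;

    if (exp == 0xF) {
        /* Special values: frac == 0 is infinity, anything else is NaN. */
        value = (frac == 0) ? INFINITY : NAN;
    } else {
        /* Denormalized: E = 1 - Bias, M = 0.frac.  Normalized: E = Exp - Bias, M = 1.frac. */
        int e = (exp == 0) ? 1 - 7 : exp - 7;
        value = (exp == 0) ? frac / 8.0 : 1.0 + frac / 8.0;
        /* Scale by 2^e; multiplying or dividing a double by 2 is exact in this range. */
        for (; e > 0; e--) value *= 2.0;
        for (; e < 0; e++) value /= 2.0;
    }
    return sign ? -value : value;
}
```

For instance, `tiny_to_double(0x07)` corresponds to the row "0 0000 111" (largest denorm, 7/512) and `tiny_to_double(0x77)` to "0 1110 111" (largest norm, 240).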
  • 15. Interesting Numbers
        Description                exp     frac    Numeric Value
        Zero                       00…00   00…00   0.0
        Smallest Pos. Denorm.      00…00   00…01   2^–{23,52} × 2^–{126,1022}
            • Single ≈ 1.4 × 10^–45 • Double ≈ 4.9 × 10^–324
        Largest Denormalized       00…00   11…11   (1.0 – ε) × 2^–{126,1022}
            • Single ≈ 1.18 × 10^–38 • Double ≈ 2.2 × 10^–308
        Smallest Pos. Normalized   00…01   00…00   1.0 × 2^–{126,1022}
            • Just larger than largest denormalized
        One                        01…11   00…00   1.0
        Largest Normalized         11…10   11…11   (2.0 – ε) × 2^{127,1023}
            • Single ≈ 3.4 × 10^38 • Double ≈ 1.8 × 10^308
  • 16. Special Properties of Encoding FP Zero Same as Integer Zero • All bits = 0 Can (Almost) Use Unsigned Integer Comparison • Must first compare sign bits • Must consider -0 = 0 • NaNs problematic – Will be greater than any other values – What should comparison yield? • Otherwise OK – Denorm vs. normalized – Normalized vs. infinity
  • 17. Floating Point Operations Conceptual View • First compute exact result • Make it fit into desired precision – Possibly overflow if exponent too large – Possibly round to fit into frac Rounding Modes (illustrate with $ rounding)
                                   $1.40   $1.60   $1.50   $2.50   –$1.50
        • Zero                     $1.00   $1.00   $1.00   $2.00   –$1.00
        • Round down (–∞)          $1.00   $1.00   $1.00   $2.00   –$2.00
        • Round up (+∞)            $2.00   $2.00   $2.00   $3.00   –$1.00
        • Nearest Even (default)   $1.00   $2.00   $2.00   $2.00   –$2.00
      Note: 1. Round down: rounded result is close to but no greater than true result. 2. Round up: rounded result is close to but no less than true result.
  • 18. A Closer Look at Round-To-Even Default Rounding Mode • Hard to get any other kind without dropping into assembly • All others are statistically biased – Sum of set of positive numbers will consistently be over- or under-estimated Applying to Other Decimal Places • When exactly halfway between two possible values – Round so that least significant digit is even • E.g., round to nearest hundredth
        Value       Rounded
        1.2349999   1.23   (Less than half way)
        1.2350001   1.24   (Greater than half way)
        1.2350000   1.24   (Half way: round up)
        1.2450000   1.24   (Half way: round down)
  • 19. Rounding Binary Numbers Binary Fractional Numbers • “Even” when least significant bit is 0 • Half way when bits to right of rounding position = 100…₂ Examples • Round to nearest 1/4 (2 bits right of binary point)
        Value    Binary      Rounded   Action          Rounded Value
        2 3/32   10.00011₂   10.00₂    (<1/2: down)    2
        2 3/16   10.00110₂   10.01₂    (>1/2: up)      2 1/4
        2 7/8    10.11100₂   11.00₂    (1/2: up)       3
        2 5/8    10.10100₂   10.10₂    (1/2: down)     2 1/2
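The four examples above can be reproduced with integer arithmetic on the bit patterns. This is a sketch only: the hypothetical function `round_half_even` treats its input as an unsigned fixed-point number and discards the low `drop` bits, rounding to nearest with ties going to the even result. It is not how any particular FPU implements rounding.

```c
#include <stdint.h>

/* Round a non-negative fixed-point value to nearest, ties to even,
   discarding the low `drop` bits. Requires 0 < drop < 32. */
uint32_t round_half_even(uint32_t value, unsigned drop) {
    uint32_t half      = 1u << (drop - 1);            /* bit pattern 100...0 */
    uint32_t remainder = value & ((1u << drop) - 1);  /* the bits being discarded */
    uint32_t kept      = value >> drop;               /* the bits that survive */

    /* Round up when above half way, or exactly half way with an odd kept value. */
    if (remainder > half || (remainder == half && (kept & 1u)))
        kept += 1;
    return kept;
}
```

With five fraction bits in and two out (`drop` = 3), 10.00011₂ is 0x43 and rounds to 8 (10.00₂), 10.00110₂ is 0x46 and rounds to 9 (10.01₂), 10.11100₂ is 0x5C and rounds to 12 (11.00₂), and 10.10100₂ is 0x54 and rounds to 10 (10.10₂).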
  • 20. FP Multiplication Operands (–1)^s1 M1 2^E1 and (–1)^s2 M2 2^E2 Exact Result (–1)^s M 2^E • Sign s: s1 ^ s2 • Significand M: M1 * M2 • Exponent E: E1 + E2 Fixing • If M ≥ 2, shift M right, increment E • If E out of range, overflow • Round M to fit frac precision Implementation • Biggest chore is multiplying significands
  • 21. FP Addition Operands (–1)^s1 M1 2^E1 and (–1)^s2 M2 2^E2 • Assume E1 > E2 Exact Result (–1)^s M 2^E • Sign s, significand M: – Result of signed align & add • Exponent E: E1 Fixing • If M ≥ 2, shift M right, increment E • If M < 1, shift M left k positions, decrement E by k • Overflow if E out of range • Round M to fit frac precision • Figure: (–1)^s2 M2 is shifted right by E1–E2 bit positions and added to (–1)^s1 M1, giving (–1)^s M
  • 22. Mathematical Properties of FP Add Compare to those of Abelian Group • Closed under addition? YES – But may generate infinity or NaN • Commutative? YES • Associative? NO – Overflow and inexactness of rounding • 0 is additive identity? YES • Every element has additive inverse? ALMOST – Except for infinities & NaNs Monotonicity • a ≥ b ⇒ a+c ≥ b+c? ALMOST – Except for infinities & NaNs
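The failure of associativity is easy to observe. A small sketch, assuming IEEE single precision for `float`: the hypothetical functions `add_left` and `add_right` compute (a + b) + c and a + (b + c), using `volatile` so that each intermediate is actually rounded to single precision rather than kept in a wider register.

```c
/* (a + b) + c, with each step rounded to float. */
float add_left(float a, float b, float c) {
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* a + (b + c), with each step rounded to float. */
float add_right(float a, float b, float c) {
    volatile float t = b + c;
    volatile float r = a + t;
    return r;
}
```

With a = 3.14f, b = 1e20f, c = -1e20f, `add_left` loses the 3.14 when it is added to 1e20 and returns 0, while `add_right` cancels the large terms first and returns 3.14f.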
  • 23. Algebraic Properties of FP Mult Compare to Commutative Ring • Closed under multiplication? YES – But may generate infinity or NaN • Multiplication Commutative? YES • Multiplication is Associative? NO – Possibility of overflow, inexactness of rounding • 1 is multiplicative identity? YES • Multiplication distributes over addition? NO – Possibility of overflow, inexactness of rounding Monotonicity • a ≥ b & c ≥ 0 ⇒ a*c ≥ b*c? ALMOST – Except for infinities & NaNs
  • 24. Floating Point in C C Guarantees Two Levels • float: single precision • double: double precision Conversions • Casting between int, float, and double changes numeric values • Double or float to int – Truncates fractional part – Like rounding toward zero – Not defined when out of range » Generally saturates to TMin or TMax • int to double – Exact conversion, as long as int has ≤ 53 bit word size • int to float – Will round according to rounding mode
  • 25. Answers to Floating Point Puzzles (int x = …; float f = …; double d = …; assume neither d nor f is NaN)
        • x == (int)(float) x           No: 24 bit significand
        • x == (int)(double) x          Yes: 53 bit significand
        • f == (float)(double) f        Yes: increases precision
        • d == (float) d                No: loses precision
        • f == -(-f);                   Yes: Just change sign bit
        • 2/3 == 2/3.0                  No: 2/3 == 0
        • d < 0.0 ⇒ ((d*2) < 0.0)       Yes!
        • d > f ⇒ -f < -d               Yes!
        • d * d >= 0.0                  Yes!
        • (d+f)-d == f                  No: Not associative
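The first two answers can be checked with a concrete counterexample. The sketch below assumes a 32-bit `int` and IEEE `float` and `double`; `survives_float` and `survives_double` are hypothetical names. Inputs should stay well away from `INT_MAX`, because converting an out-of-range floating point value back to `int` is undefined.

```c
/* Returns 1 if x is unchanged by a round trip through float, else 0.
   volatile forces the value to be stored in single precision. */
int survives_float(int x) {
    volatile float f = (float)x;
    return x == (int)f;
}

/* Returns 1 if x is unchanged by a round trip through double, else 0. */
int survives_double(int x) {
    volatile double d = (double)x;
    return x == (int)d;
}
```

The value 16777217 (2^24 + 1) needs 25 significant bits, so it does not survive the trip through `float` but does survive the trip through `double`; 16777216 survives both. Similarly, `2/3 == 2/3.0` compares the integer quotient 0 against a value near 0.667 and is false.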
  • 26. IA32 Floating Point History • 8086: first computer to implement IEEE FP – separate 8087 FPU (floating point unit) • 486: merged FPU and Integer Unit onto one chip Summary • Hardware to add, multiply, and divide • Floating point data registers • Various control & status registers Floating Point Formats • single precision (C float): 32 bits • double precision (C double): 64 bits • extended precision (C long double): 80 bits • Figure: instruction decoder and sequencer feeding the FPU and the Integer Unit, both connected to the data bus
  • 27. FPU Data Register Stack FPU register format (extended precision) • Layout: bit 79 = s, bits 78–64 = exp, bits 63–0 = frac FPU register stack • Eight registers R7…R0 (absolute view), referenced as st(0)…st(7) relative to Top (stack view) • stack grows down – wraps around from R0 -> R7 • FPU registers are typically referenced relative to top of stack – st(0) is top of stack (Top) – followed by st(1), st(2),… • push: decrement Top, load • pop: store, increment Top
  • 28. FPU instructions Large number of floating point instructions and formats • ~50 basic instruction types • load, store, add, multiply • sin, cos, tan, arctan, and log! Sampling of instructions:
        Instruction   Effect                       Description
        fldz          push 0.0                     Load zero
        flds S        push S                       Load single precision real
        fmuls S       st(0) <- st(0)*S             Multiply
        faddp         st(1) <- st(0)+st(1); pop    Add and pop
  • 29. Floating Point Code Example Compute Inner Product of Two Vectors • Single precision arithmetic • Scientific computing and signal processing workhorse

        float ipf (float x[], float y[], int n)
        {
            int i;
            float result = 0.0;
            for (i = 0; i < n; i++) {
                result += x[i] * y[i];
            }
            return result;
        }

            pushl %ebp              # setup
            movl %esp,%ebp
            pushl %ebx
            movl 8(%ebp),%ebx       # %ebx=&x
            movl 12(%ebp),%ecx      # %ecx=&y
            movl 16(%ebp),%edx      # %edx=n
            fldz                    # push +0.0
            xorl %eax,%eax          # i=0
            cmpl %edx,%eax          # if i>=n done
            jge .L3
        .L5:
            flds (%ebx,%eax,4)      # push x[i]
            fmuls (%ecx,%eax,4)     # st(0)*=y[i]
            faddp                   # st(1)+=st(0); pop
            incl %eax               # i++
            cmpl %edx,%eax          # if i<n repeat
            jl .L5
        .L3:
            movl -4(%ebp),%ebx      # finish
            leave
            ret                     # st(0) = result
  • 30. Inner product stack trace
        1. fldz                    st(0) = 0
        2. flds (%ebx,%eax,4)      st(0) = x[0]         st(1) = 0
        3. fmuls (%ecx,%eax,4)     st(0) = x[0]*y[0]    st(1) = 0
        4. faddp %st,%st(1)        st(0) = 0 + x[0]*y[0]
        5. flds (%ebx,%eax,4)      st(0) = x[1]         st(1) = 0 + x[0]*y[0]
        6. fmuls (%ecx,%eax,4)     st(0) = x[1]*y[1]    st(1) = 0 + x[0]*y[0]
        7. faddp %st,%st(1)        st(0) = 0 + x[0]*y[0] + x[1]*y[1]
  • 31. Summary IEEE Floating Point Has Clear Mathematical Properties • Represents numbers of form M × 2^E • Can reason about operations independent of implementation – As if computed with perfect precision and then rounded • Not the same as real arithmetic – Violates associativity/distributivity – Makes life difficult for compilers & serious numerical applications programmers IA32 Floating Point is a Mess • Ill-conceived, pseudo-stack architecture • Covered in notes