3. IEEE 754 FLOATING POINT For Comp. ORG.pdf

IEEE 754 FLOATING POINT
REPRESENTATION
Prof. Tanvi Goswami
Dept. Of Information Technology
DDU, Nadiad

Fixed Point and Floating Point Number
Representations
Storing Real Numbers
There are two major approaches to storing real numbers (i.e.,
numbers with a fractional component) in modern computing.
These are
(i) Fixed Point Notation and
(ii) Floating Point Notation.
In fixed point notation, there is a fixed number of digits after
the decimal point, whereas floating point notation allows for
a varying number of digits after the decimal point.
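As a rough illustration of the two notations, the sketch below stores a value as a scaled integer (fixed point) and compares that with ordinary floating point literals. The scale of 100 and the helper name to_fixed are assumptions made for this example only.

```python
# Fixed point: store an integer count of 1/100ths, so there are always
# exactly two digits after the decimal point. SCALE and to_fixed are
# illustrative choices, not part of the slides.
SCALE = 100

def to_fixed(text):
    """Parse a non-negative decimal string such as '12.34' into a scaled integer."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]          # pad or truncate to two digits
    return int(whole) * SCALE + int(frac)

price = to_fixed("12.34")             # stored as the integer 1234
tiny = to_fixed("0.001")              # needs three digits, so it is stored as 0

# Floating point: the position of the point moves with the exponent,
# so very small and very large magnitudes both keep their leading digits.
small = 1.234e-7
large = 1.234e+7
```

With two fixed digits the value 0.001 is lost entirely, while the floating point literals keep four significant digits at both magnitudes.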

Floating point decimal number
• There are different representations for the
same number, and there is no fixed position
for the decimal point.
• Given a fixed number of digits, there may be a
loss of precision.
• Three pieces of information represent a
number: the sign of the number, the significand
(the significant digits) and the signed exponent of 10.
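The three pieces of information can be inspected with Python's standard decimal module. This is only a sketch; the value 123.45 is an arbitrary example, not one taken from the slides.

```python
from decimal import Decimal

# The same value written with the decimal point in different positions.
a = Decimal("123.45")
b = Decimal("1.2345") * Decimal(10) ** 2
c = Decimal("12345") * Decimal(10) ** -2

# as_tuple() exposes the sign, the significant digits and the exponent of 10.
parts = Decimal("-123.45").as_tuple()
sign, digits, exponent = parts.sign, parts.digits, parts.exponent
```

Here sign is 1 (negative), digits holds 1, 2, 3, 4, 5 and exponent is -2, i.e. -12345 x 10^-2.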

Note
• Given a fixed number of digits, the floating-
point representation covers a wider range of
values compared to a fixed-point
representation.

IEEE 754 standard
• Most binary floating-point
representations follow the IEEE 754 standard.
→ In languages such as C and Java, the data type float uses the
IEEE 32-bit single precision format and the data type double
uses the IEEE 64-bit double precision format.
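The two widths can be observed from Python with the standard struct module, whose "f" and "d" format codes pack a value as IEEE 754 single and double precision. This is a small sketch; the value 0.1 is chosen only because it is not exactly representable in binary.

```python
import struct

single_bytes = struct.pack(">f", 0.1)   # 32-bit single precision
double_bytes = struct.pack(">d", 0.1)   # 64-bit double precision

# Reading the bytes back shows the difference in precision.
as_single = struct.unpack(">f", single_bytes)[0]
as_double = struct.unpack(">d", double_bytes)[0]
```

A Python float is itself a double, so as_double equals 0.1 while as_single is only the nearest single precision value.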

IEEE 754 32 bit format
N = (-1)^s * (1.M) * 2^(E-127)
s is the sign bit (1 bit)
M is the mantissa (23 bits)
E is the biased exponent (8 bits)

Example:
3.5
Binary of 3.5 = 11.1
Sign = 0 (the number is positive), i.e. (-1)^0 = 1
Normalize: 11.1 => 1.11 x 2^1
Compare the exponent:
 E-127 = 1
 E = 1+127
 E = 128
 Binary of 128 = 1000 0000
Mantissa => .11
Therefore 1.M is 1.11
IEEE 754 representation is:
sign   8 bit exponent   23 bit mantissa
0      1000 0000        1100 0000 0000 0000 0000 000
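The hand conversion of 3.5 can be cross-checked by letting Python pack the value as single precision and splitting the 32 bits into the three fields. The helper name float32_fields is an assumption of this sketch.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x in IEEE 754 single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]   # 1 + 8 + 23 bits

sign, exponent, mantissa = float32_fields(3.5)
```

For 3.5 the fields come out as sign 0, exponent 10000000 and a mantissa of 11 followed by 21 zeros.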

Reverse Example:
• 0 1000 0000 1100 0000 0000 0000 0000 000
=> s bit is 0, i.e. the number is positive
E = 1000 0000 = 128, so E-127 = 128 - 127 = 1
M = .11
N = (-1)^s * (1.M) * 2^(E-127)
  = (-1)^0 x (1.11) x 2^1
  = 1.11 x 2^1
  = 11.1 (binary)
Therefore the number is 3.5
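The reverse direction follows the formula N = (-1)^s * (1.M) * 2^(E-127) directly. The sketch below assumes a normalised number (exponent field neither all zeros nor all ones); the function name decode_single is hypothetical.

```python
def decode_single(sign_bit, exponent_bits, mantissa_bits):
    """Evaluate (-1)^s * (1.M) * 2^(E-127) for a normalised single precision value."""
    s = int(sign_bit)
    E = int(exponent_bits, 2)
    # Interpret the mantissa bits as a binary fraction 0.M
    M = int(mantissa_bits, 2) / 2 ** len(mantissa_bits)
    return (-1) ** s * (1 + M) * 2.0 ** (E - 127)

value = decode_single("0", "10000000", "11" + "0" * 21)
```

Special conditions such as zero, infinity and NaN are not handled by this helper.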

Example: 85.125
85 = 1010101
0.125 = .001
85.125 = 1010101.001
= 1.010101001 x 2^6
sign = 0
Single precision:
Exponent: E-127 = 6 => E = 127+6 = 133
133 = 10000101
Normalised mantissa = 010101001; we add 0's to complete the 23
bits
The IEEE 754 single precision representation is:
sign   8 bit exponent   23 bit mantissa
0      1000 0101        01010100100000000000000
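The same steps (normalise, bias the exponent, pad the mantissa) can be sketched in code. This version uses math.frexp to normalise and truncates the mantissa instead of rounding it, so it is only reliable for normalised values that fit exactly in 23 mantissa bits, such as 85.125. The function name single_fields is hypothetical.

```python
import math

def single_fields(x):
    """Sign, biased exponent and 23-bit mantissa of a normalised value (no rounding)."""
    sign = 0 if x >= 0 else 1
    frac, exp = math.frexp(abs(x))        # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    significand = frac * 2                # shift to the 1.M form, 1 <= significand < 2
    unbiased = exp - 1                    # exponent that goes with the 1.M form
    E = unbiased + 127                    # excess 127 exponent
    M = int((significand - 1) * 2 ** 23)  # fraction bits as an integer (truncated)
    return sign, E, M

sign, E, M = single_fields(85.125)
exponent_bits = f"{E:08b}"
mantissa_bits = f"{M:023b}"
```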

Single precision range

IEEE 754 64 bit format
N = (-1)^s * (1.M) * 2^(E-1023)
s is the sign bit (1 bit)
M is the mantissa (52 bits)
E is the biased exponent (11 bits)
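For comparison with the 32 bit layout, the sketch below splits the double precision encoding of 3.5 into its 1 + 11 + 52 bit fields. The value 3.5 is reused here only as a convenient example.

```python
import struct

(word,) = struct.unpack(">Q", struct.pack(">d", 3.5))
bits = f"{word:064b}"
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]

E = int(exponent, 2)          # biased exponent
unbiased = E - 1023           # excess 1023 instead of excess 127
```

The unbiased exponent is 1, as in the single precision case, but the stored field is 1024 rather than 128.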

What is 127 in IEEE 754?
• In “excess 127 form”, stored values 1 to 126 represent
negative exponents (-126 to -1), and stored values 128 to 254
represent positive exponents (+1 to +127). The stored value 127,
right in the middle, represents a power of zero. The stored
values 0 and 255 are reserved for the special conditions.
• The eight-bit exponent uses excess 127 notation.
What this means is that the exponent is represented in
the field by a number 127 greater than its value.
Why?
Because it lets us use an integer comparison to tell if one
floating point number is larger than another, so long as
both have the same sign (for two negative numbers the
order is reversed).
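The ordering property can be checked by sorting a few positive values once as floating point numbers and once by their raw 32 bit patterns read as unsigned integers. The sample values and the helper name bits32 are assumptions of this sketch.

```python
import struct

def bits32(x):
    """Single precision bit pattern of x, read as an unsigned integer."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word

samples = [85.125, 0.5, 3.5, 1.0e10, 0.15625, 1.0]
by_value = sorted(samples)
by_bits = sorted(samples, key=bits32)
```

Both sorts give the same order, because a larger biased exponent always produces a larger integer when the sign bit is 0.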

Special conditions
• Special Values: IEEE 754 reserves some bit patterns for values that
cannot be written in the normalised form.
• Zero –
Zero is a special value denoted with an exponent and mantissa of 0. -0 and +0 are
distinct bit patterns, though they compare as equal.
• Denormalised –
If the exponent is all zeros, but the mantissa is not, then the value is a
denormalised number. This means the number does not have an assumed leading
one before the binary point.
• Infinity –
The values +infinity and -infinity are denoted with an exponent of all ones and a
mantissa of all zeros. The sign bit distinguishes between negative infinity and
positive infinity. Operations with infinite values are well defined in IEEE 754.
• Not A Number (NaN) –
The value NaN is used to represent the result of an invalid operation (for example
0/0). It is represented by an exponent field of all ones and a non-zero mantissa;
the sign bit is ignored. This is a special value that might be used to denote a
variable that doesn’t yet hold a value.
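The special bit patterns can be tried out by building single precision values from raw 32 bit words. This is a sketch with one representative pattern per case; the helper name from_bits32 is hypothetical, and 0x7FC00000 is just one of many valid NaN patterns.

```python
import math
import struct

def from_bits32(word):
    """Interpret a 32-bit unsigned integer as an IEEE 754 single precision value."""
    (x,) = struct.unpack(">f", struct.pack(">I", word))
    return x

pos_zero = from_bits32(0x00000000)   # exponent 0, mantissa 0, sign 0
neg_zero = from_bits32(0x80000000)   # exponent 0, mantissa 0, sign 1
denormal = from_bits32(0x00000001)   # exponent 0, mantissa non-zero
pos_inf = from_bits32(0x7F800000)    # exponent all ones, mantissa 0
neg_inf = from_bits32(0xFF800000)    # same, with sign 1
nan = from_bits32(0x7FC00000)        # exponent all ones, mantissa non-zero
```

The two zeros compare as equal even though their sign bits differ, and nan is not equal to any value, including itself.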

Example
• The following scheme is used for floating point number
representation using 16 bits.
Sign    Exponent   Mantissa
1 bit   6 bits     9 bits
• Let the floating point number be represented as
N = (-1)^s * (1 + m * 2^(-9)) * 2^(e-31), if the
exponent is not equal to 111111, and 0 otherwise.
• What is the maximum difference between two
successive real numbers that can be represented in this
system?

Solution
The gap between successive numbers is largest at the largest usable exponent.
For 1st number:
Let s = 0, e = 62, m = 511
(as e != 111111, we take e =
111110, m = 111 111 111)
N1 = (1 + 511 * 2^-9) * 2^(62-31)
= 2^31 + 511 * 2^22
For 2nd number:
Let s = 0, e = 62, m = 510
(e = 111110, m = 111 111 110)
N2 = (1 + 510 * 2^-9) * 2^(62-31)
= 2^31 + 510 * 2^22
Difference between the two successive real numbers = N1 - N2
= 2^31 + 511 * 2^22 - (2^31 + 510 * 2^22)
= 2^22
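The result can be confirmed by brute force: enumerate every positive value of the 16 bit scheme with exact rational arithmetic and take the largest gap between neighbours. This is a sketch of a check, not part of the original solution; the helper name value is hypothetical.

```python
from fractions import Fraction

def value(e, m):
    """(1 + m * 2^-9) * 2^(e-31) for a positive number of the 16-bit scheme."""
    return (1 + Fraction(m, 2 ** 9)) * Fraction(2) ** (e - 31)

# e = 63 (111111) is excluded because the scheme defines it as 0.
values = sorted(value(e, m) for e in range(63) for m in range(512))
max_gap = max(b - a for a, b in zip(values, values[1:]))
```

The largest gap is found between neighbours with e = 62, in agreement with the hand calculation.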

Example
• The following scheme is used for floating point number
representation using 16 bits.
Sign    Exponent   Mantissa
1 bit   6 bits     9 bits
• Let the floating point number be represented as
N = (-1)^s * (1 + m * 2^(-9)) * 2^(e-31), if the exponent is not equal to
000000, and 0 otherwise.
• What is the difference between the two
SMALLEST positive real numbers that can be represented in
this system?

Solution
For 1st number:
Let s = 0, e = 1, m = 0
(e = 000001, m = 000000000)
N1 = (1 + 0 * 2^-9) * 2^(1-31)
= 2^-30
For 2nd number:
Let s = 0, e = 1, m = 1
(e = 000001, m = 000000001)
N2 = (1 + 1 * 2^-9) * 2^(1-31)
= 2^-30 + 2^-39
Difference between the two successive real numbers = N2 - N1
= 2^-39
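The two smallest positive values of this scheme can be evaluated exactly with rational arithmetic. The helper name value is hypothetical, and the exponent 000000 is treated as 0 as the problem statement specifies.

```python
from fractions import Fraction

def value(e, m):
    """Value of a positive number in the 16-bit scheme; exponent 000000 means 0."""
    if e == 0:
        return Fraction(0)
    return (1 + Fraction(m, 2 ** 9)) * Fraction(2) ** (e - 31)

n1 = value(1, 0)      # smallest positive number
n2 = value(1, 1)      # next number up
gap = n2 - n1
```

Under these assumptions n1 is 2^-30 and gap is 2^-39.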

Example
(I) Convert the following IEEE-754 32 bit number
to decimal: 46800380 (Hex)
(II) Convert the following IEEE-754 64 bit
number to decimal: 4041E00000000000 (Hex)

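Solution
(I) Convert the following IEEE-754 32 bit number
to decimal: 46800380 (Hex)
Hex to binary:
0100 0110 1000 0000 0000 0011 1000 0000
Split into sign, exponent and mantissa:
0   1000 1101   000 0000 0000 0011 1000 0000
s = 0, so the number is positive
E = 1000 1101 = 141, so E-127 = 14
1.M = 1.0000000000000111
N = 1.0000000000000111 x 2^14
= 100000000000001.11 (binary)
= 16385.75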

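Solution
(II) Convert the following IEEE-754 64 bit
number to decimal: 4041E00000000000 (Hex)
Hex to binary (the first 24 bits; the remaining 40 bits are all 0):
0100 0000 0100 0001 1110 0000
Split into sign, exponent (11 bits) and mantissa (52 bits):
0   100 0000 0100   0001 1110 followed by 44 zeros
s = 0, so the number is positive
E = 100 0000 0100 = 1028, so E-1023 = 5
1.M = 1.0001111
N = 1.0001111 x 2^5
= 100011.11 (binary)
= 35.75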
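The two hex conversions in this exercise can be cross-checked with Python's standard struct module, which interprets raw bytes as IEEE 754 single ("f") or double ("d") precision. This is a quick check, not a replacement for working through the bit fields by hand.

```python
import struct

# (I) 32-bit pattern 46800380 read as single precision
single = struct.unpack(">f", bytes.fromhex("46800380"))[0]

# (II) 64-bit pattern 4041E00000000000 read as double precision
double = struct.unpack(">d", bytes.fromhex("4041E00000000000"))[0]
```

Here single evaluates to 16385.75 and double to 35.75.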