Floating Point Representation premium.pptx

FLOATING POINT
REPRESENTATION OF
NUMBERS

WHAT IS COMPUTER
ARITHMETIC ?
COMPUTER ARITHMETIC is a field of
COMPUTER SCIENCE that investigates how
computers should represent numbers &
perform operations on them…

COMPUTER
ARITHMETIC
TWO TYPES OF COMPUTER ARITHMETIC
INTEGER
ARITHMETIC
REAL
ARITHMETIC

INTEGER ARITHMETIC
Arithmetic without fractions. A computer performing integer
arithmetic discards any fractional part that a result would have.
REAL ARITHMETIC
Arithmetic that uses numbers with fractional parts; it is used
in most computations.
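As a small illustration of the difference, the Python sketch below divides 7 by 2 in both styles. The operands are arbitrary values chosen for this example; they do not come from the slides.

```python
# Integer arithmetic: the fractional part of the quotient is discarded.
# (Python's // floors the result, which for positive operands is the
# same as dropping the fraction.)
int_quotient = 7 // 2

# Real arithmetic: the fractional part is kept.
real_quotient = 7 / 2

print(int_quotient)   # 3
print(real_quotient)  # 3.5
```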

REAL
ARITHMETIC
TWO TYPES OF REAL ARITHMETIC
FIXED POINT
ARITHMETIC
FLOATING POINT
ARITHMETIC

FIXED POINT ARITHMETIC
In computing, FIXED-POINT number representation is a data type for
real numbers in which the binary point sits at a fixed position. Data
is converted into this binary form and then processed, stored & used
by the system. For example, a 16-bit register can be divided as:
Sign bit         Integral part    Fractional part
1 bit (0 or 1)   9 bits           6 bits

Sign bit
Fixed-point numbers in binary use a sign bit. A positive number has
sign bit 0, while a negative number has sign bit 1.
Integral part
The width of the integral part varies from one register to another. It
depends on the register's size; for example, in an 8-bit register the
integral part is 4 bits.
Fractional part
The width of the fractional part also varies with the register's size;
for example, in an 8-bit register the fractional part is 3 bits.

8 bits = 1 sign bit + 4 bits (integral part) + 3 bits (fractional part)
16 bits = 1 sign bit + 9 bits (integral part) + 6 bits (fractional part)
[Figure: 8-bit and 16-bit register layouts, with cells marked as sign bit, integral part, and fractional part; • marks the assumed binary point]

HOW TO WRITE THE NUMBER IN FIXED-POINT NOTATION
Number is 4.5
Convert the number into binary form:
4.5 (decimal) = 100.1 (binary)
Represent the binary number in fixed-point notation (8-bit register):
Sign   Integral part   Fractional part
0      0 1 0 0         1 0 0
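The encoding of 4.5 can be reproduced with a short Python sketch. It assumes the 8-bit layout described in the deck (1 sign bit, 4 integral bits, 3 fractional bits) with the magnitude stored directly; the function name `to_fixed_point` and its parameters are hypothetical, introduced only for this illustration.

```python
def to_fixed_point(value, int_bits=4, frac_bits=3):
    """Encode value as: sign bit, then int_bits, then frac_bits."""
    sign = '1' if value < 0 else '0'
    # Scaling by 2**frac_bits moves the binary point to the right end;
    # int() then drops any fraction bits that do not fit.
    scaled = int(abs(value) * (1 << frac_bits))
    if scaled >= 1 << (int_bits + frac_bits):
        raise OverflowError("value does not fit in the register")
    magnitude = format(scaled, f'0{int_bits + frac_bits}b')
    return sign + magnitude

print(to_fixed_point(4.5))  # 00100100
```

Reading the result as 0 | 0100 | 100 gives the sign bit, the integral part (4), and the fractional part (.5).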

Magnitude
The maximum and minimum (in magnitude) numbers that may be
stored are:
$111111111.111111_2 = (2^9 - 1) + (1 - 2^{-6}) = 511.984375_{10}$ (Maximum)
$000000000.000001_2 = 2^{-6} = 0.015625_{10}$ (Minimum)
Magnitude of fixed-point representation: $0.015625_{10}$ to $511.984375_{10}$
This range is quite insufficient in practice, and so a different rule is
adopted to represent real numbers.
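These two limits can be checked numerically. The sketch below assumes the 16-bit layout from the deck (9 integral bits, 6 fractional bits); the variable names are illustrative only.

```python
INT_BITS, FRAC_BITS = 9, 6

# Largest magnitude: every integral and fractional bit set to 1.
max_magnitude = (2**INT_BITS - 1) + (1 - 2**-FRAC_BITS)
# Smallest non-zero magnitude: only the last fractional bit set.
min_magnitude = 2**-FRAC_BITS

# Cross-check by reading the all-ones pattern as an integer and
# moving the binary point FRAC_BITS places to the left.
all_ones = int('1' * (INT_BITS + FRAC_BITS), 2) / 2**FRAC_BITS

print(max_magnitude)  # 511.984375
print(min_magnitude)  # 0.015625
```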

FLOATING POINT REPRESENTATION
FLOATING POINT
NOTATION
SCIENTIFIC
NOTATION
NORMALIZED
NOTATION

Scientific Notation
A method of representing numbers in the form $a \times b^e$. A number is
first written in scientific notation and then converted into
floating-point notation, because floating-point notation stores a number
in exactly this form.
For example:
Number = 376.423 (not in scientific notation)
Number in scientific notation = $37.6423 \times 10^1$
or $3.76423 \times 10^2$
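The same conversion can be sketched in Python with the standard `decimal` module, which keeps decimal digits exact. This is only one way to do it, shown for the number used in the example above.

```python
from decimal import Decimal

number = Decimal("376.423")

# adjusted() gives the power of ten of the leading digit.
exponent = number.adjusted()
# scaleb() shifts the decimal point without changing the digits.
coefficient = number.scaleb(-exponent)

print(f"{coefficient} x 10^{exponent}")  # 3.76423 x 10^2

# Both forms of the number denote the same value.
same_value = Decimal("37.6423") * 10**1 == Decimal("3.76423") * 10**2
```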

Normalized Notation
Normalized notation:
$m \times b^e$
where m means MANTISSA, b means BASE, and e means EXPONENT.
It is a special case of scientific notation. Normalizing means shifting
the mantissa to the left until its most significant bit is non-zero, and
adjusting the exponent to compensate.
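A minimal Python sketch of normalization, assuming the mantissa is given as a binary string and the normalized form is 0.1xxx (binary point immediately before the first non-zero bit). The helper name `normalize` is hypothetical.

```python
def normalize(binary, exponent):
    """Return (mantissa, exponent) with the mantissa in the form 0.1xxx."""
    int_part, _, frac_part = binary.partition('.')
    digits = int_part + frac_part
    # Moving the point to the far left raises the exponent by one
    # for every integral digit it passes.
    exponent += len(int_part)
    # Shifting out leading zeros lowers the exponent again.
    stripped = digits.lstrip('0')
    if not stripped:
        return '0.0', 0  # zero has no normalized form
    exponent -= len(digits) - len(stripped)
    return '0.' + stripped.rstrip('0'), exponent

print(normalize('1011.0101', 7))  # ('0.10110101', 11)
print(normalize('0.00101', 0))    # ('0.101', -2)
```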

Sign bit
Floating-point numbers in binary also use a sign bit. A positive number has sign
bit 0, while a negative number has sign bit 1. In floating-point representation,
the sign of a number depends on the mantissa, not on the exponent. Hence the sign bit
of the number always belongs to the mantissa and not to the exponent.
Mantissa
The width of the mantissa varies from one register to another. It depends on the size
of the register; for example, in a 16-bit register the mantissa part is 9 bits.
Exponent
The exponent is the power to which the base is raised. Its width depends on the register's
size; for example, in a 16-bit register the exponent part is 7 bits. Excess 16, 64, 128, 512
codes are used to store the exponent in this format.

Four things are used to represent a floating point number
Sign of Mantissa
Sign of Exponent
Magnitude of Mantissa
Magnitude of Exponent

The number $1011.0101 \times 2^7$
is represented in this notation as:
$0.10110101 \times 2^{11}$ = 0.10110101E01011
where the mantissa is 0.10110101 & the exponent is 1011 (binary for 11)
Mantissa (9 bits)              Exponent (7 bits)
Sign   Magnitude               Sign   Magnitude
0      1 0 1 1 0 1 0 1         0      0 0 1 0 1 1
The implied binary point lies between the mantissa's sign bit and its magnitude bits.
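The 16-bit word above can be assembled with a short Python sketch. It assumes the layout shown on this slide: 1 mantissa sign bit, 8 mantissa magnitude bits, 1 exponent sign bit, and 6 exponent magnitude bits. The function name `pack_word16` is hypothetical.

```python
def pack_word16(mantissa_bits, exponent, negative=False):
    """Pack a normalized mantissa (bits after '0.') and an exponent."""
    if len(mantissa_bits) > 8 or abs(exponent) > 63:
        raise OverflowError("value does not fit in the 16-bit format")
    mantissa_sign = '1' if negative else '0'
    exponent_sign = '1' if exponent < 0 else '0'
    mantissa_field = mantissa_bits.ljust(8, '0')   # pad on the right
    exponent_field = format(abs(exponent), '06b')  # pad on the left
    return mantissa_sign + mantissa_field + exponent_sign + exponent_field

# 0.10110101 x 2^11
print(pack_word16('10110101', 11))  # 0101101010001011
```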

Magnitude
The range of numbers (magnitude) that may be stored will be:
Maximum = 0.11111111E0111111
$= (1 - 2^{-8}) \times 2^{(2^6 - 1)}$
$\approx 2^{63}$
Minimum = 0.10000000E1111111
$= 2^{-1} \times 2^{-(2^6 - 1)}$
$= 2^{-64}$
Magnitude of floating-point representation: $2^{-64}$ to $2^{63}$
This range is much larger than the range $2^9$ to $2^{-6}$ obtained with the fixed-point
representation.
Calculation (mantissa)
$\sum_{n=-8}^{-1} 2^n = 2^{-1} + 2^{-2} + \dots + 2^{-8} = 1 - 2^{-8}$
Calculation (exponent)
$\sum_{n=0}^{5} 2^n = 2^0 + 2^1 + \dots + 2^5 = 2^6 - 1$
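The two sums and the resulting range can be verified with a few lines of Python. This is a sketch under the same assumptions as the slide: 8 mantissa magnitude bits and 6 exponent magnitude bits. The variable names are illustrative only.

```python
# Mantissa with all 8 bits set: 2^-1 + 2^-2 + ... + 2^-8
max_mantissa = sum(2.0**n for n in range(-8, 0))
# Exponent with all 6 magnitude bits set: 2^0 + 2^1 + ... + 2^5
max_exponent = sum(2**n for n in range(0, 6))

largest = max_mantissa * 2.0**max_exponent    # just under 2^63
smallest = 2.0**-1 * 2.0**-max_exponent       # exactly 2^-64

print(max_mantissa)  # 0.99609375
print(max_exponent)  # 63
```

All of these values are powers of two or short sums of them, so the floating-point results are exact and can be compared directly with the closed forms.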
