IEEE-754 standard format to handle Floating-Point calculations in RISC-V CPUs
General Information and Hardware Implementation
Zeeshan Rafique
Research Associate at MERL
Agenda!
● What is a standard and why should we rely on it?
● What is Floating Point and why do we need it? Precision and Accuracy
● General Information about Floating Point
○ Bits encoding for single and double precision
○ Conversion from floating point to IEEE 754 standard
○ Floating point only mimics mathematical arithmetic
○ Features of floating point
○ Range of single and double precision
○ Representation of floating point
● Floating Point Arithmetic examples
● Hardware Implementation of FPU with RISC-V Core
○ RISC-V Floating Extensions
○ Floating Point Register file
○ Rounding Mode
○ Floating Point CSRs (fcsr, frm, fflags)
○ Exception Flags, Implementation in Decoder and Controller
What is a standard and why should we rely on it?
● What does Standard mean?
○ Something established by authority, custom, or general consent as a model or for
reference or benchmark.
● Why do we need a Standard?
○ Standards are needed to assure the safety of products, to ensure that products and materials
are tailor-made for their purpose, to promote the interoperability of products and services,
to facilitate trade by removing trade barriers, and to promote a common understanding of a product.
Fixed Notation
● We are accustomed to using a fixed notation where the decimal point is fixed: we
know that the digits to the right of the decimal point are the fractional part and
the digits to the left are the integer part.
E.g. 10.75
10 is the integer part and 0.75 is the fractional part.
Floating Point Representation
● The number should be encoded in scientific notation.
● The structure of a floating point (real) number is as follows:
5.5 × 10^9 (mantissa = 5.5, base = 10, exponent = 9)
● Only the mantissa and the exponent are stored. The base is implied (already known).
As it is not stored, this saves memory.
Why do we need floating point?
● Floating point representation makes numerical computation much easier. You
could write all your programs using integers or fixed-point representations, but this is
tedious and error-prone.
● Many numeric applications need numbers over a huge range.
○ e.g., nanoseconds to centuries
● A programmer performs arithmetic operations that can result in real numbers (e.g., π).
● In either case, the idea is to represent a real number in a way similar to
scientific notation.
● For example, the following number is given in scientific notation:
○ 6.022 × 10^23 (an approximation to Avogadro’s constant)
Floating Point Applications
● Digital Signal Processing
● Domain-Specific Accelerators
● Vector Processing Units
● Medical Electronics
● Anywhere more accurate results are needed.
* TechInsight.com Apple iPhone XS teardown
Floating Point Disasters
● Scud missile gets through, 28 die
○ In 1991, during the first Gulf War, a Patriot missile defense system let a Scud get through;
it hit a barracks and killed 28 people. The problem was due to a floating-point error when taking the
difference of a converted & scaled integer.
(https://medium.com/nerd-for-tech/floating-point-rounding-error-in-computers-6485cc26f5e8)
Source: https://slideplayer.com/slide/16330842/ slide 25
● $7B Rocket crashes (Ariane 5)
○ When the first ESA Ariane 5 was launched on June 4, 1996, it lasted only 39 seconds, then the
rocket veered off course and self-destructed. The inertial reference system produced a floating-point
exception while trying to convert a 64-bit floating point number to a 16-bit signed integer. Ironically,
the same code was used in the Ariane 4, where such large values were never generated.
(https://around.com/ariane.html)
● Intel Ships and Denies Bugs
○ In 1994, Intel shipped its first Pentium processors with a floating-point divide bug. The bug was
due to bad look-up tables used to speed up quotient calculations. After months of denials, Intel
adopted a no-questions replacement policy, costing $300M.
(https://www.intel.com/support/processors/pentium/fdiv/)
Precision and Accuracy
● Precision: Maximum number p of significant digits that can be represented in a format.
● Accuracy: How close the stored value is to the true value it represents.
Figure: a value shown at 3-bit precision and at 8-bit precision; accuracy and precision are lost in the lower-precision case.
● The lower the precision, the lower the accuracy. [Not true in all cases]
● In which cases will you get perfect accuracy with very low precision?
● In some cases you can still have perfect accuracy with very low precision.
● If we have 4 bits of precision, then:
○ 5 / 2 = 2.5 (binary 10.1) is represented exactly: 100% accuracy.
○ 10 / 3 = 3.33333333... (binary 11.0101...) has to be rounded: not 100% accuracy.
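The two divisions above can be checked by comparing the stored binary value with the exact fraction. This is a minimal sketch using Python's standard `fractions` module; Python floats are IEEE 754 double precision, so the precision here is 53 bits rather than the 4 bits assumed on the slide, but the effect is the same.

```python
from fractions import Fraction

# 5/2 = 10.1 in binary: it needs only three significant bits, so it is stored exactly.
exact_half = Fraction(5 / 2) == Fraction(5, 2)

# 10/3 = 11.010101... in binary: the pattern never ends, so the stored value is rounded.
exact_third = Fraction(10 / 3) == Fraction(10, 3)

print(exact_half)   # True
print(exact_third)  # False
```

Accuracy depends on whether the value fits the available bits, not only on how many bits there are.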
IEEE-754 Standard
Reference: https://en.wikipedia.org/wiki/IEEE_754-1985
IEEE 754-2008 bits encoding
Bit type      Half (16-bit)   Single (32-bit)   Double (64-bit)   Quadruple (128-bit)
Sign (MSB)    1               1                 1                 1
Exponent      5               8                 11                15
Fraction      10              23                52                112
IEEE 754-2008 Binary encoding of floating point numbers
Single Precision (32-bit)
Bit 31: Sign bit (1) | Bits 30-23: Exponent bits (8) | Bits 22-0: Mantissa / fraction bits (23)
Double Precision (64-bit)
Bit 63: Sign bit (1) | Bits 62-52: Exponent bits (11) | Bits 51-0: Mantissa / fraction bits (52)
Exponent bias: single = 127, double = 1023
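The field boundaries above can be tried out in software. The following is a rough sketch, not part of the original slides: the helper names `split_single` and `split_double` are made up for illustration, and the standard `struct` module is used to obtain the raw bits.

```python
import struct

def split_single(value):
    """Return (sign, biased exponent, fraction) of value stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30-23
    fraction = bits & 0x7FFFFF         # bits 22-0
    return sign, exponent, fraction

def split_double(value):
    """Return (sign, biased exponent, fraction) of value stored as a 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62-52
    fraction = bits & ((1 << 52) - 1)  # bits 51-0
    return sign, exponent, fraction

print(split_single(1.0))   # (0, 127, 0)
print(split_double(-2.0))  # (1, 1024, 0)
```

The stored exponent of 1.0 is 127 because the true exponent 0 has the single-precision bias added to it.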
Quick conversion
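Assume that we have the number 6.625; convert it according to the IEEE 754 standard into
● single precision
● double precision
Solution:
The first step is the same for both precisions: convert the number into binary. (6.625)_10 = (?)_2
Conversion
Integer part (6): divide by 2 repeatedly and record the remainders.
6 / 2 = 3 -> 0 (LSB)
3 / 2 = 1 -> 1
1 / 2 = 0 -> 1 (MSB)
Final value = 110
Fractional part (0.625): multiply by 2 repeatedly and record the integer parts.
0.625 x 2 = 1.25 -> 1 (MSB)
0.25 x 2 = 0.5 -> 0
0.5 x 2 = 1.0 -> 1 (LSB)
Final value = 101
(6.625)_10 = (110.101)_2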
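The repeated division and multiplication steps can be sketched in a few lines. The function names below are hypothetical, and the `max_bits` limit is an assumption added so that non-terminating fractions stop somewhere.

```python
def int_to_binary(n):
    """Repeated division by 2; remainders come out LSB first, so reverse them."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))

def frac_to_binary(frac, max_bits=23):
    """Repeated multiplication by 2; integer parts come out MSB first."""
    digits = []
    while frac > 0 and len(digits) < max_bits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return "".join(digits)

print(int_to_binary(6) + "." + frac_to_binary(0.625))  # 110.101
```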
Quick conversion
● Concept behind conversion
● Let’s say we have 6.625
● 6 = 1×2^2 + 1×2^1 + 0×2^0 = 4 + 2 + 0 => 1 1 0
● 0.625 = 1×2^-1 + 0×2^-2 + 1×2^-3 = 0.5 + 0 + 0.125 => 1 0 1
● 6.625 = 110.101
Quick conversion - Single Precision
● So, we have (6.625)_10 = (110.101)_2
● Normalize the number: 110.101 × 2^0 => 1.10101 × 2^2
● Moving the binary point to the left increments the exponent (+ive); moving it to
the right decrements the exponent (-ive).
+1.10101 × 2^2
● Here + is the sign, the bits after the binary point (10101) are the mantissa, and 2 is the exponent.
● For single precision:
S = 0
E' = E + Bias
E' = 2 + 127 => 129
E' = 10000001
M = 10101000000000000000000
Bits: 31 | 30-23 | 22-0
0 10000001 10101000000000000000000
Sign bit (1) | Exponent bits (8) | Mantissa / fraction bits (23)
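The hand-derived single-precision pattern can be cross-checked against what a machine stores. This is a small sketch that uses Python's `struct` module to pack 6.625 as a 32-bit float.

```python
import struct

# Pack 6.625 as a big-endian 32-bit float and reinterpret the bytes as an integer.
(bits,) = struct.unpack(">I", struct.pack(">f", 6.625))
pattern = f"{bits:032b}"

sign, exponent, fraction = pattern[0], pattern[1:9], pattern[9:]
print(sign, exponent, fraction)  # 0 10000001 10101000000000000000000
print(hex(bits))                 # 0x40d40000
```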
Quick conversion - Double Precision
● So, we have (6.625)_10 = (110.101)_2
● Normalize the number: 110.101 × 2^0 => 1.10101 × 2^2
● Moving the binary point to the left increments the exponent (+ive); moving it to
the right decrements the exponent (-ive).
+1.10101 × 2^2
● Here + is the sign, the bits after the binary point (10101) are the mantissa, and 2 is the exponent.
● For double precision:
S = 0
E' = E + Bias
E' = 2 + 1023 => 1025
E' = 10000000001
M = 10101000000000000000000...000 (52 bits)
Bits: 63 | 62-52 | 51-0
0 10000000001 10101000000000000000000...000
Sign bit (1) | Exponent bits (11) | Mantissa / fraction bits (52)
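The double-precision result can be checked the same way. Python's own `float` is a 64-bit IEEE 754 double, so this sketch only needs to expose the bits.

```python
import struct

# Pack 6.625 as a big-endian 64-bit double and reinterpret the bytes as an integer.
(bits,) = struct.unpack(">Q", struct.pack(">d", 6.625))
pattern = f"{bits:064b}"

sign, exponent, fraction = pattern[0], pattern[1:12], pattern[12:]
print(sign, exponent)            # 0 10000000001
print(int(exponent, 2) - 1023)   # 2 (the unbiased exponent)
print(hex(bits))                 # 0x401a800000000000
```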
Features of floating point numbers (IEEE 754-2008)
● Every floating point number has a sign: it is either positive or negative.
● There are two representations for zero: positive zero (i.e., +0.0) and negative zero
(i.e., -0.0).
● There are two representations of infinity: positive infinity (+∞ or +inf) and negative
infinity (-∞ or -inf).
● The exponent may be positive or negative, allowing both very large numbers and
very small numbers.
● There is a special representation called “not a number” (“NaN”). This value can
represent a missing value or the result of an undefined operation, such as 0/0.
There are two variations, called “quiet NaN” and “signaling NaN”.
Exponent Bit Pattern for Single and Double Precision
Single Precision
Double Precision
Object representation
Range for Single and Double Precision
https://youtu.be/A2HflP5sa_0
Digits of accuracy = log10(2^$bits(M)), where $bits(M) is the number of bits in the mantissa M
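The formula can be evaluated directly. In this sketch the implicit leading 1 of normal numbers is counted as a mantissa bit (24 bits for single, 53 for double); that is an assumption, since the slide does not say whether it is included.

```python
import math

def decimal_digits(significand_bits):
    """Decimal digits of accuracy for a given number of significand bits."""
    return math.log10(2 ** significand_bits)

# 23 and 52 stored fraction bits, plus the implicit leading 1 of normal numbers.
print(round(decimal_digits(24), 2))  # 7.22  (single precision)
print(round(decimal_digits(53), 2))  # 15.95 (double precision)
```

This is where the usual rule of thumb of about 7 decimal digits for single and about 16 for double comes from.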
Floating point only mimics mathematical arithmetic
● The exact result of an operation is not always representable, so the
computed answer is often not mathematically exact.
● Floating point addition is not always associative, due to rounding errors.
That is, (x + y) + z is not always equal to x + (y + z).
● Floating point multiplication is not always associative. That is, (x * y) * z is
not always equal to x * (y * z).
● Floating point multiplication does not always distribute over addition with the
exact same results. That is, x * (y + z) is not always equal to (x * y) + (x * z).
● Floating point addition and multiplication are commutative, like math. For
example, x+y = y+x, so you don’t have to worry about the order of operands
for a single operation.
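A short experiment illustrates the associativity and commutativity properties above. The values 0.1, 0.2 and 0.3 are chosen only for illustration; none of them is exactly representable in binary, so each addition rounds.

```python
x, y, z = 0.1, 0.2, 0.3

left = (x + y) + z    # rounds after x + y, then again after adding z
right = x + (y + z)   # rounds after y + z, then again after adding x

print(left == right)   # False: addition is not associative
print(x + y == y + x)  # True: addition is commutative
print(x * y == y * x)  # True: multiplication is commutative
```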
Representations
● Positive zero (+0.0)
● Negative zero (-0.0)
● Positive infinity (+∞ or +inf)
● Negative infinity (-∞ or -inf)
● Not-a-number (NaN)
○ Quiet NaN (qNaN)
○ Signaling NaN (sNaN)
● Normal numbers (or “normalized numbers”)
● Subnormal numbers (also called “denormalized numbers” or “denormals”)
Positive and Negative Zero (+0, -0)
● 1/+0 yields +∞
● 1/-0 yields –∞
● +0 will normally compare as equal to -0
● -0/-∞ yields +0
● Although +0 and -0 may compare as equal, they may also result in different
outcomes in some computations. This challenges our understanding of the meaning
of “equal”, to say the least.
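The signed-zero behaviour can be observed from software. Python raises `ZeroDivisionError` for `1 / 0.0` instead of returning infinity, so this sketch uses `math.copysign` and `math.atan2` as examples of operations that see the sign of zero.

```python
import math

pos_zero, neg_zero = 0.0, -0.0

print(pos_zero == neg_zero)          # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))  # -1.0: but the sign bit is still there
print(math.atan2(0.0, pos_zero))     # 0.0
print(math.atan2(0.0, neg_zero))     # 3.141592653589793
```

Two values that compare equal can therefore still produce different results when passed to the same function.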
Positive and Negative Infinity (+∞, -∞)
● Infinity is represented with all bits of the exponent field set to 1 and all bits of
the fraction field set to 0; the sign bit selects +∞ or -∞.
Single precision (bits 31 | 30-23 | 22-0):
0/1 11111111 00000000000000000000000
Double precision (bits 63 | 62-52 | 51-0):
0/1 11111111111 00000000000000000000000...000
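The single-precision infinity patterns can be decoded to confirm the encoding. This is a minimal sketch; `single_from_bits` is a hypothetical helper name.

```python
import math
import struct

def single_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

pos_inf = single_from_bits(0x7F800000)  # 0 11111111 000...0
neg_inf = single_from_bits(0xFF800000)  # 1 11111111 000...0

print(pos_inf, neg_inf)                          # inf -inf
print(math.isinf(pos_inf), math.isinf(neg_inf))  # True True
```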
Not a Number - NaN
● NaN is the result of an operation that has no defined mathematical value.
○ 0/0 , ∞/∞ , 0 * ∞ , sqrt(negative) , log(negative)
● There are two types of NaN: signaling and quiet.
● A signaling NaN raises the invalid operation exception when it appears as an operand of
an operation.
● A quiet NaN propagates through operations and does not raise an exception.
● In the quiet case a single fixed NaN value is passed forward as the result; it is
known as the canonical NaN and is discussed further on the next slide.
● RISC-V makes the implementation of signaling NaN optional.
● When double precision is implemented, the result of an F-extension (single-precision)
instruction is stored in the lower 32 bits of the register and the upper 32 bits are all
set to 1, so that a D-extension instruction reading the register sees the value as a NaN.
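The last bullet describes what the RISC-V specification calls NaN boxing. The sketch below models it on plain integers; the function names are hypothetical, and the rule that an improperly boxed value is read as the canonical NaN is taken from the RISC-V unprivileged specification rather than from the slides.

```python
import struct

BOX_MASK = 0xFFFFFFFF00000000
CANONICAL_NAN_S = 0x7FC00000  # canonical single-precision quiet NaN

def nan_box(single_bits):
    """Place a 32-bit single in a 64-bit register with the upper 32 bits all 1s."""
    return BOX_MASK | (single_bits & 0xFFFFFFFF)

def unbox(reg64):
    """Return the single-precision bits, or the canonical NaN if not properly boxed."""
    if (reg64 & BOX_MASK) == BOX_MASK:
        return reg64 & 0xFFFFFFFF
    return CANONICAL_NAN_S

def double_from_bits(bits):
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value

boxed = nan_box(0x40D40000)      # 6.625 in single precision
print(hex(boxed))                # 0xffffffff40d40000
print(double_from_bits(boxed))   # nan: a double-precision read sees a NaN
print(hex(unbox(boxed)))         # 0x40d40000
```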
Not a Number - NaN
● A NaN has all 1’s in the exponent field and anything other than all 0’s in the mantissa or
fraction field. (An all-0 fraction represents +∞ or -∞.)
● The MSB of the fraction field distinguishes the two kinds: 1 marks a quiet NaN, and 0
(with at least one other fraction bit set) marks a signaling NaN.
Single precision (bits 31 | 30-23 | 22-0), quiet NaN:
X 11111111 10000000000000000000000
Double precision (bits 63 | 62-52 | 51-0), quiet NaN:
X 11111111111 10000000000000000000000...000
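The NaN encoding rules can be expressed as bit tests. This sketch works on raw 32-bit patterns; the helper names are made up, and 0x7F800001 is simply one example of a signaling NaN pattern.

```python
import math
import struct

def is_nan_pattern(bits):
    """NaN: exponent field all 1s and a nonzero fraction field."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0xFF and fraction != 0

def is_quiet(bits):
    """Quiet NaN: a NaN whose fraction MSB (bit 22) is set."""
    return is_nan_pattern(bits) and bool(bits & (1 << 22))

quiet, signaling, infinity = 0x7FC00000, 0x7F800001, 0x7F800000
print(is_nan_pattern(quiet), is_quiet(quiet))          # True True
print(is_nan_pattern(signaling), is_quiet(signaling))  # True False
print(is_nan_pattern(infinity))                        # False

(value,) = struct.unpack(">f", struct.pack(">I", quiet))
print(math.isnan(value), value == value)               # True False
```

A NaN never compares equal to anything, including itself, which is why `value == value` is false.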
Normalized and Denormalized Numbers
● Most floating point numbers are normalized numbers.
● The greater the integer part is, the fewer bits are left for the fractional part.
● All denormalized numbers are very close to zero.
● Denormalized numbers extend on both the positive and negative sides of zero.
● +0.0 and -0.0 share the all-zero exponent field used by denormalized numbers.
● The largest denormalized number is just less than the smallest positive normal
number.
● Likewise, the most negative denormalized number is just greater than the negative
normal number closest to zero.
● It is generally safe to ignore the distinction between normalized and denormalized
numbers when using floating point in your applications.
● Computation on very small values (denormalized numbers) may lose most or all of your
precision.
Subnormal Numbers
● A subnormal number is a nonzero floating-point number with magnitude less than the
magnitude of that format's smallest normal number.
● As a result, a subnormal number in a given format fails to use the full precision
available to normal numbers of the same format.
● 0.0 shares the all-zero exponent field with subnormal numbers.
● Subnormal numbers always have an exponent field equal to 0.
● The largest subnormal number (single precision, hexadecimal significand) is:
○ 0.FFFFFE * 2 ^ (-126)
● The smallest positive normal number is:
○ 1.0 * 2 ^ (-126)
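The boundary between subnormal and normal single-precision numbers can be checked from the bit patterns. This is a small sketch; `single_from_bits` is a hypothetical helper, and the comparisons are exact because every value involved is representable as a Python double.

```python
import struct

def single_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

smallest_normal = single_from_bits(0x00800000)     # exponent field 1, fraction 0
largest_subnormal = single_from_bits(0x007FFFFF)   # exponent field 0, fraction all 1s
smallest_subnormal = single_from_bits(0x00000001)  # exponent field 0, fraction 1

print(smallest_normal == 2.0 ** -126)                       # True
print(largest_subnormal == (1 - 2.0 ** -23) * 2.0 ** -126)  # True
print(smallest_subnormal == 2.0 ** -149)                    # True
print(largest_subnormal < smallest_normal)                  # True
```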
Floating Point Arithmetic
https://www.slideshare.net/prochwani95/06-floating-point/26
Hardware Implementation of FPU
with RISC-V Core
RISC-V Floating Extensions
● There are 3 extensions for binary floating point, plus one for decimal floating point.
● F - single precision (32-bit)
● D - double precision (64-bit)
● Q - quad precision (128-bit)
● L - decimal floating point (64/128-bit)
❏ Note: RISC-V floating point is compliant with IEEE 754-2008
Floating Point Register File
● The register file contains 32 registers of width
○ 32 - when implementing the F extension (single precision)
○ 64 - when implementing the D extension (double precision)
● It has 3 read and 2 write ports because a few instructions take 3 source operands.
● It is a separate register file from the integer register file.
● Things to remember while implementing the D extension / double precision:
○ Single precision (the F extension) is a prerequisite.
○ A single-precision instruction stores its result in the lower 32 bits of the register;
the upper 32 bits are set to all 1s, so the register reads as a NaN in double precision.
Rounding modes
● The rounding mode can be selected in two ways: dynamically or statically.
● The rm field in the instruction decides which is used.
● rm = 111 selects dynamic mode.
● In dynamic mode the rounding mode is taken from the frm field of fcsr; in static mode
it is taken from the rm field of the instruction.
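The selection logic above can be sketched as a small function. The encodings 000 to 100 for the valid modes and 111 for dynamic follow the RISC-V F-extension specification; the function name and the use of `None` to signal an illegal instruction are assumptions made for illustration.

```python
DYN = 0b111                                    # rm value that selects dynamic mode
VALID_MODES = {0b000, 0b001, 0b010, 0b011, 0b100}

def effective_rounding_mode(instr_rm, frm):
    """Return the rounding mode to use, or None if the instruction is illegal."""
    # Dynamic mode reads frm from fcsr; static mode uses the instruction's rm field.
    mode = frm if instr_rm == DYN else instr_rm
    return mode if mode in VALID_MODES else None

print(effective_rounding_mode(0b001, 0b000))  # 1: static, taken from the instruction
print(effective_rounding_mode(0b111, 0b010))  # 2: dynamic, taken from frm
print(effective_rounding_mode(0b101, 0b000))  # None: reserved static encoding
print(effective_rounding_mode(0b111, 0b111))  # None: invalid value held in frm
```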
Rounding modes
● Round to nearest: The system chooses the nearer of the two possible outputs. If the
correct answer is exactly halfway between the two, the system chooses the output where
the least significant bit of Frac is zero. This behavior (round-to-even) prevents various
undesirable effects. This is the default mode when an application starts up. It is the only
mode supported by the ordinary floating-point libraries. Hardware floating-point
environments and the enhanced floating-point libraries support all four rounding modes.
● Round up, or round toward plus infinity: The system chooses the larger of the two
possible outputs (that is, the one further from zero if they are positive, and the one closer
to zero if they are negative).
● Round down, or round toward minus infinity: The system chooses the smaller of the
two possible outputs (that is, the one closer to zero if they are positive, and the one
further from zero if they are negative).
● Round toward zero, or chop, or truncate: The system chooses the output that is closer
to zero, in all cases.
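The four modes can be compared side by side on halfway cases. This sketch uses Python's `decimal` module, which rounds decimal rather than binary values, so it only illustrates the direction each mode chooses; it is not a model of FPU hardware.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN

MODES = {
    "nearest (ties to even)": ROUND_HALF_EVEN,
    "toward +infinity": ROUND_CEILING,
    "toward -infinity": ROUND_FLOOR,
    "toward zero": ROUND_DOWN,
}

def round_all(text):
    """Round a decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {name: int(value.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}

for sample in ("2.5", "3.5", "-2.5"):
    print(sample, round_all(sample))
```

For 2.5 and 3.5 the ties-to-even results are 2 and 4, which shows that halfway cases do not always round in the same direction.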
FCSR (frm+fflags)
● The fcsr register has address 0x003 and is a read/write register.
● It has two basic parts, ‘frm’ and ‘fflags’.
● Both parts can also be accessed individually through their own CSR addresses.
● All the upper bits read as zero and should be written as zero.
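The register layout can be modelled with a few shifts and masks. The bit positions (fflags in bits 4:0, frm in bits 7:5) and the addresses 0x001 and 0x002 for fflags and frm come from the RISC-V specification, not from the slide; the function names are hypothetical.

```python
CSR_FFLAGS, CSR_FRM, CSR_FCSR = 0x001, 0x002, 0x003

def pack_fcsr(frm, fflags):
    """Combine the 3-bit frm and the 5-bit fflags into an fcsr value."""
    return ((frm & 0x7) << 5) | (fflags & 0x1F)

def unpack_fcsr(fcsr):
    """Split an fcsr value into (frm, fflags)."""
    return (fcsr >> 5) & 0x7, fcsr & 0x1F

def write_fcsr(value):
    """Model a CSR write: only bits 7:0 are kept, the upper bits stay zero."""
    return value & 0xFF

fcsr = pack_fcsr(frm=0b010, fflags=0b00001)
print(hex(fcsr))                    # 0x41
print(unpack_fcsr(fcsr))            # (2, 1)
print(hex(write_fcsr(0xFFFFFFFF)))  # 0xff
```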
Exception Flags
● The status flags tell us the status of the operations performed in the FPU.
● The base RISC-V ISA does not support generating a trap on the setting of a
floating-point exception flag.
Implementing in Decoder and Controller
● An illegal instruction exception is raised if the incoming instruction does not match
any of the defined instruction formats.
● If the rounding mode field in the instruction is 101 or 110, the instruction is treated
as an illegal instruction.
● If the rounding mode is dynamic, an rm value of 101-111 in the fcsr is treated as an
illegal instruction.
● Make sure the following instructions write back to the integer register file.
○ FCVT.W.S, FCVT.WU.S
○ FMV.X.W
○ FLT.S, FLE.S, FEQ.S
○ FCLASS.S
● The following instructions read rs1 from the integer register file.
○ FCVT.S.W, FCVT.S.WU
○ FMV.W.X
○ FLW, FSW
FCLASS Instruction
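The behaviour of FCLASS.S can be sketched in software. The result is a 10-bit mask with exactly one bit set; the bit assignments used here (0: -∞, 1: negative normal, 2: negative subnormal, 3: -0, 4: +0, 5: positive subnormal, 6: positive normal, 7: +∞, 8: signaling NaN, 9: quiet NaN) are taken from the RISC-V specification, and the function name is made up for illustration.

```python
def fclass_s(bits):
    """Return the 10-bit FCLASS.S mask for a 32-bit single-precision pattern."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            index = 0 if sign else 7                   # -inf / +inf
        else:
            index = 9 if fraction & (1 << 22) else 8   # quiet / signaling NaN
    elif exponent == 0:
        if fraction == 0:
            index = 3 if sign else 4                   # -0 / +0
        else:
            index = 2 if sign else 5                   # negative / positive subnormal
    else:
        index = 1 if sign else 6                       # negative / positive normal
    return 1 << index

print(f"{fclass_s(0x40D40000):010b}")  # 0001000000: 6.625 is a positive normal number
print(f"{fclass_s(0x80000000):010b}")  # 0000001000: negative zero
print(f"{fclass_s(0x7FC00000):010b}")  # 1000000000: quiet NaN
```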
Q/A session
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 

IEEE-754 standard format to handle Floating-Point calculations in RISC-V CPUs.pptx

  • 1. IEEE-754 standard format to handle Floating-Point calculations in RISC-V CPUs: General Information and Hardware Implementation. Zeeshan Rafique, Research Associate at MERL
  • 2. Agenda ● What is a standard and why should we rely on it? ● What is floating point and why do we need it? Precision and accuracy ● General information about floating point ○ Bit encoding for single and double precision ○ Conversion from a real number to the IEEE 754 format ○ Floating point only mimics mathematical arithmetic ○ Features of floating point ○ Range of single and double precision ○ Representation of floating point ● Floating-point arithmetic examples ● Hardware implementation of an FPU with a RISC-V core ○ RISC-V floating-point extensions ○ Floating-point register file ○ Rounding modes ○ Floating-point CSRs (fcsr, frm, fflags) ○ Exception flags, implementation in decoder and controller
  • 3. What is a standard and why should we rely on it? ● What does "standard" mean? ○ Something established by authority, custom, or general consent as a model, reference, or benchmark. ● Why do we need a standard? ○ Standards are needed to assure the safety of products, to ensure that products and materials are tailor-made for their purpose, to promote the interoperability of products and services, to facilitate trade by removing trade barriers, and to promote a common understanding of a product.
  • 4. Fixed Notation ● We are accustomed to a fixed notation in which the decimal point stays in one place: digits to the right of the point are the fractional part and digits to the left are the integer part. E.g. in 10.75, 10 is the integer part and 0.75 is the fractional part.
  • 5. Floating Point Representation ● The number is encoded in scientific notation. ● The structure of a floating-point (real) number is as follows: 5.5 × 10^9, where 5.5 is the mantissa, 10 is the base, and 9 is the exponent. ● Only the mantissa and the exponent are stored. The base is implied (already known); not storing it saves memory.
  • 6. Why do we need floating point? ● Floating-point representation makes numerical computation much easier. You could write all your programs using integers or fixed-point representations, but this is tedious and error-prone. ● Many numeric applications need numbers over a huge range, ○ e.g., nanoseconds to centuries. ● Arithmetic operations can produce real-valued results (e.g., π) that integers cannot hold. ● In either case, the idea is to approximate a real number in a form similar to scientific notation. ● For example, the following number is given in scientific notation: ○ 6.022 × 10^23 (an approximation to Avogadro's constant)
  • 7. Floating Point Applications ● Digital Signal Processing ● Domain-Specific Accelerators ● Vector Processing Units ● Medical Electronics ● Anywhere a more accurate result is needed. (Image credit: TechInsight.com, Apple iPhone XS teardown)
  • 8. Floating Point Disasters ● Scud missile gets through, 28 die ○ In 1991, during the first Gulf War, a Patriot missile defense system let a Scud get through; it hit a barracks and killed 28 people. The problem was due to a floating-point error when taking the difference of a converted and scaled integer. (https://medium.com/nerd-for-tech/floating-point-rounding-error-in-computers-6485cc26f5e8) Source: https://slideplayer.com/slide/16330842/ slide 25 ● $7B rocket crashes (Ariane 5) ○ When the first ESA Ariane 5 was launched on June 4, 1996, the flight lasted only 39 seconds before the rocket veered off course and self-destructed. An inertial system raised a floating-point exception while trying to convert a 64-bit floating-point number to an integer. Ironically, the same code was used in the Ariane 4, where such large values were never generated. (https://around.com/ariane.html) ● Intel ships and denies bugs ○ In 1994, Intel shipped its first Pentium processors with a floating-point divide bug. The bug was due to bad look-up tables used to speed up quotient calculations. After months of denials, Intel adopted a no-questions-asked replacement policy, costing $300M. (https://www.intel.com/support/processors/pentium/fdiv/)
  • 9. Precision and Accuracy ● Precision: the maximum number p of significant digits that can be represented in a format. ● Accuracy: how closely the stored value matches the intended number. [Figure: a value at 3-bit and 8-bit precision, showing the loss of accuracy and precision] ● The lower the precision, the lower the accuracy. [Not true in all cases] ● In which cases do you get perfect accuracy with very low precision?
  • 10. Precision and Accuracy ● Precision: the maximum number p of significant digits that can be represented in a format. ● Accuracy: how closely the stored value matches the intended number. ● The lower the precision, the lower the accuracy. ● In some cases you can still have perfect accuracy with very low precision. ● With 4 bits of precision: ○ 5 / 2 = 2.5 is represented with 100% accuracy (10.1 in binary). ○ 10 / 3 = 3.33333333... cannot be represented with 100% accuracy.
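The 4-bit example above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper name `is_exact_in_bits` is hypothetical, and it assumes "precision" means the number of significant binary digits available, with an unbounded exponent.

```python
from fractions import Fraction


def is_exact_in_bits(value: Fraction, precision_bits: int) -> bool:
    """Return True if value fits exactly in precision_bits significant binary digits.

    Assumption: the exponent range is unlimited, so only the significand matters.
    """
    if value == 0:
        return True
    num, den = abs(value.numerator), value.denominator
    # A fraction has a finite binary expansion only if its reduced
    # denominator is a power of two.
    if den & (den - 1):
        return False
    # Trailing zero bits of the numerator only shift the exponent.
    while num % 2 == 0:
        num //= 2
    return num.bit_length() <= precision_bits


print(is_exact_in_bits(Fraction(5, 2), 4))    # 5/2 = 10.1b needs 3 significant bits
print(is_exact_in_bits(Fraction(10, 3), 4))   # 10/3 has an infinite binary expansion
print(is_exact_in_bits(Fraction(10, 3), 53))  # still inexact at double precision
```

The same check shows why more bits do not help 10/3: the limitation comes from the base, not from the width of the significand.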
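  • 12. IEEE 754-2008 bit encoding (field widths in bits; the sign bit is the MSB) ● Half (16-bit): sign 1, exponent 5, fraction 10 ● Single (32-bit): sign 1, exponent 8, fraction 23 ● Double (64-bit): sign 1, exponent 11, fraction 52 ● Quadruple (128-bit): sign 1, exponent 15, fraction 112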
  • 13. IEEE 754-2008 binary encoding of floating-point numbers ● Single precision (32-bit): bit 31 = sign (1 bit), bits 30-23 = biased exponent E' (8 bits), bits 22-0 = mantissa / fraction (23 bits) ● Double precision (64-bit): bit 63 = sign (1 bit), bits 62-52 = biased exponent E' (11 bits), bits 51-0 = mantissa / fraction (52 bits) ● Exponent bias: single = 127, double = 1023
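As an illustration of the layout above, here is a small sketch that splits a value into its sign, biased exponent, and fraction fields. The function names `fields_single` and `fields_double` are hypothetical; the sketch relies on Python's standard `struct` module to obtain the raw bit pattern.

```python
import struct


def fields_single(x: float) -> tuple:
    """Split x (rounded to single precision) into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23
    fraction = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, fraction


def fields_double(x: float) -> tuple:
    """Split x into (sign, biased exponent, fraction) for double precision."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                     # bit 63
    exponent = (bits >> 52) & 0x7FF       # bits 62-52
    fraction = bits & ((1 << 52) - 1)     # bits 51-0
    return sign, exponent, fraction


# 1.0 = +1.0 x 2^0, so the stored exponent equals the bias.
print(fields_single(1.0))   # biased exponent 127
print(fields_double(1.0))   # biased exponent 1023
```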
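  • 14. Quick conversion ● Assume that we have the number 6.625; convert it according to the IEEE 754 standard into ○ single precision ○ double precision ● Solution: first convert the number to binary, (6.625)10 = (?)2. ● Integer part 6 (repeated division by 2, recording the remainders): 6 / 2 = 3 -> 0 (LSB); 3 / 2 = 1 -> 1; 1 / 2 = 0 -> 1. Reading the remainders from last to first gives 110. ● Fractional part 0.625 (repeated multiplication by 2, recording the integer parts): 0.625 × 2 = 1.25 -> 1; 0.25 × 2 = 0.5 -> 0; 0.5 × 2 = 1.0 -> 1 (LSB). Reading the digits from first to last gives 101. ● Result: (6.625)10 = (110.101)2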
  • 15. Quick conversion ● Concept behind the conversion ● Let's say we have 6.625 ● 6 = 1×2^2 + 1×2^1 + 0×2^0 = 4 + 2 + 0 => 1 1 0 ● 0.625 = 1×2^-1 + 0×2^-2 + 1×2^-3 = 0.5 + 0 + 0.125 => 1 0 1 ● 6.625 = 110.101
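The repeated-division and repeated-multiplication procedure can be sketched as follows. The function names and the `max_bits` cut-off are assumptions made for illustration; fractions without a finite binary expansion are simply truncated at `max_bits` digits.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, LSB first
        n //= 2
    return "".join(reversed(digits)) or "0"


def frac_to_binary(f: float, max_bits: int = 23) -> str:
    """Convert a fraction in [0, 1) to binary by repeated multiplication by 2."""
    digits = []
    while f > 0 and len(digits) < max_bits:
        f *= 2
        bit = int(f)               # integer part is the next bit, MSB first
        digits.append(str(bit))
        f -= bit
    return "".join(digits)


print(int_to_binary(6) + "." + frac_to_binary(0.625))  # 110.101
```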
  • 16. Quick conversion - Single Precision ● So, we have (6.625)10 = (110.101)2 ● Normalizing the number: 110.101 × 2^0 => 1.10101 × 2^2 ● Moving the binary point to the left increments the exponent; moving it to the right decrements the exponent. ● In +1.10101 × 2^2, the sign is +, the mantissa is 1.10101, and the exponent is 2. The leading 1 of the mantissa is implicit and is not stored. ● For single precision: S = 0; E' = E + Bias = 2 + 127 = 129 = 10000001; M = 10101000000000000000000 ● Encoded word: bit 31 (sign, 1 bit) = 0, bits 30-23 (biased exponent, 8 bits) = 10000001, bits 22-0 (mantissa / fraction, 23 bits) = 10101000000000000000000
  • 17. Quick conversion - Double Precision ● So, we have (6.625)10 = (110.101)2 ● Normalizing the number: 110.101 × 2^0 => 1.10101 × 2^2 ● Moving the binary point to the left increments the exponent; moving it to the right decrements the exponent. ● In +1.10101 × 2^2, the sign is +, the mantissa is 1.10101, and the exponent is 2. The leading 1 of the mantissa is implicit and is not stored. ● For double precision: S = 0; E' = E + Bias = 2 + 1023 = 1025 = 10000000001; M = 10101000000000000000000...000 ● Encoded word: bit 63 (sign, 1 bit) = 0, bits 62-52 (biased exponent, 11 bits) = 10000000001, bits 51-0 (mantissa / fraction, 52 bits) = 10101000000000000000000...000
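The normalize, bias, and pack steps of the two worked examples can be reproduced with a short sketch. The function `encode` is hypothetical and deliberately limited: it assumes a nonzero normal value, and it truncates rather than rounds, so it matches the hardware encoding only for values that fit the fraction field exactly (as 6.625 does).

```python
import math
import struct


def encode(x: float, exp_bits: int, frac_bits: int) -> str:
    """Encode a nonzero normal number as 'sign exponent fraction' bit strings.

    Sketch only: no handling of zero, subnormals, infinity, NaN, or rounding.
    """
    bias = (1 << (exp_bits - 1)) - 1          # 127 for single, 1023 for double
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                 # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = (e - 1) + bias                 # rewrite as 1.f * 2**(e - 1), then add bias
    fraction = int((2 * m - 1) * (1 << frac_bits))  # drop the implicit leading 1
    return f"{sign:01b} {exponent:0{exp_bits}b} {fraction:0{frac_bits}b}"


print(encode(6.625, 8, 23))    # single precision
print(encode(6.625, 11, 52))   # double precision

# Cross-check against the bit pattern produced by the platform.
raw = struct.unpack(">I", struct.pack(">f", 6.625))[0]
print(f"{raw:032b}")
```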
  • 18. Features of floating-point numbers (IEEE 754-2008) ● Every floating-point number has a sign: it is either positive or negative. ● There are two representations of zero: positive zero (i.e., +0.0) and negative zero (i.e., -0.0). ● There are two representations of infinity: positive infinity (+∞ or +inf) and negative infinity (-∞ or -inf). ● The exponent may be positive or negative, allowing both very large and very small numbers. ● There is a special representation called "not a number" ("NaN"). This value can represent a missing value or the result of an undefined operation, such as dividing zero by zero. There are two variations, called "quiet NaN" and "signaling NaN".
  • 19. Exponent Bit Pattern for Single and Double Precision [Figure: exponent bit patterns for single precision and double precision]
  • 21. Range for Single and Double Precision ● https://youtu.be/A2HflP5sa_0 ● Digits of accuracy = log10(2^n), where n = bits(M) is the number of mantissa bits.
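The digits-of-accuracy formula can be evaluated directly. This is a small sketch; the function name `decimal_digits` is hypothetical, and whether to count the implicit leading 1 (24 and 53 bits instead of 23 and 52) is a convention choice, so both are shown.

```python
import math


def decimal_digits(mantissa_bits: int) -> float:
    """Approximate number of decimal digits carried by mantissa_bits binary digits."""
    return math.log10(2 ** mantissa_bits)


for name, stored, with_implicit in (("single", 23, 24), ("double", 52, 53)):
    print(name,
          round(decimal_digits(stored), 2),         # stored fraction bits only
          round(decimal_digits(with_implicit), 2))  # including the implicit leading 1
```

In practice this is why single precision is usually quoted as roughly 7 decimal digits and double precision as roughly 15 to 16.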
  • 22. Floating point only mimics mathematical arithmetic ● The exact result of an operation is not always representable, so the computed answer is often not mathematically exact. ● Floating-point addition is not always associative, due to rounding errors. That is, (x + y) + z is not always equal to x + (y + z). ● Floating-point multiplication is not always associative. That is, (x * y) * z is not always equal to x * (y * z). ● Floating-point multiplication does not always distribute over addition with exactly the same results. That is, x * (y + z) is not always equal to (x * y) + (x * z). ● Floating-point addition and multiplication are commutative, as in mathematics. For example, x + y = y + x, so you do not have to worry about the order of operands for a single operation.
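The properties listed above can be observed with ordinary double-precision values. The operands below are chosen for illustration only; the multiplication and distribution cases rely on overflow to infinity, which is one of several ways these laws can fail.

```python
import math

# Addition is not associative: the two groupings round differently.
add_left = (0.1 + 0.2) + 0.3
add_right = 0.1 + (0.2 + 0.3)
print(add_left, add_right, add_left == add_right)

# Multiplication is not associative: the left grouping overflows first.
big = 1e308
mul_left = (big * 10.0) * 0.1    # big * 10.0 overflows to +inf
mul_right = big * (10.0 * 0.1)   # 10.0 * 0.1 rounds to 1.0, no overflow
print(mul_left, mul_right)

# Multiplication does not always distribute over addition.
x, y, z = 2.0, 1e308, -1e308
dist_left = x * (y + z)          # y + z is exactly 0.0
dist_right = (x * y) + (x * z)   # +inf + -inf gives NaN
print(dist_left, dist_right)

# Addition and multiplication are commutative.
print(0.1 + 0.2 == 0.2 + 0.1, 0.1 * 0.3 == 0.3 * 0.1)
```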
  • 23. Representations ● Positive zero (+0.0) ● Negative zero (-0.0) ● Positive infinity (+∞ or +inf) ● Negative infinity (-∞ or -inf) ● Not-a-number (NaN) ○ Quiet NaN (qNaN) ○ Signaling NaN (sNaN) ● Normal numbers (or "normalized numbers") ● Denormalized numbers (or "denormals") ● Subnormal numbers
  • 24. Positive and Negative Zero (+0, -0) ● 1/+0 yields +∞ ● 1/-0 yields -∞ ● +0 normally compares as equal to -0 ● -0/-∞ yields +0 ● Although +0 and -0 compare as equal, they can produce different outcomes in some computations. This challenges our understanding of the meaning of "equal", to say the least.
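A short sketch of signed-zero behavior follows. Note one assumption about the host language: Python raises `ZeroDivisionError` for float division by zero instead of returning an infinity, so the 1/±0 rules cannot be shown directly here; `math.atan2` is used instead as an example of a computation whose result depends on the sign of zero.

```python
import math

pos_zero, neg_zero = 0.0, -0.0

# The two zeros compare equal...
zeros_equal = (pos_zero == neg_zero)

# ...but the sign bit is still there and can be observed.
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# -0 / -inf yields +0: the signs cancel.
quotient = neg_zero / -math.inf
sign_of_quotient = math.copysign(1.0, quotient)

# Equal inputs, different outcomes: atan2 distinguishes +0 from -0.
angle_pos = math.atan2(0.0, pos_zero)
angle_neg = math.atan2(0.0, neg_zero)

print(zeros_equal, sign_of_neg_zero, sign_of_quotient, angle_pos, angle_neg)
```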
  • 25. Positive and Negative Infinity (+∞, -∞) ● Infinity is represented with all bits of the exponent field set to 1 and all bits of the fraction field set to 0; the sign bit selects +∞ or -∞. ● Single precision: bit 31 = 0/1, bits 30-23 = 11111111, bits 22-0 = 00000000000000000000000 ● Double precision: bit 63 = 0/1, bits 62-52 = 11111111111, bits 51-0 = 00000000000000000000000...000
  • 26. Not a Number - NaN ● NaN is the result of an expression that has no defined value in mathematics: ○ 0/0, ∞/∞, 0 × ∞, sqrt of a negative number, log of a negative number ● There are two types of NaN: signaling NaN and quiet NaN. ● A signaling NaN raises an exception when it arrives as an operand of an operation. ● A quiet NaN propagates through operations and does not raise an error. ● In the quiet case a specific value is passed forward; it is known as the canonical NaN and is discussed on the next slide. ● RISC-V makes the implementation of signaling NaN optional. ● When double-precision floating point is implemented and an F-extension instruction executes, the result is stored in the lower 32 bits of the register and the upper bits are set to all 1s, so that a D-extension instruction reading the register sees a NaN.
  • 27. Not a Number - NaN ● A NaN is encoded as all 1s in the exponent field and anything other than all 0s in the mantissa / fraction field (all 0s represents +∞ or -∞). ● Single precision (S): bit 31 = X, bits 30-23 = 11111111, bits 22-0 = 10000000000000000000000 ● Double precision (D): bit 63 = X, bits 62-52 = 11111111111, bits 51-0 = 10000000000000000000000...000 ● The MSB of the fraction field marks the NaN as quiet when it is set; a NaN with that bit clear (and at least one other fraction bit set) is a signaling NaN.
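The encoding rules for infinity and NaN can be summarized in a small classifier for single-precision bit patterns. The function name `classify_single` is hypothetical; the quiet/signaling distinction follows the IEEE 754-2008 convention that a set fraction MSB means quiet.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit single-precision pattern as finite, infinity, qNaN, or sNaN."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "finite"
    if fraction == 0:
        return "-inf" if sign else "+inf"
    # Exponent all 1s and nonzero fraction: NaN. Fraction MSB set means quiet.
    return "qNaN" if fraction & (1 << 22) else "sNaN"


for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x7F800001, 0x3F800000):
    print(f"{pattern:#010x}", classify_single(pattern))
```

Here 0x7FC00000 is the pattern shown on the slide with the sign bit cleared.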
  • 28. Normalized and Denormalized Numbers ● Most floating-point numbers are normalized numbers. ● The greater the integer part is, the less space is left for the precision of the fractional part. ● All denormalized numbers are very close to zero. ● Denormalized numbers extend on both the positive and negative sides of zero. ● +0.0 and -0.0 use the same all-zero exponent field as denormalized numbers. ● The largest denormalized number is just less than the smallest positive normal number. ● Likewise, the most negative denormalized number is just greater than the negative normal number closest to zero. ● It is generally safe to ignore the distinction between normalized and denormalized numbers when using floating point in your applications. ● Computation on very small values (denormalized numbers) may lose all of your precision.
  • 29. Subnormal Numbers ● A subnormal number is a nonzero floating-point number with magnitude less than the magnitude of that format's smallest normal number. ● As a result, a subnormal number in a given format fails to use the full precision available to normal numbers of the same format. ● 0.0 shares the all-zero exponent encoding with subnormal numbers. ● Subnormal numbers always have a biased exponent field equal to 0. ● The largest subnormal number (single precision) is: ○ 0.FFFFFE * 2 ^ (-126) ● The smallest non-subnormal (normal) number is: ○ 1.0 * 2 ^ (-126)
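The single-precision boundary values quoted above can be checked by decoding the corresponding bit patterns. This is a sketch using the standard `struct` module; the helper name `bits_to_single` is hypothetical.

```python
import struct


def bits_to_single(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest_subnormal = bits_to_single(0x00000001)  # exponent 0, fraction = 1
largest_subnormal = bits_to_single(0x007FFFFF)   # exponent 0, fraction all 1s
smallest_normal = bits_to_single(0x00800000)     # exponent 1, fraction 0

print(smallest_subnormal, largest_subnormal, smallest_normal)

# The hexadecimal form used on the slide, 0.FFFFFE * 2^-126:
print(float.fromhex("0x0.FFFFFEp-126") == largest_subnormal)
```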
  • 37. Hardware Implementation of FPU with RISC-V Core
  • 38. RISC-V Floating-Point Extensions ● RISC-V defines the following floating-point extensions: ● F - single precision (32-bit) ● D - double precision (64-bit) ● Q - quad precision (128-bit) ● L - decimal floating point (64/128-bit) ❏ Note: RISC-V floating point is compliant with IEEE 754-2008.
  • 39. Floating Point Register File ● The register file contains 32 registers of width ○ 32 bits when implementing the F extension (single precision) ○ 64 bits when implementing the D extension (double precision) ● It has 3 read ports and 2 write ports, because a few instructions (e.g., fused multiply-add) take 3 source operands. ● It is separate from the integer register file. ● Things to remember while implementing the D extension (double precision): ○ The F extension (single precision) is a prerequisite. ○ A single-precision instruction stores its result in the lower 32 bits of the register and sets the upper 32 bits to all 1s, so that the 64-bit register value reads as a NaN.
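The rule for single-precision values in a 64-bit register (known in the RISC-V specification as NaN boxing) can be sketched on raw bit patterns. The function names are hypothetical, and the behavior of `unbox` for an improperly boxed value (returning the canonical NaN 0x7FC00000) follows the RISC-V unprivileged specification rather than the slides.

```python
import math
import struct

CANONICAL_NAN_S = 0x7FC00000        # canonical single-precision quiet NaN
UPPER_ONES = 0xFFFFFFFF00000000     # upper 32 bits of a 64-bit register


def nan_box(single_bits: int) -> int:
    """Place a 32-bit value in the low half of a 64-bit register, upper half all 1s."""
    return UPPER_ONES | (single_bits & 0xFFFFFFFF)


def unbox(reg64: int) -> int:
    """Read a single-precision operand from a 64-bit register.

    If the upper 32 bits are not all 1s, the operand is treated as the canonical NaN.
    """
    if (reg64 & UPPER_ONES) == UPPER_ONES:
        return reg64 & 0xFFFFFFFF
    return CANONICAL_NAN_S


boxed = nan_box(0x40D40000)          # 6.625 in single precision
print(f"{boxed:#018x}")

# Read as a double, the boxed register is a NaN.
as_double = struct.unpack(">d", struct.pack(">Q", boxed))[0]
print(math.isnan(as_double))
```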
  • 40. Rounding modes ● A floating-point instruction selects its rounding mode either dynamically or statically. ● The rm field in the instruction decides which. ● rm = 111 selects dynamic mode. ● In dynamic mode the rounding mode is taken from the frm field of fcsr; in static mode it is taken from the rm field of the instruction.
  • 41. Rounding modes ● Round to nearest: The system chooses the nearer of the two possible outputs. If the correct answer is exactly halfway between the two, the system chooses the output where the least significant bit of Frac is zero. This behavior (round-to-even) prevents various undesirable effects. This is the default mode when an application starts up. It is the only mode supported by the ordinary floating-point libraries. Hardware floating-point environments and the enhanced floating-point libraries support all four rounding modes. ● Round up, or round toward plus infinity: The system chooses the larger of the two possible outputs (that is, the one further from zero if they are positive, and the one closer to zero if they are negative). ● Round down, or round toward minus infinity: The system chooses the smaller of the two possible outputs (that is, the one closer to zero if they are positive, and the one further from zero if they are negative). ● Round toward zero, or chop, or truncate: The system chooses the output that is closer to zero, in all cases.
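The four rounding directions described above can be illustrated by analogy. The sketch below rounds decimal values to integers with Python's `decimal` module; this is not binary significand rounding, only the same decision rule applied at a different digit position. The mapping of mode names to `decimal` constants is an assumption made for this illustration.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Assumed correspondence between the four modes and decimal's rounding constants.
MODES = {
    "round to nearest (ties to even)": ROUND_HALF_EVEN,
    "round up (toward +inf)": ROUND_CEILING,
    "round down (toward -inf)": ROUND_FLOOR,
    "round toward zero": ROUND_DOWN,
}


def round_to_int(text: str, mode: str) -> int:
    """Round the decimal value given as text to an integer under the named mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))


for value in ("2.5", "-2.5", "3.5"):
    print(value, {name: round_to_int(value, name) for name in MODES})
```

The halfway cases 2.5 and 3.5 show the ties-to-even rule: one rounds down and the other rounds up, so errors do not accumulate in a single direction.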
  • 42. FCSR (frm + fflags) ● The fcsr register has address 0x003 and is a read/write register. ● It has two basic parts, 'frm' and 'fflags'. ● Each part can also be accessed individually through its own CSR address. ● All upper bits read as zero and should be written as zero.
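A sketch of how fcsr combines its two parts follows. The bit positions (fflags in bits 4-0, frm in bits 7-5) and the flag names NV, DZ, OF, UF, NX are taken from the RISC-V unprivileged specification, not from the slides; the function names are hypothetical.

```python
FFLAGS_MASK = 0x1F   # fcsr[4:0]
FRM_MASK = 0x7       # fcsr[7:5] after shifting
FRM_SHIFT = 5

# Flag names by bit position within fflags (bit 0 first).
FLAG_NAMES = ["NX", "UF", "OF", "DZ", "NV"]


def make_fcsr(frm: int, fflags: int) -> int:
    """Combine frm and fflags into an fcsr value; all upper bits stay zero."""
    return ((frm & FRM_MASK) << FRM_SHIFT) | (fflags & FFLAGS_MASK)


def split_fcsr(fcsr: int) -> tuple:
    """Return (frm, fflags) extracted from an fcsr value."""
    return (fcsr >> FRM_SHIFT) & FRM_MASK, fcsr & FFLAGS_MASK


def decode_flags(fflags: int) -> list:
    """List the names of the exception flags that are set."""
    return [name for i, name in enumerate(FLAG_NAMES) if (fflags >> i) & 1]


fcsr = make_fcsr(frm=0b001, fflags=0b10001)
print(hex(fcsr), split_fcsr(fcsr), decode_flags(fcsr & FFLAGS_MASK))
```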
  • 43. Exception Flags ● The status flags report the status of the operations performed in the FPU. ● The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
  • 44. Implementing in Decoder and Controller ● An illegal-instruction exception is raised if the incoming instruction does not match any defined instruction format. ● If the rm field of the instruction is 101 or 110, the instruction is treated as illegal. ● If the rounding mode is dynamic and the frm value in fcsr is in the range 101-111, the instruction is treated as illegal. ● Make sure the following instructions write back to the integer register file: ○ FCVT.W.S, FCVT.WU.S ○ FMV.X.W ○ FLT.S, FLE.S, FEQ.S ○ FCLASS.S ● The following instructions read rs1 from the integer register file: ○ FCVT.S.W, FCVT.S.WU ○ FMV.W.X ○ FLW, FSW
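The rounding-mode checks described above can be modeled in software before writing the decoder logic. This is a behavioral sketch, not RTL: the function name `effective_rounding_mode` is hypothetical, and returning `None` stands in for raising an illegal-instruction exception.

```python
from typing import Optional

DYN = 0b111                 # rm value that selects dynamic rounding
RESERVED = {0b101, 0b110}   # reserved rounding-mode encodings


def effective_rounding_mode(instr_rm: int, frm: int) -> Optional[int]:
    """Return the rounding mode to use, or None if the instruction is illegal.

    instr_rm is the 3-bit rm field of the instruction; frm is the 3-bit
    rounding-mode field currently held in fcsr.
    """
    if instr_rm in RESERVED:
        return None                       # static mode with a reserved encoding
    if instr_rm == DYN:
        if frm in RESERVED or frm == DYN:
            return None                   # dynamic mode with frm in 101-111
        return frm                        # dynamic: take the mode from fcsr
    return instr_rm                       # static: take the mode from the instruction


print(effective_rounding_mode(0b000, 0b011))  # static mode, rm from the instruction
print(effective_rounding_mode(0b111, 0b011))  # dynamic mode, rm from fcsr
print(effective_rounding_mode(0b101, 0b000))  # reserved rm in the instruction
print(effective_rounding_mode(0b111, 0b111))  # invalid frm in dynamic mode
```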
  • 46. Q/A session