SlideShare a Scribd company logo
1 of 34
15-213
       “The course that gives CMU its Zip!”

                      Floating
                       Point
                    Sept 6, 2006
              Topics
                   IEEE Floating Point Standard
                   Rounding
                   Floating Point Operations
                   Mathematical properties



class03.ppt                                        15-213, F’06
Floating Point Puzzles
         For each of the following C expressions, either:
          Argue that it is true for all argument values
          Explain why not true


                                   • x == (int)(float) x
                                   • x == (int)(double) x
  int x = …;
                                   • f == (float)(double) f
  float f = …;
                                   • d == (float) d
  double d = …;
                                   • f == -(-f);
                                   • 2/3 == 2/3.0
  Assume neither
  d nor f is NaN                   • d < 0.0         ⇒     ((d*2) < 0.0)
                                   • d > f           ⇒     -f > -d
                                   • d * d >= 0.0
                                   • (d+f)-d == f

–2–                                                                  15-213, F’06
IEEE Floating Point
IEEE Standard 754
         Established in 1985 as uniform standard for floating
          point arithmetic
            Before that, many idiosyncratic formats
         Supported by all major CPUs

Driven by Numerical Concerns
         Nice standards for rounding, overflow, underflow
         Hard to make go fast
            Numerical analysts predominated over hardware types in
             defining standard




–3–                                                          15-213, F’06
Fractional Binary Numbers
                                            2i
                                            2i–1

                                           4
                        •••                2
                                           1
              bi bi–1   •••   b2 b1    b0 . b–1 b–2 b–3   •••   b–j
                                      1/2
                                      1/4                 •••
                                      1/8

                                      2–j

Representation
         Bits to right of “binary point” represent fractional powers
          of 2                             i
         Represents rational number: ∑ bk ⋅2 k
                                     k =− j
–4–                                                                   15-213, F’06
Frac. Binary Number Examples
Value               Representation
  5-3/4             101.112
  2-7/8              10.1112
  63/64               0.1111112

Observations
     Divide by 2 by shifting right
     Multiply by 2 by shifting left
     Numbers of form 0.111111…2 just below 1.0
      1/2 + 1/4 + 1/8 + … + 1/2 i + … → 1.0
      Use notation 1.0 – ε




–5–                                               15-213, F’06
Representable Numbers
Limitation
     Can only exactly represent numbers of the form x/2k
     Other numbers have repeating bit representations
Value              Representation
  1/3              0.0101010101[01]…2
  1/5              0.001100110011[0011]…2
  1/10             0.0001100110011[0011]…2




–6–                                                     15-213, F’06
Floating Point Representation
Numerical Form
     –1s M 2E
      Sign bit s determines whether number is negative or
       positive
      Significand M normally a fractional value in range [1.0,2.0).
      Exponent E weights value by power of two

Encoding
     s            exp                        frac


     MSB is sign bit
     exp field encodes E
     frac field encodes M




–7–                                                          15-213, F’06
Floating Point Precisions
Encoding
         s         exp                     frac
     MSB is sign bit
     exp field encodes E
     frac field encodes M
Sizes
     Single precision: 8 exp bits, 23 frac bits
      32 bits total
     Double precision: 11 exp bits, 52 frac bits
      64 bits total
     Extended precision: 15 exp bits, 63 frac bits
      Only found in Intel-compatible machines
      Stored in 80 bits
         » 1 bit wasted

–8–                                                   15-213, F’06
“Normalized” Numeric Values
Condition
      exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as biased value
      E = Exp – Bias
       Exp : unsigned value denoted by exp
       Bias : Bias value
         » Single precision: 127 ( Exp: 1…254, E: -126…127)
         » Double precision: 1023 ( Exp: 1…2046, E: -1022…1023)
         » in general: Bias = 2 e-1 - 1, where e is number of exponent
           bits

Significand coded with implied leading 1
      M =   1.xxx…x2
        xxx…x: bits of frac
       Minimum when 000…0 (M = 1.0)
       Maximum when 111…1 (M = 2.0 – ε)
       Get extra leading bit for “free”
–9–                                                             15-213, F’06
Normalized Encoding
   Example
   Value
         Float F = 15213.0;
          1521310 = 111011011011012 = 1.11011011011012 X 213

   Significand
         M      =       1.11011011011012
         frac =          110110110110100000000002
   Exponent
         E    =         13
         Bias =         127
         Exp =          140 =    100011002

             Floating Point Representation:
             Hex:        4     6    6    D    B    4    0    0
              Binary:        0100 0110 0110 1101 1011 0100 0000
             0000
             140:       100 0110 0
– 10 –       15213:                        1110 1101 1011 01      15-213, F’06
Denormalized Values
 Condition
             exp = 000…0

 Value
            Exponent value E = –Bias + 1
            Significand value M =    0.xxx…x2
               xxx…x: bits of frac

 Cases
            exp = 000…0, frac = 000…0
               Represents value 0
               Note that have distinct values +0 and –0
            exp = 000…0, frac ≠ 000…0
               Numbers very close to 0.0
               Lose precision as get smaller

– 11 –
               “Gradual underflow”                        15-213, F’06
Special Values
 Condition
             exp = 111…1

 Cases
            exp = 111…1, frac = 000…0
               Represents value ∞ (infinity)
               Operation that overflows
               Both positive and negative
               E.g., 1.0/0.0 = −1.0/−0.0 = + ∞,   1.0/−0.0 = − ∞
            exp = 111…1, frac ≠ 000…0
               Not-a-Number (NaN)
               Represents case when no numeric value can be
                determined
               E.g., sqrt(–1), ∞ − ∞, ∞ ∗ 0


– 12 –                                                              15-213, F’06
Summary of Floating Point
    Real Number Encodings


         −∞   -Normalized   -Denorm         +Denorm   +Normalized      +∞

 NaN                                                                       NaN
                                      −0   +0




– 13 –                                                              15-213, F’06
Tiny Floating Point Example
 8-bit Floating Point Representation
            the sign bit is in the most significant bit.
            the next four bits are the exponent, with a bias of 7.
            the last three bits are the frac
  Same General Form as IEEE Format
            normalized, denormalized
            representation of 0, NaN, infinity
                       7 6            3 2            0
                      s       exp           frac




– 14 –                                                         15-213, F’06
Values Related to the
   Exponent
         Exp   exp    E     2E

         0     0000   -6    1/64   (denorms)
         1     0001   -6    1/64
         2     0010   -5    1/32
         3     0011   -4    1/16
         4     0100   -3    1/8
         5     0101   -2    1/4
         6     0110   -1    1/2
         7     0111    0    1
         8     1000   +1    2
         9     1001   +2    4
         10    1010   +3    8
         11    1011   +4    16
         12    1100   +5    32
         13    1101   +6    64
         14    1110   +7    128
         15    1111   n/a          (inf, NaN)

– 15 –                                          15-213, F’06
Dynamic Range
            s exp       frac   E     Value

             0   0000 000      -6    0
             0   0000 001      -6    1/8*1/64 = 1/512       closest to zero
Denormalized 0   0000 010      -6    2/8*1/64 = 2/512
  numbers    …
             0   0000   110    -6    6/8*1/64   =   6/512
             0   0000   111    -6    7/8*1/64   =   7/512   largest denorm
             0   0001   000    -6    8/8*1/64   =   8/512   smallest norm
             0   0001   001    -6    9/8*1/64   =   9/512
             …
             0   0110   110    -1    14/8*1/2   =   14/16
             0   0110   111    -1    15/8*1/2   =   15/16   closest to 1 below
 Normalized 0    0111   000    0     8/8*1      =   1
  numbers                                                   closest to 1 above
             0   0111   001    0     9/8*1      =   9/8
             0   0111   010    0     10/8*1     =   10/8
             …
             0   1110 110      7     14/8*128 = 224
             0   1110 111      7     15/8*128 = 240         largest norm
             0   1111 000      n/a   inf
 – 16 –                                                          15-213, F’06
Distribution of Values
 6-bit IEEE-like format
            e = 3 exponent bits
            f = 2 fraction bits
            Bias is 3


 Notice how the distribution gets denser toward
   zero.

   -15            -10        -5          0          5          10          15
                          Denormalized   Normalized Infinity




– 17 –                                                              15-213, F’06
Distribution of Values
  (close-up view)
 6-bit IEEE-like format
            e = 3 exponent bits
            f = 2 fraction bits
            Bias is 3



    -1                   -0.5              0                 0.5           1
                            Denormalized   Normalized   Infinity




– 18 –                                                             15-213, F’06
Interesting Numbers
 Description                    exp   frac       Numeric Value
 Zero                           00…00 00…00      0.0
 Smallest Pos. Denorm.          00…00 00…01      2– {23,52} X 2– {126,1022}
        Single ≈ 1.4 X 10–45
        Double ≈ 4.9 X 10–324
 Largest Denormalized           00…00 11…11      (1.0 – ε) X 2– {126,1022}
        Single ≈ 1.18 X 10–38
        Double ≈ 2.2 X 10–308
 Smallest Pos. Normalized 00…01 00…00            1.0 X 2– {126,1022}
        Just larger than largest denormalized
 One                            01…11 00…00      1.0
 Largest Normalized             11…10 11…11      (2.0 – ε) X 2{127,1023}
        Single ≈ 3.4 X 1038
        Double ≈ 1.8 X 10308
– 19 –                                                                        15-213, F’06
Special Properties of
  Encoding
 FP Zero Same as Integer Zero
            All bits = 0

 Can (Almost) Use Unsigned Integer Comparison
            Must first compare sign bits
            Must consider -0 = 0
            NaNs problematic
               Will be greater than any other values
               What should comparison yield?
            Otherwise OK
               Denorm vs. normalized
               Normalized vs. infinity




– 20 –                                                  15-213, F’06
Floating Point Operations
 Conceptual View
        First compute exact result
        Make it fit into desired precision
         Possibly overflow if exponent too large
         Possibly round to fit into frac

 Rounding Modes (illustrate with $ rounding)
                                 $1.40   $1.60   $1.50   $2.50   –$1.50
        Zero                    $1      $1      $1      $2      –$1
        Round down (-∞)         $1      $1      $1      $2      –$2
        Round up (+∞)           $2      $2      $2      $3      –$1
        Nearest Even    (default)       $1      $2      $2      $2–$2
Note:
1. Round down: rounded result is close to but no greater than true result.
2. Round up: rounded result is close to but no less than true result.
– 21 –                                                             15-213, F’06
Closer Look at Round-To-
   Even
 Default Rounding M ode
        Hard to get any other kind without dropping into
         assembly
        All others are statistically biased
         Sum of set of positive numbers will consistently be over- or
           under- estimated

 Applying to Other Decimal Places / Bit Positions
        When exactly halfway between two possible values
         Round so that least significant digit is even
        E.g., round to nearest hundredth
         1.2349999   1.23     (Less than half way)
         1.2350001   1.24     (Greater than half way)
         1.2350000   1.24     (Half way—round up)
         1.2450000   1.24     (Half way—round down)
– 22 –                                                         15-213, F’06
Rounding Binary Numbers
 Binary Fractional Numbers
        “Even” when least significant bit is 0
        Half way when bits to right of rounding position = 100… 2

 Examples
        Round to nearest 1/4 (2 bits right of binary point)
     Value      Binary     Rounded Action            Rounded Value
     2 3/32     10.000112 10.002      (<1/2—down) 2
     2 3/16     10.001102 10.012      (>1/2—up)      2 1/4
     2 7/8      10.111002 11.002      (1/2—up)       3
     2 5/8      10.101002 10.102      (1/2—down)     2 1/2



– 23 –                                                         15-213, F’06
FP Multiplication
 Operands
                               *
         (–1)s1 M1 2E1               (–1)s2 M2 2E2

 Exact Result
         (–1)s M 2E
            Sign s: s1 ^ s2
            Significand M: M1 * M2
            Exponent E:       E1 + E2

 Fixing
            If M ≥ 2, shift M right, increment E
            If E out of range, overflow
            Round M to fit frac precision

 Implementation
            Biggest chore is multiplying significands
– 24 –                                                   15-213, F’06
FP Addition
 Operands
         (–1)s1 M1 2E1                                                E1–E2

         (–1)s2 M2 2E2                         (–1)s1 M1
            Assume E1 > E2
                                       +                             (–1)s2 M2
 Exact Result
         (–1)s M 2E                                        (–1)s M
            Sign s, significand M:
               Result of signed align & add
            Exponent E:     E1

 Fixing
            If M ≥ 2, shift M right, increment E
            if M < 1, shift M left k positions, decrement E by k
            Overflow if E out of range
– 25 –
            Round M to fit frac precision                                 15-213, F’06
Mathematical Properties of FP
Add
 Compare to those of Abelian Group
        Closed under addition?              YES
         But may generate infinity or NaN
        Commutative?                        YES
        Associative?                        NO
         Overflow and inexactness of rounding
        0 is additive identity?             YES
        Every element has additive inverse ALMOST
         Except for infinities & NaNs

 Monotonicity
        a ≥ b ⇒ a+c ≥ b+c?                  ALMOST
         Except for infinities & NaNs



– 26 –                                                15-213, F’06
Math. Properties of FP Mult
 Compare to Commutative Ring
        Closed under multiplication?            YES
         But may generate infinity or NaN
        Multiplication Commutative?             YES
        Multiplication is Associative?          NO
         Possibility of overflow, inexactness of rounding
        1 is multiplicative identity?           YES
        Multiplication distributes over addition?     NO
         Possibility of overflow, inexactness of rounding

 Monotonicity
        a ≥ b & c ≥ 0 ⇒ a *c ≥ b *c?            ALMOST
         Except for infinities & NaNs



– 27 –                                                       15-213, F’06
Creating Floating Point Number
 Steps                                   7 6          3 2                    0
            Normalize to have leading 1s      exp            frac
            Round to fit within fraction
            Postnormalize to deal with effects of rounding

 Case Study
            Convert 8-bit unsigned numbers to tiny floating point
             format
            Example Numbers
             128     10000000
             15      00001101
             33      00010001
             35      00010011
             138     10001010
– 28 –       63      00111111                                 15-213, F’06
Normalize
                     7 6           3 2            0
                     s      exp          frac

 Requirement
            Set binary point so that numbers of form 1.xxxxx
            Adjust all to have leading one
              Decrement exponent as shift left
             Value   Binary        Fraction       Exponent
             128     10000000      1.0000000      7
             15      00001101      1.1010000      3
             17      00010001      1.0001000      5
             19      00010011      1.0011000      5
             138     10001010      1.0001010      7
             63      00111111      1.1111100      5
– 29 –                                                       15-213, F’06
Rounding
                                    1.BBGRXXX
 Guard bit: LSB of result
                                                  Sticky bit: OR of remaining bits
Round bit: 1    st
                     bit removed

  Round up conditions
             Round = 1, Sticky = 1  > 0.5
             Guard = 1, Round = 1, Sticky = 0  Round to even
              Value      Fraction     GRS     Incr? Rounded
              128        1.0000000    000     N        1.000
              15         1.1010000    100     N        1.101
              17         1.0001000    010     N        1.000
              19         1.0011000    110     Y        1.010
              138        1.0001010    111     Y        1.001
              63         1.1111100    111     Y       10.000
 – 30 –                                                              15-213, F’06
Postnormalize
  Issue
            Rounding may have caused overflow
            Handle by shifting right once & incrementing exponent
             Value   Rounded      Exp    Adjusted   Result
             128      1.000       7                 128
             15       1.101       3                 15
             17       1.000       4                 16
             19       1.010       4                 20
             138      1.001       7                 134
             63      10.000       5      1.000/6    64




– 31 –                                                       15-213, F’06
Floating Point in C
 C Guarantees Two Levels
         float       single precision
         double      double precision

 Conversions
            Casting between int, float, and double changes
             numeric values
            Double or float to int
               Truncates fractional part
               Like rounding toward zero
               Not defined when out of range or NaN
                 » Generally sets to TMin
            int to double
               Exact conversion, as long as int has ≤ 53 bit word size
            int to float
               Will round according to rounding mode
– 32 –                                                            15-213, F’06
Curious Excel Behavior



                         Number      Subtract 16 Subtract .3 Subtract .01
    Default Format             16.31          0.31        0.01 -1.2681E-15
    Currency Format          $16.31        $0.31       $0.01        ($0.00)



            Spreadsheets use floating point for all computations
            Some imprecision for decimal arithmetic
            Can yield nonintuitive results to an accountant!



– 33 –                                                             15-213, F’06
Summary
 IEEE Floating Point Has Clear M athematical
   Properties
            Represents numbers of form M X 2 E
            Can reason about operations independent of
             implementation
               As if computed with perfect precision and then rounded
            Not the same as real arithmetic
               Violates associativity/distributivity
               Makes life difficult for compilers & serious numerical
                applications programmers




– 34 –                                                              15-213, F’06

More Related Content

What's hot

Floating point representation
Floating point representationFloating point representation
Floating point representationmissstevenson01
 
Integer Representation
Integer RepresentationInteger Representation
Integer Representationgavhays
 
Digital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsDigital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsImran Waris
 
data representation
 data representation data representation
data representationHaroon_007
 
Chapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersChapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersMohammad Bashartullah
 
Floating point presentation
Floating point presentationFloating point presentation
Floating point presentationSnehalataAgasti
 
Number system and codes
Number system and codesNumber system and codes
Number system and codesAbhiraj Bohra
 
Logic Circuits Design - "Chapter 1: Digital Systems and Information"
Logic Circuits Design - "Chapter 1: Digital Systems and Information"Logic Circuits Design - "Chapter 1: Digital Systems and Information"
Logic Circuits Design - "Chapter 1: Digital Systems and Information"Ra'Fat Al-Msie'deen
 
Number system
Number systemNumber system
Number systemaviban
 
Data types - things you have to know!
Data types - things you have to know!Data types - things you have to know!
Data types - things you have to know!Karol Sobiesiak
 
Lesson plan on data representation
Lesson plan on data representationLesson plan on data representation
Lesson plan on data representationPooja Tripathi
 
Cmp104 lec 2 number system
Cmp104 lec 2 number systemCmp104 lec 2 number system
Cmp104 lec 2 number systemkapil078
 
2.1 data represent on cpu
2.1 data represent on cpu2.1 data represent on cpu
2.1 data represent on cpuWan Afirah
 
CBNST PPT, Floating point arithmetic,Normalization
CBNST PPT, Floating point arithmetic,NormalizationCBNST PPT, Floating point arithmetic,Normalization
CBNST PPT, Floating point arithmetic,NormalizationAmberSinghal1
 
01.number systems
01.number systems01.number systems
01.number systemsrasha3
 

What's hot (20)

Floating point representation
Floating point representationFloating point representation
Floating point representation
 
Integer Representation
Integer RepresentationInteger Representation
Integer Representation
 
Digital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systemsDigital and Logic Design Chapter 1 binary_systems
Digital and Logic Design Chapter 1 binary_systems
 
data representation
 data representation data representation
data representation
 
Slide03 Number System and Operations Part 1
Slide03 Number System and Operations Part 1Slide03 Number System and Operations Part 1
Slide03 Number System and Operations Part 1
 
Chapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbersChapter 1 digital systems and binary numbers
Chapter 1 digital systems and binary numbers
 
Floating point presentation
Floating point presentationFloating point presentation
Floating point presentation
 
Number system and codes
Number system and codesNumber system and codes
Number system and codes
 
Logic Circuits Design - "Chapter 1: Digital Systems and Information"
Logic Circuits Design - "Chapter 1: Digital Systems and Information"Logic Circuits Design - "Chapter 1: Digital Systems and Information"
Logic Circuits Design - "Chapter 1: Digital Systems and Information"
 
Class10
Class10Class10
Class10
 
Number system
Number systemNumber system
Number system
 
Number system
Number systemNumber system
Number system
 
Data types - things you have to know!
Data types - things you have to know!Data types - things you have to know!
Data types - things you have to know!
 
Number systems tutorial
Number systems tutorialNumber systems tutorial
Number systems tutorial
 
Lesson plan on data representation
Lesson plan on data representationLesson plan on data representation
Lesson plan on data representation
 
Cmp104 lec 2 number system
Cmp104 lec 2 number systemCmp104 lec 2 number system
Cmp104 lec 2 number system
 
Bolum1cozumler
Bolum1cozumlerBolum1cozumler
Bolum1cozumler
 
2.1 data represent on cpu
2.1 data represent on cpu2.1 data represent on cpu
2.1 data represent on cpu
 
CBNST PPT, Floating point arithmetic,Normalization
CBNST PPT, Floating point arithmetic,NormalizationCBNST PPT, Floating point arithmetic,Normalization
CBNST PPT, Floating point arithmetic,Normalization
 
01.number systems
01.number systems01.number systems
01.number systems
 

Viewers also liked

TAO DAYS - Challenges of Modern Computer Based Assessment
TAO DAYS - Challenges of Modern Computer Based AssessmentTAO DAYS - Challenges of Modern Computer Based Assessment
TAO DAYS - Challenges of Modern Computer Based AssessmentOpen Assessment Technologies
 
Fixed-point arithmetic
Fixed-point arithmeticFixed-point arithmetic
Fixed-point arithmeticDavid Bařina
 
Floating point units
Floating point unitsFloating point units
Floating point unitsdipugovind
 
My presentation on 'computer hardware component' {hardware}
My presentation on 'computer hardware component' {hardware}My presentation on 'computer hardware component' {hardware}
My presentation on 'computer hardware component' {hardware}Rahul Kumar
 
Computer hardware component. ppt
Computer hardware component. pptComputer hardware component. ppt
Computer hardware component. pptNaveen Sihag
 
Generations of computer
Generations of computerGenerations of computer
Generations of computerJatin Jindal
 

Viewers also liked (8)

Ieee+floating
Ieee+floatingIeee+floating
Ieee+floating
 
TAO DAYS - Challenges of Modern Computer Based Assessment
TAO DAYS - Challenges of Modern Computer Based AssessmentTAO DAYS - Challenges of Modern Computer Based Assessment
TAO DAYS - Challenges of Modern Computer Based Assessment
 
Fixed-point arithmetic
Fixed-point arithmeticFixed-point arithmetic
Fixed-point arithmetic
 
Floating point units
Floating point unitsFloating point units
Floating point units
 
My presentation on 'computer hardware component' {hardware}
My presentation on 'computer hardware component' {hardware}My presentation on 'computer hardware component' {hardware}
My presentation on 'computer hardware component' {hardware}
 
Computer hardware and its components
Computer hardware and its componentsComputer hardware and its components
Computer hardware and its components
 
Computer hardware component. ppt
Computer hardware component. pptComputer hardware component. ppt
Computer hardware component. ppt
 
Generations of computer
Generations of computerGenerations of computer
Generations of computer
 

Similar to Class03

12IRGeneration.pdf
12IRGeneration.pdf12IRGeneration.pdf
12IRGeneration.pdfSHUJEHASSAN
 
Floating_point_representation.pdf
Floating_point_representation.pdfFloating_point_representation.pdf
Floating_point_representation.pdfRameshK531901
 
Floating point representation and arithmetic
Floating point representation and arithmeticFloating point representation and arithmetic
Floating point representation and arithmeticAvirajKaushik
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.pptssuser52a19e
 
Review on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalReview on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalUtkirjonUbaydullaev1
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.pptRAJKUMARP63
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.pptRabiaAsif31
 
Implementation of character translation integer and floating point values
Implementation of character translation integer and floating point valuesImplementation of character translation integer and floating point values
Implementation of character translation integer and floating point valuesغزالة
 
L12-FloatingPoint.ppt
L12-FloatingPoint.pptL12-FloatingPoint.ppt
L12-FloatingPoint.pptDevcond
 
IEEE floating point representation
 IEEE floating point representation IEEE floating point representation
IEEE floating point representationMaskurAlShalSabil
 
Oth1
Oth1Oth1
Oth1b137
 

Similar to Class03 (20)

Class10
Class10Class10
Class10
 
12IRGeneration.pdf
12IRGeneration.pdf12IRGeneration.pdf
12IRGeneration.pdf
 
Floating_point_representation.pdf
Floating_point_representation.pdfFloating_point_representation.pdf
Floating_point_representation.pdf
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
Floating point representation and arithmetic
Floating point representation and arithmeticFloating point representation and arithmetic
Floating point representation and arithmetic
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
Review on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and HexadecimalReview on Number Systems: Decimal, Binary, and Hexadecimal
Review on Number Systems: Decimal, Binary, and Hexadecimal
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
binary-numbers.ppt
binary-numbers.pptbinary-numbers.ppt
binary-numbers.ppt
 
ch3a-binary-numbers.ppt
ch3a-binary-numbers.pptch3a-binary-numbers.ppt
ch3a-binary-numbers.ppt
 
08. Numeral Systems
08. Numeral Systems08. Numeral Systems
08. Numeral Systems
 
Implementation of character translation integer and floating point values
Implementation of character translation integer and floating point valuesImplementation of character translation integer and floating point values
Implementation of character translation integer and floating point values
 
Number system
Number systemNumber system
Number system
 
L12-FloatingPoint.ppt
L12-FloatingPoint.pptL12-FloatingPoint.ppt
L12-FloatingPoint.ppt
 
IEEE floating point representation
 IEEE floating point representation IEEE floating point representation
IEEE floating point representation
 
Sistem bilangan
Sistem bilanganSistem bilangan
Sistem bilangan
 
Oth1
Oth1Oth1
Oth1
 
09 Arithmetic
09  Arithmetic09  Arithmetic
09 Arithmetic
 

Recently uploaded

Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 

Recently uploaded (20)

Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 

Class03

  • 1. 15-213 “The course that gives CMU its Zip!” Floating Point Sept 6, 2006 Topics  IEEE Floating Point Standard  Rounding  Floating Point Operations  Mathematical properties class03.ppt 15-213, F’06
  • 2. Floating Point Puzzles  For each of the following C expressions, either: Argue that it is true for all argument values Explain why not true • x == (int)(float) x • x == (int)(double) x int x = …; • f == (float)(double) f float f = …; • d == (float) d double d = …; • f == -(-f); • 2/3 == 2/3.0 Assume neither d nor f is NaN • d < 0.0 ⇒ ((d*2) < 0.0) • d > f ⇒ -f > -d • d * d >= 0.0 • (d+f)-d == f –2– 15-213, F’06
  • 3. IEEE Floating Point IEEE Standard 754  Established in 1985 as uniform standard for floating point arithmetic  Before that, many idiosyncratic formats  Supported by all major CPUs Driven by Numerical Concerns  Nice standards for rounding, overflow, underflow  Hard to make go fast  Numerical analysts predominated over hardware types in defining standard –3– 15-213, F’06
  • 4. Fractional Binary Numbers 2i 2i–1 4 ••• 2 1 bi bi–1 ••• b2 b1 b0 . b–1 b–2 b–3 ••• b–j 1/2 1/4 ••• 1/8 2–j Representation  Bits to right of “binary point” represent fractional powers of 2 i  Represents rational number: ∑ bk ⋅2 k k =− j –4– 15-213, F’06
  • 5. Frac. Binary Number Examples Value Representation 5-3/4 101.112 2-7/8 10.1112 63/64 0.1111112 Observations  Divide by 2 by shifting right  Multiply by 2 by shifting left  Numbers of form 0.111111…2 just below 1.0 1/2 + 1/4 + 1/8 + … + 1/2 i + … → 1.0 Use notation 1.0 – ε –5– 15-213, F’06
  • 6. Representable Numbers Limitation  Can only exactly represent numbers of the form x/2k  Other numbers have repeating bit representations Value Representation 1/3 0.0101010101[01]…2 1/5 0.001100110011[0011]…2 1/10 0.0001100110011[0011]…2 –6– 15-213, F’06
  • 7. Floating Point Representation Numerical Form  –1s M 2E Sign bit s determines whether number is negative or positive Significand M normally a fractional value in range [1.0,2.0). Exponent E weights value by power of two Encoding s exp frac  MSB is sign bit  exp field encodes E  frac field encodes M –7– 15-213, F’06
  • 8. Floating Point Precisions Encoding s exp frac  MSB is sign bit  exp field encodes E  frac field encodes M Sizes  Single precision: 8 exp bits, 23 frac bits 32 bits total  Double precision: 11 exp bits, 52 frac bits 64 bits total  Extended precision: 15 exp bits, 63 frac bits Only found in Intel-compatible machines Stored in 80 bits » 1 bit wasted –8– 15-213, F’06
  • 9. “Normalized” Numeric Values Condition  exp ≠ 000…0 and exp ≠ 111…1 Exponent coded as biased value E = Exp – Bias Exp : unsigned value denoted by exp Bias : Bias value » Single precision: 127 ( Exp: 1…254, E: -126…127) » Double precision: 1023 ( Exp: 1…2046, E: -1022…1023) » in general: Bias = 2 e-1 - 1, where e is number of exponent bits Significand coded with implied leading 1 M = 1.xxx…x2  xxx…x: bits of frac Minimum when 000…0 (M = 1.0) Maximum when 111…1 (M = 2.0 – ε) Get extra leading bit for “free” –9– 15-213, F’06
  • 10. Normalized Encoding Example Value Float F = 15213.0;  1521310 = 111011011011012 = 1.11011011011012 X 213 Significand M = 1.11011011011012 frac = 110110110110100000000002 Exponent E = 13 Bias = 127 Exp = 140 = 100011002 Floating Point Representation: Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000 140: 100 0110 0 – 10 – 15213: 1110 1101 1011 01 15-213, F’06
  • 11. Denormalized Values Condition  exp = 000…0 Value  Exponent value E = –Bias + 1  Significand value M = 0.xxx…x2  xxx…x: bits of frac Cases  exp = 000…0, frac = 000…0  Represents value 0  Note that have distinct values +0 and –0  exp = 000…0, frac ≠ 000…0  Numbers very close to 0.0  Lose precision as get smaller – 11 –  “Gradual underflow” 15-213, F’06
  • 12. Special Values Condition  exp = 111…1 Cases  exp = 111…1, frac = 000…0  Represents value ∞ (infinity)  Operation that overflows  Both positive and negative  E.g., 1.0/0.0 = −1.0/−0.0 = + ∞, 1.0/−0.0 = − ∞  exp = 111…1, frac ≠ 000…0  Not-a-Number (NaN)  Represents case when no numeric value can be determined  E.g., sqrt(–1), ∞ − ∞, ∞ ∗ 0 – 12 – 15-213, F’06
  • 13. Summary of Floating Point Real Number Encodings −∞ -Normalized -Denorm +Denorm +Normalized +∞ NaN NaN −0 +0 – 13 – 15-213, F’06
  • 14. Tiny Floating Point Example 8-bit Floating Point Representation  the sign bit is in the most significant bit.  the next four bits are the exponent, with a bias of 7.  the last three bits are the frac  Same General Form as IEEE Format  normalized, denormalized  representation of 0, NaN, infinity 7 6 3 2 0 s exp frac – 14 – 15-213, F’06
  • 15. Values Related to the Exponent Exp exp E 2E 0 0000 -6 1/64 (denorms) 1 0001 -6 1/64 2 0010 -5 1/32 3 0011 -4 1/16 4 0100 -3 1/8 5 0101 -2 1/4 6 0110 -1 1/2 7 0111 0 1 8 1000 +1 2 9 1001 +2 4 10 1010 +3 8 11 1011 +4 16 12 1100 +5 32 13 1101 +6 64 14 1110 +7 128 15 1111 n/a (inf, NaN) – 15 – 15-213, F’06
  • 16. Dynamic Range s exp frac E Value 0 0000 000 -6 0 0 0000 001 -6 1/8*1/64 = 1/512 closest to zero Denormalized 0 0000 010 -6 2/8*1/64 = 2/512 numbers … 0 0000 110 -6 6/8*1/64 = 6/512 0 0000 111 -6 7/8*1/64 = 7/512 largest denorm 0 0001 000 -6 8/8*1/64 = 8/512 smallest norm 0 0001 001 -6 9/8*1/64 = 9/512 … 0 0110 110 -1 14/8*1/2 = 14/16 0 0110 111 -1 15/8*1/2 = 15/16 closest to 1 below Normalized 0 0111 000 0 8/8*1 = 1 numbers closest to 1 above 0 0111 001 0 9/8*1 = 9/8 0 0111 010 0 10/8*1 = 10/8 … 0 1110 110 7 14/8*128 = 224 0 1110 111 7 15/8*128 = 240 largest norm 0 1111 000 n/a inf – 16 – 15-213, F’06
  • 17. Distribution of Values 6-bit IEEE-like format  e = 3 exponent bits  f = 2 fraction bits  Bias is 3 Notice how the distribution gets denser toward zero. -15 -10 -5 0 5 10 15 Denormalized Normalized Infinity – 17 – 15-213, F’06
  • 18. Distribution of Values (close-up view) 6-bit IEEE-like format  e = 3 exponent bits  f = 2 fraction bits  Bias is 3 -1 -0.5 0 0.5 1 Denormalized Normalized Infinity – 18 – 15-213, F’06
  • 19. Interesting Numbers Description exp frac Numeric Value Zero 00…00 00…00 0.0 Smallest Pos. Denorm. 00…00 00…01 2– {23,52} X 2– {126,1022}  Single ≈ 1.4 X 10–45  Double ≈ 4.9 X 10–324 Largest Denormalized 00…00 11…11 (1.0 – ε) X 2– {126,1022}  Single ≈ 1.18 X 10–38  Double ≈ 2.2 X 10–308 Smallest Pos. Normalized 00…01 00…00 1.0 X 2– {126,1022}  Just larger than largest denormalized One 01…11 00…00 1.0 Largest Normalized 11…10 11…11 (2.0 – ε) X 2{127,1023}  Single ≈ 3.4 X 1038  Double ≈ 1.8 X 10308 – 19 – 15-213, F’06
  • 20. Special Properties of Encoding FP Zero Same as Integer Zero  All bits = 0 Can (Almost) Use Unsigned Integer Comparison  Must first compare sign bits  Must consider -0 = 0  NaNs problematic  Will be greater than any other values  What should comparison yield?  Otherwise OK  Denorm vs. normalized  Normalized vs. infinity – 20 – 15-213, F’06
  • 21. Floating Point Operations Conceptual View  First compute exact result  Make it fit into desired precision Possibly overflow if exponent too large Possibly round to fit into frac Rounding Modes (illustrate with $ rounding) $1.40 $1.60 $1.50 $2.50 –$1.50  Zero $1 $1 $1 $2 –$1  Round down (-∞) $1 $1 $1 $2 –$2  Round up (+∞) $2 $2 $2 $3 –$1  Nearest Even (default) $1 $2 $2 $2–$2 Note: 1. Round down: rounded result is close to but no greater than true result. 2. Round up: rounded result is close to but no less than true result. – 21 – 15-213, F’06
  • 22. Closer Look at Round-To- Even Default Rounding M ode  Hard to get any other kind without dropping into assembly  All others are statistically biased Sum of set of positive numbers will consistently be over- or under- estimated Applying to Other Decimal Places / Bit Positions  When exactly halfway between two possible values Round so that least significant digit is even  E.g., round to nearest hundredth 1.2349999 1.23 (Less than half way) 1.2350001 1.24 (Greater than half way) 1.2350000 1.24 (Half way—round up) 1.2450000 1.24 (Half way—round down) – 22 – 15-213, F’06
  • 23. Rounding Binary Numbers Binary Fractional Numbers  “Even” when least significant bit is 0  Half way when bits to right of rounding position = 100… 2 Examples  Round to nearest 1/4 (2 bits right of binary point) Value Binary Rounded Action Rounded Value 2 3/32 10.000112 10.002 (<1/2—down) 2 2 3/16 10.001102 10.012 (>1/2—up) 2 1/4 2 7/8 10.111002 11.002 (1/2—up) 3 2 5/8 10.101002 10.102 (1/2—down) 2 1/2 – 23 – 15-213, F’06
  • 24. FP Multiplication Operands * (–1)s1 M1 2E1 (–1)s2 M2 2E2 Exact Result (–1)s M 2E  Sign s: s1 ^ s2  Significand M: M1 * M2  Exponent E: E1 + E2 Fixing  If M ≥ 2, shift M right, increment E  If E out of range, overflow  Round M to fit frac precision Implementation  Biggest chore is multiplying significands – 24 – 15-213, F’06
  • 25. FP Addition Operands (–1)s1 M1 2E1 E1–E2 (–1)s2 M2 2E2 (–1)s1 M1  Assume E1 > E2 + (–1)s2 M2 Exact Result (–1)s M 2E (–1)s M  Sign s, significand M:  Result of signed align & add  Exponent E: E1 Fixing  If M ≥ 2, shift M right, increment E  if M < 1, shift M left k positions, decrement E by k  Overflow if E out of range – 25 –  Round M to fit frac precision 15-213, F’06
  • 26. Mathematical Properties of FP Add Compare to those of Abelian Group  Closed under addition? YES But may generate infinity or NaN  Commutative? YES  Associative? NO Overflow and inexactness of rounding  0 is additive identity? YES  Every element has additive inverse ALMOST Except for infinities & NaNs Monotonicity  a ≥ b ⇒ a+c ≥ b+c? ALMOST Except for infinities & NaNs – 26 – 15-213, F’06
  • 27. Math. Properties of FP Mult Compare to Commutative Ring  Closed under multiplication? YES But may generate infinity or NaN  Multiplication Commutative? YES  Multiplication is Associative? NO Possibility of overflow, inexactness of rounding  1 is multiplicative identity? YES  Multiplication distributes over addition? NO Possibility of overflow, inexactness of rounding Monotonicity  a ≥ b & c ≥ 0 ⇒ a *c ≥ b *c? ALMOST Except for infinities & NaNs – 27 – 15-213, F’06
  • 28. Creating Floating Point Number Steps 7 6 3 2 0  Normalize to have leading 1s exp frac  Round to fit within fraction  Postnormalize to deal with effects of rounding Case Study  Convert 8-bit unsigned numbers to tiny floating point format  Example Numbers 128 10000000 15 00001101 33 00010001 35 00010011 138 10001010 – 28 – 63 00111111 15-213, F’06
  • 29. Normalize 7 6 3 2 0 s exp frac Requirement  Set binary point so that numbers of form 1.xxxxx  Adjust all to have leading one  Decrement exponent as shift left Value Binary Fraction Exponent 128 10000000 1.0000000 7 15 00001101 1.1010000 3 17 00010001 1.0001000 5 19 00010011 1.0011000 5 138 10001010 1.0001010 7 63 00111111 1.1111100 5 – 29 – 15-213, F’06
  • 30. Rounding 1.BBGRXXX Guard bit: LSB of result Sticky bit: OR of remaining bits Round bit: 1 st bit removed Round up conditions  Round = 1, Sticky = 1  > 0.5  Guard = 1, Round = 1, Sticky = 0  Round to even Value Fraction GRS Incr? Rounded 128 1.0000000 000 N 1.000 15 1.1010000 100 N 1.101 17 1.0001000 010 N 1.000 19 1.0011000 110 Y 1.010 138 1.0001010 111 Y 1.001 63 1.1111100 111 Y 10.000 – 30 – 15-213, F’06
  • 31. Postnormalize Issue  Rounding may have caused overflow  Handle by shifting right once & incrementing exponent Value Rounded Exp Adjusted Result 128 1.000 7 128 15 1.101 3 15 17 1.000 4 16 19 1.010 4 20 138 1.001 7 134 63 10.000 5 1.000/6 64 – 31 – 15-213, F’06
  • 32. Floating Point in C C Guarantees Two Levels float single precision double double precision Conversions  Casting between int, float, and double changes numeric values  Double or float to int  Truncates fractional part  Like rounding toward zero  Not defined when out of range or NaN » Generally sets to TMin  int to double  Exact conversion, as long as int has ≤ 53 bit word size  int to float  Will round according to rounding mode – 32 – 15-213, F’06
  • 33. Curious Excel Behavior Number Subtract 16 Subtract .3 Subtract .01 Default Format 16.31 0.31 0.01 -1.2681E-15 Currency Format $16.31 $0.31 $0.01 ($0.00)  Spreadsheets use floating point for all computations  Some imprecision for decimal arithmetic  Can yield nonintuitive results to an accountant! – 33 – 15-213, F’06
  • 34. Summary IEEE Floating Point Has Clear M athematical Properties  Represents numbers of form M X 2 E  Can reason about operations independent of implementation  As if computed with perfect precision and then rounded  Not the same as real arithmetic  Violates associativity/distributivity  Makes life difficult for compilers & serious numerical applications programmers – 34 – 15-213, F’06