A Study of float & double in Java and IEEE 754

                     Zianed Hou

                    zianed@live.cn




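1. IEEE 754, the standard for binary floating-point arithmetic (ANSI/IEEE Std 754-1985)
2. Representation of binary floating-point numbers
3. Rounding of floating-point numbers
4. Exceptions in numerical processing
5. float and double in Java
6. BigDecimal in Java
7. The evolution of IEEE 754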




Zianed                Version 1.0                1
1. IEEE 754, the standard for binary floating-point arithmetic (ANSI/IEEE Std 754-1985)
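       Floating-point number: a digital representation of the numbers in a particular subset of the rationals, used in computers to approximate arbitrary real numbers. The real number is obtained by multiplying an integer or fixed-point number by an integer power of some base, much like scientific notation in base 10.
       A floating-point number a is represented by two numbers m and e: a = m * b^e, for a chosen base b and precision p. The significand (mantissa) m is a p-digit number of the form d.ddddd, each digit lying between 0 and b-1, and the digit to the left of the point is non-zero.
       Because many values cannot be represented exactly, floating-point operations involve approximation or rounding.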
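       Floating-point standard: a standard governing how floating-point numbers are stored and computed. In most contexts this means IEEE 754 (which specifies the formats and operations of single precision, double precision, and so on). Before IEEE 754 the industry had no unified floating-point standard. Many computer manufacturers designed their own floating-point rules and arithmetic details, and at the time speed and simplicity of implementation were valued above numerical accuracy. This hampered code portability. In 1976, when Intel planned to introduce a floating-point coprocessor for its 8086 microprocessor, it engaged Professor William Kahan of the University of California, Berkeley, one of the finest numerical analysts, to design the floating-point format for the 8087 FPU. Kahan brought in two more experts to assist him, forming the KCS group (Kahan, Coonen, and Stone), and together they completed Intel's floating-point format design. The KCS format was so well designed that IEEE decided to adopt a scheme very close to it as the IEEE standard floating-point format. In 1985 IEEE published the standard for binary floating-point arithmetic, IEEE 754, which fixes the radix of the exponent at 2; the same year it was adopted as an American (ANSI) standard. Today almost every computer supports it, which has greatly improved the portability of scientific applications. Partly in view of the influence of the IBM System/370, IEEE in 1987 released IEEE 854, a radix-independent floating-point arithmetic standard, which was also adopted as an ANSI standard that year. In 1989 the IEC approved IEEE 754/854 as the international standard IEC 559:1989, later renumbered IEC 60559 after revision. Today almost all floating-point processors fully or largely support IEC 60559, and C99 floating-point arithmetic supports it as well.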
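       The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the floating-point standard adopted by many CPUs and floating-point units. It defines formats for representing floating-point numbers (including negative zero, -0) and denormal numbers, special values (infinities and NaN, "not a number"), and the floating-point operators on these values. It also specifies four rounding rules and five exceptions (including when an exception occurs and how it is handled).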
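       IEEE 754 specifies four formats for floating-point values: single precision (32 bits), double precision (64 bits), single-extended precision (43 bits or more, rarely used), and double-extended precision (79 bits or more, usually implemented with 80 bits). Only the 32-bit format is mandatory; the others are optional. Most languages provide IEEE formats and arithmetic, though some treat them as optional. For example, C, which predates IEEE 754, now includes IEEE arithmetic but does not mandate it (in C, float usually means IEEE single precision and double means double precision).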
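      The full name of the standard is IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), also known as IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems (originally numbered IEC 559:1989). It was later complemented by the radix-independent IEEE 854-1987 standard, which covers radix 2 as well as radix 10.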
      For the history, see "IEEE 754: An Interview with William Kahan".




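2. Representation of binary floating-point numbers
    Binary floating-point numbers are stored in sign-magnitude form: the most significant bit is the sign bit, the next e bits hold the (biased) exponent, and the remaining f bits hold the fraction. Total width = sign bit + exponent bits + fraction bits.
   S + Exp + Fraction
   32-bit single precision
   32 = 1 + 8 + 23, bias +127.
      The actual single-precision exponent ranges from -126 to +127; adding the bias of 127 gives stored exponent values from 1 to 254 (0 and 255 are reserved for special values). When evaluating a number, subtract the bias from the stored exponent to obtain the actual exponent.
      Example:
      0x3F400000 = 0.75
      sign bit 0 = positive; exponent bits 01111110 = 126, actual exponent = 126 - 127 = -1;
      significand 1.10000000000000000000000 (binary) = 2^0 + 2^-1 = 1.5
      so the value is 1.5 * 2^-1 = 0.75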
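The worked example can be checked mechanically. The following is a minimal sketch (the class and method names are illustrative, not from the original) that extracts the three fields with Float.floatToRawIntBits and rebuilds the value, assuming a normal number so that the hidden leading bit is 1.

```java
// Sketch: split a float into its IEEE 754 single-precision fields.
// Rebuilding the value below assumes a normal number (exponent field 1..254).
public class FloatFields {
    // Bit 31: sign.
    static int sign(float f) {
        return Float.floatToRawIntBits(f) >>> 31;
    }

    // Bits 30..23: exponent stored with a bias of 127.
    static int biasedExponent(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0: fraction; the leading 1 of a normal number is implicit.
    static int fraction(float f) {
        return Float.floatToRawIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = Float.intBitsToFloat(0x3F400000);
        int e = biasedExponent(f);
        double significand = 1.0 + fraction(f) / (double) (1 << 23); // 1.f
        System.out.println("value           = " + f);              // 0.75
        System.out.println("sign            = " + sign(f));        // 0
        System.out.println("biased exponent = " + e);              // 126
        System.out.println("actual exponent = " + (e - 127));      // -1
        System.out.println("significand     = " + significand);    // 1.5
        System.out.println("rebuilt value   = " + Math.scalb(significand, e - 127)); // 0.75
    }
}
```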




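(Figure: computing the value of a single-precision number.)

(Figure: single-precision encodings of the special values.)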

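   64-bit double precision
    64 = 1 + 11 + 52, bias +1023
      The actual double-precision exponent ranges from -1022 to +1023; adding the bias of 1023 gives stored exponent values from 1 to 2046 (0, with all bits zero, and 2047, with all bits one, are special values). When evaluating a number, subtract the bias from the stored exponent to obtain the actual exponent.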
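The same field split works for the 1 + 11 + 52 layout. As a sketch (class and method names are illustrative), here it is applied to 0.1, whose stored exponent 1019 gives an actual exponent of 1019 - 1023 = -4, i.e. 0.1 is about 1.6 * 2^-4.

```java
// Sketch: split a double into its IEEE 754 double-precision fields (1 + 11 + 52 bits).
public class DoubleFields {
    static int sign(double d) {
        return (int) (Double.doubleToRawLongBits(d) >>> 63);
    }

    // 11-bit exponent stored with a bias of 1023.
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToRawLongBits(d) >>> 52) & 0x7FF);
    }

    // 52-bit fraction (13 hex digits).
    static long fraction(double d) {
        return Double.doubleToRawLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double d = 0.1;
        int e = biasedExponent(d);
        System.out.printf("bits            = %016X%n", Double.doubleToRawLongBits(d)); // 3FB999999999999A
        System.out.println("sign            = " + sign(d));       // 0
        System.out.println("biased exponent = " + e);             // 1019
        System.out.println("actual exponent = " + (e - 1023));    // -4
        System.out.printf("fraction        = %013X%n", fraction(d)); // 999999999999A
    }
}
```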


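(Figure: computing the value of a double-precision number.)

(Figure: double-precision encodings of the special values.)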
Gradual underflow and the subnormal numbers
        [Gradual underflow provides a number of advantages over abrupt underflow. Without it, the gap between zero and the smallest floating-point number is much larger than the gap between successive small floating-point numbers. Without gradual underflow one can find two values, X and Y (such that X is not equal to Y), and yet when you subtract them their result is zero. While a skilled numerical analyst could work around this limitation in many situations, this anomaly would tend to cause problems for less skilled programmers. (Charles Severance)]
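Java's floating-point arithmetic is specified to support subnormal numbers and gradual underflow, so the anomaly described in the quote cannot be reproduced there. A small sketch (class name illustrative) shows two distinct doubles near the bottom of the normal range whose difference is a non-zero subnormal rather than zero:

```java
// Sketch: with gradual underflow, x != y implies x - y != 0.
public class GradualUnderflowDemo {
    static double diff(double x, double y) {
        return x - y;
    }

    public static void main(String[] args) {
        double y = Double.MIN_NORMAL;          // smallest normal double, 2^-1022
        double x = Double.MIN_NORMAL * 1.5;    // a nearby normal double
        double d = diff(x, y);                 // exactly 2^-1023, which is subnormal
        System.out.println(x != y);                  // true
        System.out.println(d != 0.0);                // true: not flushed to zero
        System.out.println(d < Double.MIN_NORMAL);   // true: the difference is subnormal
    }
}
```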

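       Conclusion: the leading significand bit is determined by the exponent field. If 0 < exponent < 2^e - 1, the leading bit is 1 and the number is said to be in normal form. If the exponent is 0, the leading bit is 0 and the number is in subnormal (denormal) form. Three special cases deserve mention:
    If the exponent is 0 and the fraction is 0, the number is +0 or -0 (depending on the sign bit).
    If the exponent = 2^e - 1 and the fraction is 0, the number is +infinity or -infinity (again depending on the sign bit).
    If the exponent = 2^e - 1 and the fraction is non-zero, the number is NaN (not a number).
    These rules are summarized below:
    Form          Exponent        Fraction
    Zero          0               0
    Subnormal     0               non-zero
    Normal        1 to 2^e - 2    any
    Infinity      2^e - 1         0
    NaN           2^e - 1         non-zero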
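The table translates directly into a classifier. This is a minimal sketch for float (e = 8, so 2^e - 1 = 255); the class name and the returned labels are illustrative choices.

```java
// Sketch: classify a float by its exponent and fraction fields, following the table above.
public class FloatClassifier {
    static String classify(float f) {
        int bits = Float.floatToRawIntBits(f);
        int exponent = (bits >>> 23) & 0xFF;   // 8-bit exponent field
        int fraction = bits & 0x7FFFFF;        // 23-bit fraction field
        if (exponent == 0) {
            return fraction == 0 ? "zero" : "subnormal";
        }
        if (exponent == 0xFF) {                // 2^e - 1
            return fraction == 0 ? "infinity" : "NaN";
        }
        return "normal";                       // exponent in 1 .. 2^e - 2
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -0.0f, Float.MIN_VALUE, 1.0f, Float.POSITIVE_INFINITY, Float.NaN};
        for (float f : samples) {
            System.out.println(f + " -> " + classify(f));
        }
    }
}
```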

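  An F-bit binary fraction can take 2^F distinct values, while an F-digit decimal fraction can take 10^F. Of the 10^F decimal fractions with F digits, only 2^F can be written exactly with F binary fraction bits, a proportion of 2^F / 10^F = 0.2^F. The larger F is, the smaller the proportion of decimal fractions that are exactly representable.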
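The 0.2^F proportion can be confirmed by brute force. This sketch (class name illustrative) counts, for small F, the decimal fractions k / 10^F that equal some j / 2^F, which happens exactly when k is a multiple of 5^F.

```java
// Sketch: count the F-digit decimal fractions k / 10^F (0 <= k < 10^F) that can be written
// exactly with F binary fraction bits, i.e. k / 10^F == j / 2^F for some integer j.
public class DecimalVsBinaryFractions {
    static long countExact(int digits) {
        long tenPow = 1;
        for (int i = 0; i < digits; i++) {
            tenPow *= 10;
        }
        long twoPow = 1L << digits;
        long count = 0;
        for (long k = 0; k < tenPow; k++) {
            // j = k * 2^F / 10^F must be an integer
            if ((k * twoPow) % tenPow == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int f = 1; f <= 6; f++) {
            long total = (long) Math.pow(10, f);
            long exact = countExact(f);
            System.out.printf("F=%d: %d of %d exact, ratio %.6f vs 0.2^F = %.6f%n",
                    f, exact, total, exact / (double) total, Math.pow(0.2, f));
        }
    }
}
```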

Subnormal numbers:
In a normal number the significand has no leading zeros. In the decimal32 format, for example, the values of this kind closest to zero, -1×10^-95 and 1×10^-95, are the smallest (in magnitude) normal numbers; non-zero numbers between these smallest numbers are called subnormal numbers.

Subnormal numbers provide the guarantee that addition and subtraction of floating-point
numbers never underflows; two nearby floating-point numbers always have a
representable non-zero difference. Without gradual underflow, the subtraction a−b can
underflow and produce zero even though the values are not equal. This can, in turn, lead
to division by zero errors that cannot occur when gradual underflow is used.
By filling the underflow gap like this, significant digits are lost, but not to the same extent as
when doing flush to zero on underflow (losing all significant digits all through the
underflow gap). Hence the production of a denormal number is sometimes called
gradual underflow because it allows a calculation to lose precision slowly when the
result is small.


Some processors handle subnormal values in hardware, just as normal values are.
Subnormal values (as arguments or results) then pose no particular performance issue;
they are handled at the same speed as normal values. But some processors leave the
handling of subnormal values to system software, only handling normal values (and zero)
in hardware. In this case, computing with subnormal values is significantly slower than
computing with normal values. Some applications need to contain code to avoid subnormal numbers, either to maintain accuracy or to avoid the performance penalty on some processors.
If the exponent is all 0s but the fraction is non-zero (otherwise it would be interpreted as zero), the value is a subnormal number, which does not have an assumed leading 1 before the binary point. Thus it represents the number $(-1)^{s} \times 0.f \times 2^{-126}$, where s is the sign bit and f is the fraction. For double precision, subnormal numbers are of the form $(-1)^{s} \times 0.f \times 2^{-1022}$. From this you can interpret zero as a special type of subnormal number.
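The single-precision formula can be evaluated directly and compared with the JVM's own decoding. In this sketch (class and method names illustrative), 0.f is fraction / 2^23, so the value is fraction * 2^-149; Math.scalb is exact here because every result is itself a representable subnormal.

```java
// Sketch: value of a float subnormal from its fields, (-1)^s * 0.f * 2^-126 with 0.f = fraction / 2^23.
public class SubnormalFormula {
    static float fromFields(int sign, int fraction) {
        float magnitude = Math.scalb((float) fraction, -23 - 126); // fraction * 2^-149
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        int[] fractions = {0x000001, 0x400000, 0x7FFFFF};
        for (int frac : fractions) {
            float viaFormula = fromFields(0, frac);
            float viaBits = Float.intBitsToFloat(frac); // sign 0, exponent field 0, this fraction
            System.out.println(viaFormula + " matches bits: " + (viaFormula == viaBits));
        }
    }
}
```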

(Figure: subnormal value ranges for each floating-point type.)

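3. Rounding of floating-point numbers
   The result of any operation on significands is usually held in a wider register; when the result is returned to the floating-point format, the extra bits must be discarded.
   Rounding can be done in several ways; the IEEE standard lists four:
   Round to nearest: rounds the result to the nearest representable value.
   Round toward +∞: rounds the result toward positive infinity (like ceil()).
   Round toward -∞: rounds the result toward negative infinity (like floor()).
   Round toward 0: rounds the result toward zero (like truncation by an (int) cast).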
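Java does not let a program switch the hardware rounding direction, but its integer-rounding helpers follow the same four directions. The sketch below (class name illustrative) uses Math.rint, Math.ceil, Math.floor and a narrowing cast; note that these round to a whole number, so they only mirror the directions, not rounding into a floating-point format.

```java
// Sketch: the four IEEE rounding directions, illustrated by rounding to whole numbers.
public class RoundingDirections {
    // Returns {to nearest (ties to even), toward +inf, toward -inf, toward 0}.
    static double[] roundAll(double x) {
        return new double[] {
            Math.rint(x),       // nearest; exact halves go to the even integer
            Math.ceil(x),       // toward +infinity
            Math.floor(x),      // toward -infinity
            (double) (long) x   // narrowing cast truncates toward zero
        };
    }

    public static void main(String[] args) {
        for (double x : new double[] {2.5, -2.5, 2.7, -2.7}) {
            double[] r = roundAll(x);
            System.out.printf("%5.1f: nearest=%5.1f  +inf=%5.1f  -inf=%5.1f  zero=%5.1f%n",
                    x, r[0], r[1], r[2], r[3]);
        }
    }
}
```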



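Rounding algorithms in IEEE 754-2008:
1) Rounding to nearest
       Round to nearest, ties to even: on a tie, round to the neighbor whose last significand bit is 0 (the even one). This is the default and the recommended mode. (Rationale: ties are rounded up about as often as down, so no systematic bias accumulates over long computations.)
       Round to nearest, ties away from zero: on a tie, round away from zero; positive values go to the larger neighbor, negative values to the smaller one.
2) Directed roundings
       Round toward 0: rounds the result toward zero.
       Round toward +∞: rounds the result toward positive infinity.
       Round toward -∞: rounds the result toward negative infinity.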
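Ties to even can be observed without any special API: the Java Language Specification requires long-to-double conversion to use IEEE round-to-nearest. Between 2^53 and 2^54 adjacent doubles are 2 apart, so odd integers there are exact ties. A minimal sketch (class name illustrative):

```java
// Sketch: long -> double conversion rounds to nearest, ties to even.
public class TiesToEvenDemo {
    static double toDouble(long v) {
        return (double) v;
    }

    public static void main(String[] args) {
        long twoTo53 = 1L << 53;   // from 2^53 to 2^54, adjacent doubles are 2 apart
        // 2^53 + 1 lies halfway between 2^53 (even significand) and 2^53 + 2 (odd significand).
        System.out.println((long) toDouble(twoTo53 + 1)); // 9007199254740992 = 2^53
        // 2^53 + 3 lies halfway between 2^53 + 2 (odd) and 2^53 + 4 (even).
        System.out.println((long) toDouble(twoTo53 + 3)); // 9007199254740996 = 2^53 + 4
    }
}
```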




4、数值处理中的异常

         The standard defines five exceptions (invalid operation, division by zero, overflow, underflow, inexact).

The standard defines five exceptions, each of which has a corresponding status
flag that (except in certain cases of underflow) is raised when the exception
occurs. No other action is required, but alternatives are recommended (see
below).

The five possible exceptions are:
    Invalid operation (e.g., square root of a negative number)
    Division by zero
    Overflow (a result is too large to be represented correctly)
    Underflow (a result is very small (outside the normal range) and is inexact)
    Inexact.
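Java gives programs no access to the IEEE status flags, and its floating-point operators never throw; each exceptional case simply yields the standard's default result. This sketch (class and field names illustrative) triggers one case per exception.

```java
// Sketch: each of the five IEEE exceptions, as it appears in Java (default results, no flags, no throws).
public class ExceptionDefaults {
    static final double INVALID     = Math.sqrt(-1.0);       // invalid operation -> NaN
    static final double DIV_BY_ZERO = 1.0 / 0.0;             // division by zero  -> Infinity
    static final double OVERFLOW    = Double.MAX_VALUE * 2;  // overflow          -> Infinity
    static final double UNDERFLOW   = Double.MIN_VALUE / 2;  // underflow         -> 0.0 (the tie rounds to even)
    static final double INEXACT     = 0.1 + 0.2;             // inexact           -> 0.30000000000000004

    public static void main(String[] args) {
        System.out.println(INVALID + " " + DIV_BY_ZERO + " " + OVERFLOW + " "
                + UNDERFLOW + " " + INEXACT);
    }
}
```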

Underflow
    Recall that the IEEE format for a normal floating-point number is:
    $(-1)^{s} \times 2^{e-\text{bias}} \times 1.f$
    where s is the sign bit, e is the biased exponent, and f is the fraction. Only s, e, and f
need to be stored to fully specify the number. Because the implicit leading bit of the
significand is defined to be 1 for normal numbers, it need not be stored.
    The smallest positive normal number that can be stored, then, has the negative exponent of greatest magnitude and a fraction of all zeros. Even smaller numbers can be accommodated by considering the leading bit to be zero rather than one. In the double-precision format, this effectively extends the smallest representable magnitude from about $10^{-308}$ to about $10^{-324}$, because the fraction part is 52 bits long (roughly 16 decimal digits). These are the subnormal numbers; returning a subnormal number (rather than flushing an underflowed result to zero) is gradual underflow.
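The extension from about 10^-308 to about 10^-324 can be seen with the Double constants. In this sketch the helper isSubnormal is an illustrative name; it relies on Math.getExponent, which is documented to return Double.MIN_EXPONENT - 1 for zero and subnormal arguments.

```java
// Sketch: how far gradual underflow extends the double range below MIN_NORMAL.
public class SubnormalRange {
    // Math.getExponent reports Double.MIN_EXPONENT - 1 for zero and for subnormals.
    static boolean isSubnormal(double d) {
        return d != 0.0 && Math.getExponent(d) == Double.MIN_EXPONENT - 1;
    }

    public static void main(String[] args) {
        System.out.println(Double.MIN_NORMAL);                  // 2.2250738585072014E-308
        System.out.println(Double.MIN_VALUE);                   // 4.9E-324
        System.out.println(isSubnormal(Double.MIN_NORMAL));     // false
        System.out.println(isSubnormal(Double.MIN_NORMAL / 2)); // true: halved, not flushed to zero
        System.out.println(isSubnormal(Double.MIN_VALUE));      // true
    }
}
```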
    Clearly, the smaller a subnormal number, the fewer nonzero bits in its fraction;
computations producing subnormal results do not enjoy the same bounds on relative
roundoff error as computations on normal operands. However, the key fact about gradual
underflow is that its use implies:
          Underflowed results need never suffer a loss of accuracy any greater than that
which results from ordinary roundoff error.
            Addition, subtraction, comparison, and remainder are always exact when the
result is very small.

    Recall that the IEEE format for a subnormal floating-point number is:
    $(-1)^{s} \times 2^{-\text{bias}+1} \times 0.f$
    where s is the sign bit, the biased exponent e is zero, and f is the fraction. Note that
the implicit power-of-two bias is one greater than the bias in the normal format, and the
implicit leading bit of the fraction is zero.
    Gradual underflow allows you to extend the lower range of representable numbers. It
is not smallness that renders a value questionable, but its associated error. Algorithms
exploiting subnormal numbers have smaller error bounds than other systems. The next
section provides some mathematical justification for gradual underflow.


Why Gradual Underflow?
    The purpose of subnormal numbers is not to avoid underflow/overflow entirely, as
some other arithmetic models do. Rather, subnormal numbers eliminate underflow as a
cause for concern for a variety of computations (typically, multiply followed by add). For
a more detailed discussion, see "Underflow and the Reliability of Numerical Software"
by James Demmel and "Combatting the Effects of Underflow and Overflow in
Determining Real Roots of Polynomials" by S. Linnainmaa.

    The presence of subnormal numbers in the arithmetic means that untrapped
underflow (which implies loss of accuracy) cannot occur on addition or subtraction. If x
and y are within a factor of two, then x -y is error-free. This is critical to a number of
algorithms that effectively increase the working precision at critical places in
algorithms. In addition, gradual underflow means that errors due to underflow are no
worse than usual roundoff error. This is a much stronger statement than can be made
about any other method of handling underflow, and this fact is one of the best
justifications for gradual underflow.
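The claim that x - y is error-free when x and y are within a factor of two (often called Sterbenz's lemma) can be spot-checked with BigDecimal, whose double constructor captures the exact binary value. A sketch with an illustrative class name:

```java
import java.math.BigDecimal;

// Sketch: check whether the floating-point difference x - y equals the exact mathematical difference.
public class ExactSubtraction {
    static boolean subtractionIsExact(double x, double y) {
        BigDecimal exact = new BigDecimal(x).subtract(new BigDecimal(y)); // BigDecimal(double) is exact
        return exact.compareTo(new BigDecimal(x - y)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(subtractionIsExact(0.7, 0.4));   // true: within a factor of two
        System.out.println(subtractionIsExact(0.7, 1.3));   // true: within a factor of two
        System.out.println(subtractionIsExact(1.0, 1e-17)); // false: far apart, the result is rounded
    }
}
```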
    Most of the time, floating-point results are rounded:
    computed result = (true result) + roundoff
    In IEEE arithmetic, with rounding mode to nearest,
    $0 \le |\text{roundoff}| \le \tfrac{1}{2}\,\text{ulp}$
of the computed result. ulp is an acronym for Unit in the Last Place. The least significant bit of the fraction of a number in its standard representation is the last place. If the roundoff error is less than or equal to one half unit in the last place, then the calculation is correctly rounded.

The ulp of 1.0 (the distance from 1.0 to the next larger number) for each floating-point type is:
Precision        ulp(1)
single           = 2^-23  ~ 1.192092896e-07
double           = 2^-52  ~ 2.22044604925031308e-16
quadruple        = 2^-112 ~ 1.92592994438723585305597794258492732e-34
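Java has no quadruple type, but the single and double rows can be checked with Math.ulp, which returns the distance from its argument to the next larger value in magnitude. A sketch (class and field names illustrative), which also shows that half an ulp added to 1.0 is a tie that rounds back to 1.0:

```java
// Sketch: ulp(1.0) for float and double.
public class UlpOfOne {
    static final float SINGLE_ULP = Math.ulp(1.0f);  // 2^-23
    static final double DOUBLE_ULP = Math.ulp(1.0);  // 2^-52

    public static void main(String[] args) {
        System.out.println(SINGLE_ULP == Math.scalb(1.0f, -23)); // true
        System.out.println(DOUBLE_ULP == Math.scalb(1.0, -52));  // true
        System.out.println(1.0 + DOUBLE_ULP > 1.0);              // true: the next double above 1.0
        System.out.println(1.0 + DOUBLE_ULP / 2 == 1.0);         // true: a tie, rounded to even (1.0)
    }
}
```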

    Any conventional set of representable floating-point numbers has the property that
the worst effect of one inexact result is to introduce an error no worse than the distance to
one of the representable neighbors of the computed result. When subnormal numbers are
added to the representable set and gradual underflow is implemented, the worst effect of
one inexact or underflowed result is to introduce an error no greater than the distance to
one of the representable neighbors of the computed result.
    In particular, in the region between zero and the smallest normal number, the
distance between any two neighboring numbers equals the distance between zero and the
smallest subnormal number. The presence of subnormal numbers eliminates the
possibility of introducing a roundoff error that is greater than the distance to the nearest
representable number.
    In the absence of gradual underflow, user programs need to be sensitive to the implicit inaccuracy threshold. For example, in single precision, if underflow occurs in some parts of a calculation, and Store 0 is used to replace underflowed results with 0, then accuracy can be guaranteed only to around $10^{-31}$, not $10^{-38}$, the usual lower range for single-precision exponents.
    This means that programmers need to implement their own method of detecting when they are approaching this inaccuracy threshold, or else abandon the quest for a robust, stable implementation of their algorithm. Some algorithms can be scaled so that computations don't take place in the constricted area near zero. However, scaling the algorithm and detecting the inaccuracy threshold can be difficult and time-consuming for each numerical program.

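Takeaway:
Gradual underflow lets a program lose precision gradually near zero instead of truncating results abruptly to zero, which improves accuracy; the subnormal numbers are the range of values that makes this possible.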




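5. float and double in Java
1) float
In float, the exponent has 8 bits and the fraction 23 bits.
Exponent field range: 0 to 255; subtracting the bias of 127 gives an actual exponent of -127 to 128.
Of these, -127 and 128 (fields 0 and 255) are reserved for special numbers, while -126 to 127 are used for normal numbers.

Zero uses the -127 slot (exponent field 0) with a zero fraction:
0x00000000 = (sign 0, exponent field 0, fraction 0) is +0.0, and 0x80000000 is -0.0; this is not 2^(-127), because the hidden bit is 0 here.
Infinity and NaN use the 128 slot (exponent field 255):
0x7f800000 / 0xff800000 = (sign 0 / 1, exponent field 255, fraction 0) is +infinity / -infinity; any non-zero fraction means NaN (Java's canonical NaN is 0x7fc00000).
Therefore the largest positive finite value is 0x7f7fffff = (sign 0, actual exponent 127, fraction 0x7fffff) = (2 - 2^(-23)) * 2^127,
        the smallest positive value is 0x00000001 = 2^(-23) * 2^(-126) = 2^(-149),
        and MIN_NORMAL = 0x00800000 = (sign 0, exponent field 1, fraction 0) = 2^(-126).
Why -126 (and not -127) for subnormals?
         Otherwise we'd be skipping numbers:
         $0.1_2 \times 2^{-126} = 1.0_2 \times 2^{-127}$
         Scaling subnormals by 2^(-126) lets them run right up to MIN_NORMAL, so the interval [2^(-127), 2^(-126)) is not left uncovered.

Subnormal numbers are all the values from the smallest value (MIN_VALUE) up to, but not including, MIN_NORMAL.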
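The "no gap" argument can be checked at the boundary. In this sketch (class and field names illustrative), the largest float subnormal, bit pattern 0x007FFFFF, plus one step of 2^-149 lands exactly on MIN_NORMAL, and the spacing on both sides of the boundary is the same.

```java
// Sketch: with subnormals scaled by 2^-126, the largest subnormal sits one step below MIN_NORMAL.
public class NoGapAtMinNormal {
    static final float LARGEST_SUBNORMAL = Float.intBitsToFloat(0x007FFFFF); // (1 - 2^-23) * 2^-126

    public static void main(String[] args) {
        // Subnormals are spaced 2^-149 (= Float.MIN_VALUE) apart, the same spacing as the
        // first normal binade, so one more step lands exactly on MIN_NORMAL.
        System.out.println(LARGEST_SUBNORMAL + Float.MIN_VALUE == Float.MIN_NORMAL); // true
        System.out.println(Math.ulp(LARGEST_SUBNORMAL) == Float.MIN_VALUE);         // true
        System.out.println(Math.ulp(Float.MIN_NORMAL) == Float.MIN_VALUE);          // true
    }
}
```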


                 Subnormal                               Normal                            Approximate decimal
Single           ±2^-149 to (1-2^-23)×2^-126              ±2^-126 to (2-2^-23)×2^127        ±~10^-44.85 to ~10^38.53
precision        = (2^-23 to 1-2^-23)×2^-126
Double           ±2^-1074 to (1-2^-52)×2^-1022            ±2^-1022 to (2-2^-52)×2^1023      ±~10^-323.3 to ~10^308.3
precision        = (2^-52 to 1-2^-52)×2^-1022



In java.lang.Float:
// Float.intBitsToFloat(0x7f800000): sign 0, exponent field 255 (128 after the bias), fraction 0
public static final float POSITIVE_INFINITY = 1.0f / 0.0f;
// Float.intBitsToFloat(0xff800000): sign 1, exponent field 255, fraction 0
public static final float NEGATIVE_INFINITY = -1.0f / 0.0f;
// Float.intBitsToFloat(0x7fc00000): sign 0, exponent field 255, non-zero fraction
public static final float NaN = 0.0f / 0.0f;
// Float.intBitsToFloat(0x7f7fffff): largest finite value
public static final float MAX_VALUE = 0x1.fffffeP+127f; // 3.4028235e+38f
// Float.intBitsToFloat(0x00800000): exponent field 1 (1 - 127 = -126), smallest normal value
public static final float MIN_NORMAL = 0x1.0p-126f; // 1.17549435E-38f
// Float.intBitsToFloat(0x1): exponent field 0 (subnormal, scaled by 2^-126), smallest positive value
public static final float MIN_VALUE = 0x0.000002P-126f; // 1.4e-45f
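The bit patterns in the comments above can be printed with Float.floatToIntBits. A minimal sketch (class and helper names illustrative):

```java
// Sketch: print the bit patterns behind the java.lang.Float constants.
public class FloatConstantBits {
    // Formats the 32 bits of f as 8 hex digits (NaN shows the canonical 0x7fc00000).
    static String hex(float f) {
        return String.format("%08x", Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        String[] names = {"POSITIVE_INFINITY", "NEGATIVE_INFINITY", "NaN",
                          "MAX_VALUE", "MIN_NORMAL", "MIN_VALUE"};
        float[] values = {Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN,
                          Float.MAX_VALUE, Float.MIN_NORMAL, Float.MIN_VALUE};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-18s 0x%s%n", names[i], hex(values[i]));
        }
    }
}
```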

Getting the 32-bit int representation of a float:
public static int floatToIntBits(float value)
public static native int floatToRawIntBits(float value)

Converting 32 bits back to a float:
public static native float intBitsToFloat(int bits)
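The difference between the two conversions shows up only for NaN: floatToIntBits is documented to collapse every NaN to 0x7fc00000, while floatToRawIntBits reports the stored bits. The sketch below (class and method names illustrative) checks the canonicalization and the lossless round trip of ordinary values; it deliberately avoids asserting raw NaN payloads, since the Javadoc warns that some processors may not preserve every NaN bit pattern.

```java
// Sketch: floatToIntBits canonicalizes NaN; non-NaN values round-trip through their bits unchanged.
public class NanBits {
    // Converts a float to its bits and back.
    static float roundTrip(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7fc00001);            // a NaN with a non-default fraction
        System.out.println(Float.isNaN(nan));                     // true
        System.out.printf("%08x%n", Float.floatToIntBits(nan));  // 7fc00000: every NaN collapses to this
        System.out.println(roundTrip(0.75f) == 0.75f);            // true
        System.out.println(Float.floatToIntBits(roundTrip(-0.0f)) == 0x80000000); // true: sign of zero kept
    }
}
```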



2) double
In java.lang.Double:
// Double.longBitsToDouble(0x7ff0000000000000L)
public static final double POSITIVE_INFINITY = 1.0 / 0.0;
// Double.longBitsToDouble(0xfff0000000000000L)
public static final double NEGATIVE_INFINITY = -1.0 / 0.0;
// Double.longBitsToDouble(0x7ff8000000000000L)
public static final double NaN = 0.0d / 0.0;
// Double.longBitsToDouble(0x7fefffffffffffffL)
public static final double MAX_VALUE = 0x1.fffffffffffffP+1023; // 1.7976931348623157e+308
// Double.longBitsToDouble(0x0010000000000000L)
public static final double MIN_NORMAL = 0x1.0p-1022; // 2.2250738585072014E-308
// Double.longBitsToDouble(0x1L)
public static final double MIN_VALUE = 0x0.0000000000001P-1022; // 4.9e-324

Getting the 64-bit long representation of a double:
public static long doubleToLongBits(double value)
public static native long doubleToRawLongBits(double value)
Converting 64 bits back to a double:
public static native double longBitsToDouble(long bits)
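longBitsToDouble also makes it easy to build a double from chosen fields. This is a sketch with an illustrative compose helper; it masks each field to its width and assembles the 1 + 11 + 52 layout described in section 2.

```java
// Sketch: build a double from (sign, biased exponent, fraction) fields with Double.longBitsToDouble.
public class DoubleCompose {
    // sign: 0 or 1; biasedExponent: 0..2047; fraction: low 52 bits.
    static double compose(int sign, int biasedExponent, long fraction) {
        long bits = ((long) (sign & 1) << 63)
                  | ((long) (biasedExponent & 0x7FF) << 52)
                  | (fraction & 0xFFFFFFFFFFFFFL);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        System.out.println(compose(0, 1023 + 1, 1L << 51)); // 1.5 * 2^1 = 3.0
        System.out.println(compose(1, 1023 - 1, 0));        // -1.0 * 2^-1 = -0.5
        System.out.println(compose(0, 0x7FF, 0));           // Infinity
        System.out.println(compose(0, 0, 1));               // 4.9E-324 (Double.MIN_VALUE)
    }
}
```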




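6. BigDecimal in Java
public class BigDecimal extends Number implements Comparable<BigDecimal>
It implements Comparable, so BigDecimal values can be compared with one another.
    An immutable, arbitrary-precision signed decimal number. A BigDecimal consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value is multiplied by ten to the power of the negation of the scale. The value represented by a BigDecimal is therefore (unscaledValue × 10^(-scale)).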
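The unscaled value and scale are directly observable. A sketch (class and field names illustrative) with one positive and one negative scale:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: the value of a BigDecimal is unscaledValue x 10^(-scale).
public class BigDecimalScale {
    static final BigDecimal POSITIVE_SCALE = new BigDecimal("123.45");                    // 12345 x 10^-2
    static final BigDecimal NEGATIVE_SCALE = new BigDecimal(BigInteger.valueOf(123), -2); // 123 x 10^2

    public static void main(String[] args) {
        System.out.println(POSITIVE_SCALE.unscaledValue() + ", scale " + POSITIVE_SCALE.scale()); // 12345, scale 2
        System.out.println(NEGATIVE_SCALE + " = " + NEGATIVE_SCALE.toPlainString());             // 1.23E+4 = 12300
    }
}
```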


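In financial calculations, and in any computation involving money, use this class instead of double to avoid accumulated error and obtain exact decimal results.
Test code:
import java.math.BigDecimal;
import static java.lang.System.out;

             double v1 = 1.0;
             double v2 = 0.9;
             out.println(v1 - v2);                      // 0.09999999999999998

             BigDecimal value1 = new BigDecimal(Double.toString(v1));
             BigDecimal value2 = new BigDecimal(Double.toString(v2));
             out.println(value1.subtract(value2));      // 0.1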



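Notes on usage:
1) Use the constructor that takes a String rather than the one that takes a double, because a double is usually not an exact representation of the intended decimal value (new BigDecimal(0.1) captures the binary approximation of 0.1, not 0.1 itself).
public BigDecimal(String val)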
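2) Basic arithmetic
Addition
public BigDecimal add(BigDecimal augend)
Subtraction
public BigDecimal subtract(BigDecimal subtrahend)
Multiplication
public BigDecimal multiply(BigDecimal multiplicand)
Division (throws ArithmeticException if the exact quotient has a non-terminating decimal expansion)
public BigDecimal divide(BigDecimal divisor)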
3) Rounding modes (legacy int constants; since Java 5 the java.math.RoundingMode enum provides the same modes, and these constants are deprecated as of Java 9)
// Rounding Modes

// Rounding mode to round away from zero. Always increments the digit prior to a non-zero discarded fraction.
public final static int ROUND_UP = 0;

// Rounding mode to round towards zero. Never increments the digit prior to a discarded fraction.
public final static int ROUND_DOWN = 1;

// Rounding mode to round towards positive infinity.
public final static int ROUND_CEILING = 2;

// Rounding mode to round towards negative infinity.
public final static int ROUND_FLOOR = 3;

// Rounding mode to round towards the nearest neighbor
// unless both neighbors are equidistant, in which case round up.
public final static int ROUND_HALF_UP = 4;

// Rounding mode to round towards the nearest neighbor
// unless both neighbors are equidistant, in which case round down.
public final static int ROUND_HALF_DOWN = 5;

// Rounding mode to round towards the nearest neighbor
// unless both neighbors are equidistant, in which case round
// towards the even neighbor.
public final static int ROUND_HALF_EVEN = 6;

// Rounding mode to assert that the requested operation has an exact
// result, hence no rounding is necessary.
public final static int ROUND_UNNECESSARY = 7;

4) To compare two values, use
public int compareTo(BigDecimal val)
not
public boolean equals(Object x)
because equals also compares the scale, so 1 and 1.0 are not equal under equals.
Test code:
import java.math.BigDecimal;
import static java.lang.System.out;

BigDecimal value11 = new BigDecimal("1");
BigDecimal value21 = new BigDecimal("1.0");
out.println(value11.equals(value21));           // false
out.println(value11.compareTo(value21) == 0);   // true
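Notes 1) to 3) can be exercised together in one small program. This is a sketch: the class name and the helpers divideOrRound and roundToInteger are hypothetical, written only to illustrate the documented behavior of the BigDecimal(double) constructor, of divide(BigDecimal) throwing ArithmeticException for non-terminating quotients, and of the java.math.RoundingMode values.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch exercising notes 1) to 3): constructors, division, and rounding modes.
public class BigDecimalNotes {
    // Note 1: the double constructor captures the exact binary value of the double.
    static final BigDecimal FROM_DOUBLE = new BigDecimal(0.1);
    static final BigDecimal FROM_STRING = new BigDecimal("0.1");

    // Note 2 (hypothetical helper): exact divide when possible, otherwise round to 'scale' places.
    static BigDecimal divideOrRound(BigDecimal a, BigDecimal b, int scale) {
        try {
            return a.divide(b);                                 // throws if the quotient never terminates
        } catch (ArithmeticException nonTerminating) {
            return a.divide(b, scale, RoundingMode.HALF_EVEN);
        }
    }

    // Note 3 (hypothetical helper): round a decimal string to a whole number with the given mode.
    static BigDecimal roundToInteger(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(0, mode);
    }

    public static void main(String[] args) {
        System.out.println(FROM_DOUBLE); // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(FROM_STRING); // 0.1
        System.out.println(divideOrRound(BigDecimal.ONE, new BigDecimal("4"), 10)); // 0.25
        System.out.println(divideOrRound(BigDecimal.ONE, new BigDecimal("3"), 10)); // 0.3333333333
        for (RoundingMode mode : RoundingMode.values()) {
            if (mode == RoundingMode.UNNECESSARY) {
                continue; // would throw ArithmeticException: 2.5 needs rounding
            }
            System.out.printf("%-10s 2.5 -> %s, -2.5 -> %s%n", mode,
                    roundToInteger("2.5", mode), roundToInteger("-2.5", mode));
        }
    }
}
```

Because divide(BigDecimal) has no rounding information, code that divides arbitrary values normally passes a scale and a RoundingMode up front; the try/catch fallback here only makes the exception visible.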




7. The evolution of IEEE 754
     IEEE 754-2008 governs both binary and decimal floating-point arithmetic. It specifies number formats, basic operations, conversions, and exceptional conditions. The 2008 edition supersedes both the 754-1985 standard and the related IEEE 854-1987, which generalized 754-1985 to cover decimal arithmetic as well as binary.
     IEEE 754-2008 defines:
     1) arithmetic formats: binary and decimal floating-point numbers, including signed zeros, subnormal numbers, infinities, and NaN (Not a Number);
     2) interchange formats: encodings (bit strings) used to exchange floating-point data efficiently;
     3) rounding algorithms: the rounding methods applied during arithmetic and conversions;
     4) operations: arithmetic and other operations on the arithmetic formats;
     5) exception handling: indications of exceptional conditions (such as division by zero and overflow).

Basic formats:
Name         Common name            Base   Digits   E min     E max     Notes
binary16     Half precision         2      10+1     -14       +15       storage, not basic
binary32     Single precision       2      23+1     -126      +127
binary64     Double precision       2      52+1     -1022     +1023
binary128    Quadruple precision    2      112+1    -16382    +16383
decimal32                           10     7        -95       +96       storage, not basic
decimal64                           10     16       -383      +384
decimal128                          10     34       -6143     +6144
All the basic formats are available in both hardware and software implementations.


Arithmetic format:
A floating-point number represented by its sign, significand, and exponent.


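Interchange format
The width w of the exponent field for a k-bit binary interchange format with k ≥ 128 is computed as

$w = \operatorname{round}(4 \log_2 k) - 13$

(the formula also happens to give 11 for binary64, but binary16 and binary32 use 5 and 8 exponent bits, not the 3 and 7 it would give).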
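The formula is easy to tabulate. This sketch (class and method names illustrative) evaluates it for a few widths; it reproduces binary64, binary128 and binary256, and shows why it is not used for binary32.

```java
// Sketch: IEEE 754-2008 exponent-field width for a k-bit binary interchange format,
// w = round(4 * log2(k)) - 13 (stated by the standard for k >= 128).
public class ExponentWidth {
    static long exponentWidth(int k) {
        double log2k = Math.log(k) / Math.log(2);
        return Math.round(4 * log2k) - 13;
    }

    public static void main(String[] args) {
        for (int k : new int[] {32, 64, 128, 256}) {
            System.out.println("k=" + k + " -> w=" + exponentWidth(k));
        }
        // k=32 gives 7 although binary32 actually uses 8 exponent bits;
        // k=64 -> 11, k=128 -> 15 and k=256 -> 19 match the real formats.
    }
}
```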



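    Decimal floating point:
    Professor Kahan's view: use decimal floating point to avoid human errors of this kind: after double d = 0.1;, d is not actually equal to 0.1. IBM's view: use decimal floating point in economic, financial, and other human-facing programs. However, without hardware support, decimal floating-point arithmetic implemented in software is 100 to 1000 times slower than binary floating point implemented in hardware. Since decimal floating point has been adopted by IEEE 754R, IBM planned to implement a decimal FPU in its next-generation Power chips.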


(Figure: appendix of summary charts.)




References:
English
http://standards.ieee.org/
http://ieeexplore.ieee.org/servlet/opac?punumber=2355
http://ieeexplore.ieee.org/servlet/opac?punumber=2502


http://ieeexplore.ieee.org/servlet/opac?punumber=4610933
http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
http://grouper.ieee.org/groups/754/meeting-materials/2001-10-18-langdesign.pdf
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
http://754r.ucbtest.org/
http://grouper.ieee.org/groups/754/
http://en.wikipedia.org/wiki/IEEE_754-2008
http://en.wikipedia.org/wiki/Subnormal_number
http://docs.sun.com/source/806-3568/ncgTOC.html
http://docs.sun.com/source/806-3568/ncg_goldberg.html
http://chrishecker.com/Miscellaneous_Technical_Articles#Floating_Point
http://chrishecker.com/Miscellaneous_Technical_Articles
http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.3
http://babbage.cs.qc.edu/IEEE-754/IEEE-754hex32.html
http://hal.archives-ouvertes.fr/hal-00128124/en/
http://www.dec.usc.es/arith16/docs/ieee_toc_arith_cfp.pdf
http://docs.sun.com/app/docs/doc/802-5692/6i9ecs3oa?l=zh&a=view

Chinese
http://zh.wikipedia.org/wiki/%E6%B5%AE%E7%82%B9%E6%95%B0
http://zh.wikipedia.org/wiki/IEEE_754
http://codex.wordpress.org.cn/IEEE_754
http://www.cnblogs.com/bossin/archive/2007/04/08/704567.html
http://stephensuen.spaces.live.com/Blog/cns!1p1G_DGhjYiYGmj6keNZQAcw!172.entry
http://docs.sun.com/app/docs/coll/44.4?l=zh




Zianed
Homepage:http://my.unix-center.net/~Zianed/
Mail: hxuanzhe86@sina.com
MSN:zianed@live.cn
QQ:1196123432
QQGroup: 50457022
Date:2009-10-24




Beyond Floating Point – Next Generation Computer Arithmetic
 
Applications of ICT Lecture 3.pptxjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...
Applications of ICT Lecture 3.pptxjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...Applications of ICT Lecture 3.pptxjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...
Applications of ICT Lecture 3.pptxjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj...
 
A floating-point adder (IEEE 754 floating-point.pptx
A floating-point adder (IEEE 754 floating-point.pptxA floating-point adder (IEEE 754 floating-point.pptx
A floating-point adder (IEEE 754 floating-point.pptx
 
Number system
Number systemNumber system
Number system
 
Beating Floating Point at its Own Game: Posit Arithmetic
Beating Floating Point at its Own Game: Posit ArithmeticBeating Floating Point at its Own Game: Posit Arithmetic
Beating Floating Point at its Own Game: Posit Arithmetic
 
DLD-W3-L1.pptx
DLD-W3-L1.pptxDLD-W3-L1.pptx
DLD-W3-L1.pptx
 
Multimedia lossy compression algorithms
Multimedia lossy compression algorithmsMultimedia lossy compression algorithms
Multimedia lossy compression algorithms
 
Digital electronics & microprocessor Batu- s y computer engineering- arvind p...
Digital electronics & microprocessor Batu- s y computer engineering- arvind p...Digital electronics & microprocessor Batu- s y computer engineering- arvind p...
Digital electronics & microprocessor Batu- s y computer engineering- arvind p...
 
Computer Oraganizaation.pptx
Computer Oraganizaation.pptxComputer Oraganizaation.pptx
Computer Oraganizaation.pptx
 
Design and Implementation of n-bit fastest Adder(IJECCE national conference w...
Design and Implementation of n-bit fastest Adder(IJECCE national conference w...Design and Implementation of n-bit fastest Adder(IJECCE national conference w...
Design and Implementation of n-bit fastest Adder(IJECCE national conference w...
 
Data Converter Fundamentals presented by Oveis Dehghantanha
Data Converter Fundamentals presented by Oveis DehghantanhaData Converter Fundamentals presented by Oveis Dehghantanha
Data Converter Fundamentals presented by Oveis Dehghantanha
 
3. IEEE 754 FLOATING POINT For Comp. ORG.pdf
3. IEEE 754 FLOATING POINT For Comp. ORG.pdf3. IEEE 754 FLOATING POINT For Comp. ORG.pdf
3. IEEE 754 FLOATING POINT For Comp. ORG.pdf
 
Binary Codes and Number System
Binary Codes and Number SystemBinary Codes and Number System
Binary Codes and Number System
 
Number system
Number systemNumber system
Number system
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Float & Double in Java and a Study of IEEE 754, V1.0

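Float & Double in Java and a Study of IEEE 754

Zianed Hou
zianed@live.cn

1. IEEE 754, the standard for binary floating-point arithmetic (ANSI/IEEE Std 754-1985)
2. Representation of binary floating-point numbers
3. Rounding of floating-point numbers
4. Exceptions in numerical processing
5. float and double in Java
6. BigDecimal in Java
7. The evolution of IEEE 754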
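1. IEEE 754, the standard for binary floating-point arithmetic (ANSI/IEEE Std 754-1985)

Floating-point numbers: a floating-point number is a digital representation of a number from a particular subset of the rationals. Computers use it to approximate an arbitrary real number. The value is an integer or fixed-point number multiplied by an integer power of some base (radix), much like scientific notation in base 10.

A floating-point number a is represented by two numbers m and e: a = m*b^e, for a chosen base b and precision p. The significand m is a p-digit number of the form d.ddddd. Each digit lies between 0 and b-1, and the digit to the left of the point is non-zero (normalized form).

Floating-point arithmetic always involves approximation, that is, rounding, because most values cannot be represented exactly.

Floating-point standards: a floating-point standard specifies how floating-point numbers are stored and computed. In most contexts the term refers to IEEE 754, which covers the formats and operations of single precision, double precision, and so on.

Before IEEE 754 the industry had no unified floating-point standard. Many computer manufacturers designed their own floating-point rules and arithmetic details. At the time, speed and simplicity of implementation mattered more than numerical accuracy, and this was an obstacle to code portability.

In the late 1970s Intel planned to introduce a floating-point coprocessor for its 8086 microprocessor. It hired Professor William Kahan of the University of California, Berkeley, one of the best numerical analysts, to design the floating-point format for the 8087 FPU. Kahan brought in two more experts, forming the KCS team (Kahan, Coonen, and Stone), and together they completed Intel's floating-point format. The KCS format was so well done that IEEE decided to adopt a scheme very close to it as the IEEE standard floating-point format.

In 1985 IEEE published the standard for binary floating-point arithmetic, IEEE 754, which fixes the base of the exponent at 2. The same year it was adopted in the US as an ANSI standard. Today almost all computers support it, which has greatly improved the portability of scientific applications. Taking the influence of IBM System/370 into account, IEEE published the radix-independent floating-point standard IEEE 854 in 1987, also adopted as an ANSI standard that year. In 1989 the International Electrotechnical Commission (IEC) approved IEEE 754/854 as the international standard IEC 559:1989, later renumbered IEC 60559. Today almost every floating-point processor fully or largely supports IEC 60559, and so does C99 floating-point arithmetic.

The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is adopted by many CPUs and floating-point units. It defines:
  the formats for representing floating-point numbers, including negative zero (-0) and denormal numbers;
  some special values, namely infinities and NaN ("not a number");
  the floating-point operators on these values;
  four rounding modes and five exceptions, including when each exception occurs and how it is handled.

IEEE 754 specifies four formats for floating-point values:
  single precision (32 bits);
  double precision (64 bits);
  single extended precision (43 bits or more, rarely used);
  double extended precision (79 bits or more, usually implemented with 80 bits).

Only the 32-bit format is mandatory; the others are optional. Most languages provide IEEE formats and arithmetic, though some treat them as optional. For example, C predates IEEE 754. It now includes IEEE arithmetic but does not make it mandatory (a C float usually means IEEE single precision and a C double means double precision).

The full name of the standard is IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985). It is also known as IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems (originally numbered IEC 559:1989). It was followed by the radix-independent IEEE 854-1987, which covers both base 2 and base 10.

For the history, see "IEEE 754: An Interview with William Kahan".

2. Representation of binary floating-point numbers

A binary floating-point number is stored in sign-magnitude form:
  the most significant bit is the sign bit;
  the next e bits hold the biased exponent;
  the remaining f bits hold the fraction.
Total width = sign bit + exponent bits + fraction bits.

S + Exp + Fraction

32-bit single precision: 32 = 1 + 8 + 23, bias +127.

The single-precision exponent ranges from -126 to +127. Adding the bias 127 gives stored exponent values from 1 to 254; 0 and 255 are reserved for special values. To get the actual exponent, subtract the bias from the stored exponent.

Example: 0x3F400000 = 0.75
  sign bit 0: positive
  exponent field 01111110 = 126, actual exponent = 126 - 127 = -1
  significand 1.10000000000000000000000 (binary) = 2^0 + 2^-1 = 1.5
  value = 1.5 * 2^-1 = 0.75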
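To check the worked example, here is a minimal sketch that splits 0x3F400000 into its three fields with plain bit masks and rebuilds the value. The class and method names (FloatFields, normalValue) are illustrative. normalValue only handles normal numbers. Float.intBitsToFloat is the JDK method listed in section 5.

```java
// Minimal sketch: split an IEEE 754 single-precision bit pattern into sign, exponent and fraction.
// FloatFields is a hypothetical helper; normalValue is only valid for normal numbers.
class FloatFields {
    static int sign(int bits)     { return bits >>> 31; }           // 1 sign bit
    static int exponent(int bits) { return (bits >>> 23) & 0xFF; }  // 8-bit stored exponent (bias 127)
    static int fraction(int bits) { return bits & 0x7FFFFF; }       // 23 fraction bits

    // Value of a NORMAL number: (-1)^s * (1 + f / 2^23) * 2^(e - 127).
    static double normalValue(int bits) {
        double significand = 1.0 + fraction(bits) / (double) (1 << 23);
        double magnitude = significand * Math.pow(2, exponent(bits) - 127);
        return sign(bits) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        int bits = 0x3F400000;
        System.out.println("sign     = " + sign(bits));
        System.out.println("exponent = " + exponent(bits) + " (actual " + (exponent(bits) - 127) + ")");
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(bits)));
        System.out.println("value    = " + normalValue(bits));
        System.out.println("JDK      = " + Float.intBitsToFloat(bits));
    }
}
```

Running it prints sign 0, exponent 126 (actual -1), fraction 0x400000, and 0.75 from both the manual reconstruction and the JDK.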
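[Figures: value computation and special-value encodings, not reproduced]

64-bit double precision: 64 = 1 + 11 + 52, bias +1023.

The double-precision exponent ranges from -1022 to +1023. Adding the bias 1023 gives stored exponent values from 1 to 2046. The values 0 (all bits 0) and 2047 (all bits 1) are reserved for special values. To get the actual exponent, subtract the bias from the stored exponent.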
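The same field extraction for a double can be sketched as follows. This assumes the raw bits are read with Double.doubleToRawLongBits, and DoubleFields is a hypothetical helper name. The program prints the stored exponent field for a few values, so you can see the normal range 1 to 2046 and the reserved values 0 and 2047 directly.

```java
// Minimal sketch: read the 1 + 11 + 52 bit fields of a double (bias 1023).
// DoubleFields is a hypothetical helper, not a JDK class.
class DoubleFields {
    static final int BIAS = 1023;

    static long sign(double d) {
        return Double.doubleToRawLongBits(d) >>> 63;
    }

    static int exponentField(double d) {
        return (int) ((Double.doubleToRawLongBits(d) >>> 52) & 0x7FF); // 11 bits
    }

    static long fraction(double d) {
        return Double.doubleToRawLongBits(d) & 0xFFFFFFFFFFFFFL;         // low 52 bits
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 0.75, Double.MIN_NORMAL, Double.MAX_VALUE,
                            Double.MIN_VALUE, Double.POSITIVE_INFINITY};
        for (double d : samples) {
            int e = exponentField(d);
            // Fields 1..2046 are normal numbers (actual exponent e - 1023); 0 and 2047 are special.
            String kind = (e == 0) ? "zero/subnormal"
                    : (e == 0x7FF) ? "infinity/NaN"
                    : "normal, 2^" + (e - BIAS);
            System.out.printf("%-24s sign=%d field=%4d %s%n", d, sign(d), e, kind);
        }
    }
}
```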
[Figures: value computation and special-value encodings, not reproduced]

Gradual underflow and the subnormal numbers:

"Gradual underflow provides a number of advantages over abrupt underflow. Without it, the gap between zero and the smallest floating-point number is much larger than the gap between successive small floating-point numbers. Without gradual underflow one can find two values, X and Y (such that X is not equal to Y), and yet when you subtract them their result is zero. While a skilled numerical analyst could work around this limitation in many situations, this anomaly would tend to cause problems for less skilled programmers." (Charles Severance)

Conclusion: the exponent field determines the leading bit of the significand. Let e be the number of exponent bits. If 0 < exponent < 2^e - 1, the leading significand bit is an implicit 1 and the number is called normalized.
If the exponent is 0, the leading significand bit is 0 and the number is called denormalized (subnormal). Three special cases should be noted:
  If the exponent is 0 and the fraction is 0, the number is ±0 (depending on the sign bit).
  If the exponent is 2^e - 1 and the fraction is 0, the number is ±infinity (again depending on the sign bit).
  If the exponent is 2^e - 1 and the fraction is non-zero, the value is NaN (not a number).

These rules are summarized below:

  Form          Exponent          Fraction
  Zero          0                 0
  Subnormal     0                 non-zero
  Normal        1 to 2^e - 2      any
  Infinity      2^e - 1           0
  NaN           2^e - 1           non-zero

A binary fraction with n bits can take 2^n distinct values, while a decimal fraction with n digits can take 10^n. Of the 10^n n-digit decimal fractions, only 2^n can be represented exactly in binary. That is a proportion of 2^n / 10^n = 0.2^n, so the longer the fraction, the smaller the share of decimal fractions that can be represented exactly.

Subnormal numbers: "The numbers closest to the inverse of these bounds (-1×10^-95 and 1×10^-95) are considered to be the smallest (in magnitude) normal numbers; non-zero numbers between these smallest numbers are called subnormal numbers." (The bounds in this passage come from a base-10 example with 7 digits and Emax = 96, i.e. decimal32.)

Subnormal numbers guarantee that addition and subtraction of floating-point numbers never underflow: two nearby floating-point numbers always have a representable non-zero difference. Without gradual underflow, the subtraction a-b can underflow and produce zero even though the values are not equal. This can, in turn, lead to division-by-zero errors that cannot occur when gradual underflow is used.

Filling the underflow gap this way loses significant digits, but far fewer than flushing to zero on underflow, which loses all significant digits throughout the underflow gap. This is why producing a denormal number is sometimes called gradual underflow: it lets a calculation lose precision slowly when the result is small.

Some processors handle subnormal values in hardware, just as they handle normal values. Subnormal arguments or results then pose no particular performance issue; they are handled at the same speed as normal values. Other processors leave the handling of subnormal values to system software and handle only normal values (and zero) in hardware. On those processors, computing with subnormal values is significantly slower than computing with normal values. Some applications therefore contain code to avoid subnormal numbers, either to maintain accuracy or to avoid the performance penalty on such processors.

If the exponent is all 0s but the fraction is non-zero (otherwise it would be interpreted as zero), the value is a subnormal number. A subnormal number does not have an assumed leading 1 before the binary point. It therefore represents (-1)^s × 0.f × 2^-126, where s is the sign bit and f is the fraction. For double precision, subnormal numbers have the form (-1)^s × 0.f × 2^-1022. From this point of view, zero can be interpreted as a special kind of subnormal number.

[Table: subnormal values for each format, not reproduced]

3. Rounding of floating-point numbers

The result of an operation on significands is usually held in a longer register. When the result is returned in floating-point format, the extra bits must be discarded.

There are several ways to do this rounding; the IEEE standard lists four:
  Round to nearest: the result is rounded to the nearest representable value.
  Round toward +∞: the result is rounded toward positive infinity (analogous to ceil() for integers).
  Round toward -∞: the result is rounded toward negative infinity (analogous to floor()).
  Round toward 0: the result is rounded toward zero (analogous to truncation with an (int) cast).
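Java's own arithmetic always rounds to nearest, ties to even (Java Language Specification), so the directed modes cannot be selected for + - * /. The four directions can still be illustrated on rounding to an integer with standard Math methods. This is a sketch of the analogy drawn above, not a way to change the hardware rounding mode. RoundingDirections is a hypothetical name, and the cast-based truncation assumes |x| < 2^63.

```java
// Minimal sketch: the four IEEE 754-1985 rounding directions, illustrated on rounding to an integer.
// RoundingDirections is a hypothetical helper; towardZero assumes |x| < 2^63 and x is not NaN.
class RoundingDirections {
    static double nearestEven(double x)    { return Math.rint(x); }       // nearest, ties to even
    static double towardPositive(double x) { return Math.ceil(x); }       // toward +infinity
    static double towardNegative(double x) { return Math.floor(x); }      // toward -infinity
    static double towardZero(double x)     { return (double) (long) x; }  // truncation toward 0

    public static void main(String[] args) {
        double[] xs = {2.5, 3.5, -2.5, 2.7, -2.7};
        for (double x : xs) {
            System.out.printf("%5.1f  nearest=%5.1f  ceil=%5.1f  floor=%5.1f  trunc=%5.1f%n",
                    x, nearestEven(x), towardPositive(x), towardNegative(x), towardZero(x));
        }
    }
}
```

Note how 2.5 rounds to 2.0 but 3.5 rounds to 4.0 under ties-to-even. Rounding ties always upward would instead give 3.0 and 4.0.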
Rounding algorithms in IEEE 754-2008:
1) Rounding to nearest
  Round to nearest, ties to even: a result exactly halfway between two representable values is rounded to the one whose last significand bit is 0 (the "even" one). This is the default and recommended mode. Ties go up and down equally often, so it avoids the systematic upward bias that always rounding ties in one direction would introduce.
  Round to nearest, ties away from zero: a halfway result is rounded away from 0. Positive numbers go to the larger value and negative numbers to the smaller value.
2) Directed roundings
  Round toward 0: round toward zero.
  Round toward +∞: round toward positive infinity.
  Round toward -∞: round toward negative infinity.

4. Exceptions in numerical processing

The standard defines five exceptions: invalid operation, division by zero, overflow, underflow, and inexact. Each has a corresponding status flag that (except in certain cases of underflow) is raised when the exception occurs. No other action is required, but alternatives are recommended. The five possible exceptions are:
  Invalid operation (e.g., square root of a negative number)
  Division by zero
  Overflow (a result is too large to be represented correctly)
  Underflow (a result is very small, outside the normal range, and is inexact)
  Inexact

Underflow

Recall that the IEEE format for a normal floating-point number is

  (-1)^s × 2^(e - bias) × 1.f

where s is the sign bit, e is the biased exponent, and f is the fraction. Only s, e, and f need to be stored to fully specify the number. The implicit leading bit of the significand is defined to be 1 for normal numbers, so it need not be stored.

The smallest positive normal number that can be stored therefore has the negative exponent of greatest magnitude and a fraction of all zeros. Even smaller numbers can be accommodated by treating the leading bit as zero rather than one. In the double-precision format, this effectively extends the minimum magnitude from about 10^-308 to about 10^-324, because the fraction part is 52 bits long (roughly 16 decimal digits). These are the subnormal numbers. Returning a subnormal number, rather than flushing an underflowed result to zero, is called gradual underflow.

Clearly, the smaller a subnormal number, the fewer nonzero bits its fraction has. Computations producing subnormal results do not enjoy the same bounds on relative roundoff error as computations on normal operands. However, the key fact about gradual underflow is that its use implies:
  Underflowed results need never suffer a loss of accuracy any greater than that which results from ordinary roundoff error.
  Addition, subtraction, comparison, and remainder are always exact when the result is very small.

Recall that the IEEE format for a subnormal floating-point number is

  (-1)^s × 2^(1 - bias) × 0.f

where s is the sign bit, the biased exponent e is zero, and f is the fraction. Note that the implicit power of two is one greater than in the normal format with e = 0, and the implicit leading bit of the fraction is zero.

Gradual underflow extends the lower range of representable numbers. It is not smallness that makes a value questionable, but its associated error. Algorithms that exploit subnormal numbers have smaller error bounds than algorithms on other systems. The next section gives some mathematical justification for gradual underflow.

Why Gradual Underflow?

The purpose of subnormal numbers is not to avoid underflow/overflow entirely, as some other arithmetic models do. Rather, subnormal numbers eliminate underflow as a cause for concern for a variety of computations (typically, multiply followed by add). For a more detailed discussion, see "Underflow and the Reliability of Numerical Software" by James Demmel and "Combatting the Effects of Underflow and Overflow in Determining Real Roots of Polynomials" by S. Linnainmaa.

Because the arithmetic includes subnormal numbers, untrapped underflow (which implies loss of accuracy) cannot occur on addition or subtraction.
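Java does not expose IEEE status flags or traps: floating-point operators raise no exceptions and produce the standard default result instead. The sketch below (class name hypothetical) prints those defaults. It then checks the gradual-underflow property just described: two different doubles near MIN_NORMAL subtract to a non-zero subnormal value, not to 0.

```java
// Minimal sketch: Java raises no IEEE flags or traps; exceptional operations return default results.
// ExceptionDefaults is a hypothetical class name used only for illustration.
class ExceptionDefaults {
    static boolean isSubnormal(double d) {
        return d != 0.0 && Math.abs(d) < Double.MIN_NORMAL;
    }

    public static void main(String[] args) {
        System.out.println("invalid operation: " + Math.sqrt(-1.0));         // NaN
        System.out.println("division by zero:  " + (1.0 / 0.0));             // Infinity
        System.out.println("overflow:          " + (Double.MAX_VALUE * 2));  // Infinity
        System.out.println("underflow:         " + (Double.MIN_NORMAL / 3)); // rounded subnormal, not 0
        System.out.println("inexact:           " + (0.1 + 0.2));             // 0.30000000000000004

        // Gradual underflow: x != y near MIN_NORMAL, yet x - y is a non-zero subnormal number.
        double x = Double.MIN_NORMAL * 1.5;
        double y = Double.MIN_NORMAL;
        double diff = x - y; // exactly 2^-1023, below MIN_NORMAL
        System.out.println("x != y: " + (x != y) + ", x - y = " + diff
                + ", subnormal: " + isSubnormal(diff));
    }
}
```

With flush-to-zero, diff would become 0.0 even though x != y. That is exactly the anomaly described in the Severance quote in section 2.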
If x and y are within a factor of two, then x - y is error-free. This is critical to a number of algorithms that effectively increase the working precision at critical places.

In addition, gradual underflow means that errors due to underflow are no worse than usual roundoff error. No other method of handling underflow supports so strong a statement, and this fact is one of the best justifications for gradual underflow.

Most of the time, floating-point results are rounded:

  computed result = true result + roundoff

In IEEE arithmetic with round to nearest, the roundoff is bounded by half an ulp of the computed result:

  0 ≤ |roundoff| ≤ 1/2 ulp

ulp is an acronym for Unit in the Last Place. The last place is the least significant bit of the fraction of a number in its standard representation. If the roundoff error is at most one half unit in the last place, the calculation is correctly rounded.

The ulp of 1.0 for each floating-point data type is:

  Precision                 Value
  single                    2^-23  ≈ 1.192092896e-07
  double                    2^-52  ≈ 2.22044604925031308e-16
  double extended (Intel)   2^-63  ≈ 1.08420217248550443e-19
  quadruple                 2^-112 ≈ 1.92592994438723585305597794258492732e-34

Any conventional set of representable floating-point numbers has the property that the worst effect of one inexact result is an error no larger than the distance to one of the representable neighbors of the computed result. When subnormal numbers are added to the representable set and gradual underflow is implemented, the same bound holds for an underflowed result: the worst effect of one inexact or underflowed result is an error no larger than the distance to one of the representable neighbors of the computed result.

In particular, in the region between zero and the smallest normal number, the distance between any two neighboring numbers equals the distance between zero and the smallest subnormal number. Subnormal numbers therefore eliminate the possibility of a roundoff error greater than the distance to the nearest representable number.

Without gradual underflow, user programs need to be sensitive to the implicit inaccuracy threshold. For example, suppose a single-precision calculation underflows in some parts, and "Store 0" replaces the underflowed results with 0. Accuracy can then be guaranteed only to around 10^-31, not to 10^-38, the usual lower range for single-precision exponents.
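The ulp table and the half-ulp bound can be checked with JDK methods Math.ulp and Math.scalb. The exact comparison uses BigDecimal, whose double constructor captures the stored binary value exactly (see section 6). This is a minimal sketch. UlpDemo and its helpers roundoff and halfUlp are hypothetical names, and 0.1 is just a convenient sample value.

```java
import java.math.BigDecimal;

// Minimal sketch: ulp values from the JDK, and the "roundoff <= 1/2 ulp" bound checked exactly.
// UlpDemo, roundoff and halfUlp are hypothetical names used only for illustration.
class UlpDemo {
    // Exact distance between a stored double and the decimal value it was meant to hold.
    static BigDecimal roundoff(double stored, String intendedDecimal) {
        return new BigDecimal(stored).subtract(new BigDecimal(intendedDecimal)).abs();
    }

    // Half an ulp of d; exact, because an ulp is a power of two.
    static BigDecimal halfUlp(double d) {
        return new BigDecimal(Math.ulp(d) / 2);
    }

    public static void main(String[] args) {
        System.out.println("float  ulp(1) = " + Math.ulp(1.0f) + ", 2^-23 = " + Math.scalb(1.0f, -23));
        System.out.println("double ulp(1) = " + Math.ulp(1.0) + ", 2^-52 = " + Math.scalb(1.0, -52));
        // The spacing grows with the exponent: ulp(1024.0) = 2^(10 - 52).
        System.out.println("ulp(1024.0) == 2^-42: " + (Math.ulp(1024.0) == Math.scalb(1.0, -42)));
        // At MIN_NORMAL, as throughout the subnormal range, the spacing equals MIN_VALUE.
        System.out.println("ulp(MIN_NORMAL) == MIN_VALUE: "
                + (Math.ulp(Double.MIN_NORMAL) == Double.MIN_VALUE));
        // Round to nearest: storing 0.1 errs by no more than half an ulp.
        System.out.println("|stored 0.1 - 0.1| <= ulp/2: "
                + (roundoff(0.1, "0.1").compareTo(halfUlp(0.1)) <= 0));
    }
}
```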
This means that programmers need to implement their own method of detecting when they are approaching this inaccuracy threshold, or else abandon the quest for a robust, stable implementation of their algorithm.

Some algorithms can be scaled so that computations don't take place in the constricted area near zero. However, scaling the algorithm and detecting the inaccuracy threshold can be difficult and time-consuming for each numerical program.

Takeaway: with gradual underflow, a result that falls below the normal range loses precision gradually instead of being flushed to zero, so accuracy degrades smoothly rather than abruptly. The subnormal numbers are the range of values that fills this gap between zero and the smallest normal number.

5. float and double in Java

1) float

In Float, the exponent has 8 bits and the fraction has 23 bits.
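Exponent range: stored values 0 to 255 with bias 127, i.e. unbiased -127 to 128. The unbiased values -127 and 128 (stored 0 and 255) are reserved for special numbers; normal numbers use -126 to 127.

Zero uses the stored exponent 0 (unbiased -127) with a zero fraction: 0x00000000 is +0.0 and 0x80000000 is -0.0. There is no implicit leading 1 here, so the value is exactly 0, not 2^(-127).

Infinities and NaN use the stored exponent 255 (unbiased 128). With a zero fraction, 0x7f800000 is +∞ and 0xff800000 is -∞. With a non-zero fraction the value is NaN, for example 0x7fc00000.

Therefore:
  largest finite positive value: 0x7f7fffff = (0, 127, 0x7fffff) = (2 - 2^-23) × 2^127 ≈ 3.4028235e38
  smallest positive value:       0x00000001 = 2^(-23) × 2^(-126) = 2^(-149)
  MIN_NORMAL:                    0x00800000 = (0, -126, 0) = 2^(-126)

Why -126 for subnormals? If subnormals were scaled by 2^-127, numbers would be skipped, because 0.1 (binary) × 2^-126 = 1.0 (binary) × 2^-127. Scaling subnormals by 2^-126 lets them continue seamlessly below MIN_NORMAL.

The subnormal numbers are all the values from the smallest positive value (MIN_VALUE) up to, but not including, MIN_NORMAL.

            Subnormal                             Normal                          Approximate decimal
  Single    ±2^-149 to (1-2^-23)×2^-126           ±2^-126 to (2-2^-23)×2^127      ±~10^-44.85 to ~10^38.53
            = (2^-23 to 1-2^-23)×2^-126
  Double    ±2^-1074 to (1-2^-52)×2^-1022         ±2^-1022 to (2-2^-52)×2^1023    ±~10^-323.3 to ~10^308.3
            = (2^-52 to 1-2^-52)×2^-1022

In java.lang.Float:

  // Float.intBitsToFloat(0x7f800000), i.e. (0, 255-127 = 128, 0)
  public static final float POSITIVE_INFINITY = 1.0f / 0.0f;
  // Float.intBitsToFloat(0xff800000), i.e. (1, 255-127 = 128, 0)
  public static final float NEGATIVE_INFINITY = -1.0f / 0.0f;
  // Float.intBitsToFloat(0x7fc00000), i.e. (0, 255-127 = 128, 0x400000)
  public static final float NaN = 0.0f / 0.0f;
  // Float.intBitsToFloat(0x7f7fffff)
  public static final float MAX_VALUE = 0x1.fffffeP+127f; // 3.4028235e+38f
  // Float.intBitsToFloat(0x00800000), i.e. (0, 1-127 = -126, 0)
  public static final float MIN_NORMAL = 0x1.0p-126f; // 1.17549435E-38f
  // Float.intBitsToFloat(0x1), i.e. (0, stored exponent 0 -> scale 2^-126, fraction 1)
  public static final float MIN_VALUE = 0x0.000002P-126f; // 1.4e-45f

Getting the 32-bit int representation of a float (floatToIntBits maps every NaN to the canonical 0x7fc00000; floatToRawIntBits keeps the actual bits):
  public static int floatToIntBits(float value)
  public static native int floatToRawIntBits(float value)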
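The constants and the classification rules above can be checked directly against the JDK. This is a minimal sketch. FloatClassify and its Kind enum are hypothetical names, and the NaN with payload 0x7fc00001 is an arbitrary example of a non-canonical quiet NaN.

```java
// Minimal sketch: verify java.lang.Float constant bit patterns and classify a float by its fields.
// FloatClassify and Kind are hypothetical names; 0x7fc00001 is an arbitrary non-canonical NaN.
class FloatClassify {
    enum Kind { ZERO, SUBNORMAL, NORMAL, INFINITE, NAN }

    static Kind classify(float f) {
        int bits = Float.floatToRawIntBits(f);
        int exp = (bits >>> 23) & 0xFF;   // stored (biased) exponent
        int frac = bits & 0x7FFFFF;       // 23 fraction bits
        if (exp == 0)    return frac == 0 ? Kind.ZERO : Kind.SUBNORMAL;
        if (exp == 0xFF) return frac == 0 ? Kind.INFINITE : Kind.NAN;
        return Kind.NORMAL;
    }

    public static void main(String[] args) {
        System.out.printf("POSITIVE_INFINITY 0x%08x%n", Float.floatToIntBits(Float.POSITIVE_INFINITY));
        System.out.printf("NEGATIVE_INFINITY 0x%08x%n", Float.floatToIntBits(Float.NEGATIVE_INFINITY));
        System.out.printf("NaN               0x%08x%n", Float.floatToIntBits(Float.NaN));
        System.out.printf("MAX_VALUE         0x%08x%n", Float.floatToIntBits(Float.MAX_VALUE));
        System.out.printf("MIN_NORMAL        0x%08x%n", Float.floatToIntBits(Float.MIN_NORMAL));
        System.out.printf("MIN_VALUE         0x%08x%n", Float.floatToIntBits(Float.MIN_VALUE));

        float[] samples = {0.0f, -0.0f, Float.MIN_VALUE, Float.MIN_NORMAL, 0.75f,
                           Float.POSITIVE_INFINITY, Float.NaN};
        for (float f : samples) {
            System.out.println(f + " -> " + classify(f));
        }

        // floatToIntBits canonicalizes NaN; floatToRawIntBits returns the bits actually stored.
        float otherNaN = Float.intBitsToFloat(0x7fc00001);
        System.out.printf("raw 0x%08x vs canonical 0x%08x%n",
                Float.floatToRawIntBits(otherNaN), Float.floatToIntBits(otherNaN));
    }
}
```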
  • 12. 返回 32bit 代表的 float 浮点数值: int public static native float intBitsToFloat(int bits) 2)double java.lang.Double 中 //@code Double.longBitsToDouble(0x7ff0000000000000L)</code>. public static final double POSITIVE_INFINITY = 1.0 / 0.0; //@code Double.longBitsToDouble(0xfff0000000000000L)</code>. public static final double NEGATIVE_INFINITY = -1.0 / 0.0; //@code Double.longBitsToDouble(0x7ff8000000000000L)</code>. public static final double NaN = 0.0d / 0.0; //@code Double.longBitsToDouble(0x7fefffffffffffffL)</code>. public static final double MAX_VALUE = 0x1.fffffffffffffP+1023; // 1.7976931348623157e+308 //@code Double.longBitsToDouble(0x0010000000000000L) public static final double MIN_NORMAL = 0x1.0p-1022; // 2.2250738585072014E-308 //@code Double.longBitsToDouble(0x1L) public static final double MIN_VALUE = 0x0.0000000000001P-1022; // 4.9e-324 获取代表 double 的 64bit 的 long 型表示: double public static long doubleToLongBits(double value) double public static native long doubleToRawLongBits(double value) 返回 64bit 代表的 double 浮点数值: long public static native double longBitsToDouble(long bits) 6、Java 中的 BigDecimal Java extends Number implements Comparable<BigDecimal> 实现了比较接口,可以进行相互之间的比较。 不可变的、任意精度的有符号十进制数。BigDecimal 由任意精度的整数非标 度值 和 32 位的整数标度 (scale) 组成。如果为零或正数,则标度是小数点后的 位数。如果为负数,则将该数的非标度值乘以 10 的负 scale 次幂。因此, BigDecimal 表示的数值是 (unscaledValue × 10-scale)。 在金融以及涉及到钱的计算中,都需要使用该类替换 double,以防止引起累积误 差,获取高准确的数值计算。 测试代码: double v1 = 1.0; double v2 = 0.9; out.println(v1 - v2); BigDecimal value1 = new BigDecimal(Double.toString(v1)); Zianed Version 1.0 12
  • 13. BigDecimal value2 = new BigDecimal(Double.toString(v2)); out.println(value1.subtract(value2)); 使用时的注意: 1)构造函数采用 String 参数而不是 double 参数,因为 double val 本身就是一个 精确表示的值。 public BigDecimal(String val) 2)基本运算 加 public BigDecimal add(BigDecimal augend) 减 public BigDecimal subtract(BigDecimal subtrahend) 乘 public BigDecimal multiply(BigDecimal multiplicand) 除 public BigDecimal divide(BigDecimal divisor) 3)舍入方式 // Rounding Modes //Rounding mode to round away from zero. Always increments the digit. public final static int ROUND_UP = 0; //Rounding mode to round towards zero. Never increments the digit. public final static int ROUND_DOWN = 1; //Rounding mode to round towards positive infinity. public final static int ROUND_CEILING = 2; //Rounding mode to round towards negative infinity. public final static int ROUND_FLOOR = 3; //Rounding mode to round towards nearest neighbor //unless both neighbors are equidistant, in which case round up. public final static int ROUND_HALF_UP = 4; //Rounding mode to round towards nearest neighbor //unless both neighbors are equidistant, in which case round down. public final static int ROUND_HALF_DOWN = 5; //Rounding mode to round towards the nearest neighbor //unless both neighbors are equidistant, in which case, round // towards the even neighbor. public final static int ROUND_HALF_EVEN = 6; //Rounding mode to assert that the requested operation has an exact // result, hence no rounding is necessary. public final static int ROUND_UNNECESSARY = 7; 4)比较两个数值大小的方法是: public int compareTo(BigDecimal val) 不能用 Zianed Version 1.0 13
The equals(Object x) method is unsuitable for this: it treats two BigDecimal objects as equal only if both their value and their scale are equal, so 1 (scale 0) and 1.0 (scale 1) are not equal.

Test code:

```java
import java.math.BigDecimal;
import static java.lang.System.out;

public class BigDecimalCompare {
    public static void main(String[] args) {
        BigDecimal value11 = new BigDecimal("1");   // scale 0
        BigDecimal value21 = new BigDecimal("1.0"); // scale 1
        out.println(value11.equals(value21));         // false: the scales differ
        out.println(value11.compareTo(value21) == 0); // true: numerically equal
    }
}
```

7. The evolution of IEEE 754

IEEE 754-2008 governs floating-point arithmetic in both binary and decimal. It specifies number formats, basic operations, conversions, and exceptional conditions. The 2008 edition supersedes both the 754-1985 standard and the related IEEE 854-1987, which generalized 754-1985 to cover decimal as well as binary arithmetic.

IEEE 754-2008 defines:
1) arithmetic formats: binary and decimal floating-point data, including signed zeros, subnormal numbers, infinities, and NaN (Not a Number);
2) interchange formats: encodings (bit strings) used to exchange floating-point data efficiently and compactly;
3) rounding algorithms: the methods used to round numbers during arithmetic and conversions;
4) operations: arithmetic and other operations on the arithmetic formats;
5) exception handling: indications of exceptional conditions (division by zero, overflow, and so on).

Basic formats ("Digits" is the significand precision; for binary formats it is the stored fraction bits plus 1 implicit leading bit):

Name        Common name          Base  Digits  E min    E max    Notes
binary16    Half precision       2     10+1    -14      +15      storage, not basic
binary32    Single precision     2     23+1    -126     +127
binary64    Double precision     2     52+1    -1022    +1023
binary128   Quadruple precision  2     112+1   -16382   +16383
decimal32                        10    7       -95      +96      storage, not basic
decimal64                        10    16      -383     +384
decimal128                       10    34      -6143    +6144

All the basic formats are available in both hardware and software implementations.

Arithmetic format: a floating-point number represented by its sign, significand, and exponent.
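The binary32 and binary64 rows of the table can be read back from Java's own constants, and the "sign, significand, exponent" description of the arithmetic format can be made concrete by splitting a double's bit pattern. A minimal sketch (the class BasicFormats and its methods are hypothetical names): precision is derived as exponent(1.0) - exponent(ulp(1.0)) + 1, and decompose assumes a finite, normal double.

```java
public class BasicFormats {
    // Significand precision p of binary32: 1.0f and its ulp 2^-23 differ by p - 1 binary places.
    static int floatPrecision() {
        return Math.getExponent(1.0f) - Math.getExponent(Math.ulp(1.0f)) + 1;
    }

    // Significand precision p of binary64 (ulp(1.0) = 2^-52).
    static int doublePrecision() {
        return Math.getExponent(1.0) - Math.getExponent(Math.ulp(1.0)) + 1;
    }

    // Splits a finite, normal double into {sign, unbiased exponent, 52-bit fraction field}.
    static long[] decompose(double x) {
        long bits = Double.doubleToRawLongBits(x);
        long sign = bits >>> 63;                          // 1 sign bit
        long exponent = ((bits >>> 52) & 0x7FFL) - 1023;  // 11 exponent bits, bias removed
        long fraction = bits & 0xFFFFFFFFFFFFFL;          // low 52 bits, implicit 1 not stored
        return new long[] {sign, exponent, fraction};
    }

    public static void main(String[] args) {
        System.out.printf("binary32: p=%d emin=%d emax=%d%n",
                floatPrecision(), Float.MIN_EXPONENT, Float.MAX_EXPONENT);
        System.out.printf("binary64: p=%d emin=%d emax=%d%n",
                doublePrecision(), Double.MIN_EXPONENT, Double.MAX_EXPONENT);
        // MIN_NORMAL is 2^emin for binary64
        System.out.println(Double.MIN_NORMAL == Math.scalb(1.0, Double.MIN_EXPONENT));
        long[] parts = decompose(-6.25); // -6.25 = -1.5625 x 2^2
        System.out.printf("sign=%d exponent=%d fraction=0x%x%n", parts[0], parts[1], parts[2]);
    }
}
```

It prints p=24, emin=-126, emax=127 for binary32 and p=53, emin=-1022, emax=1023 for binary64, matching the table; -6.25 splits into sign 1, exponent 2 and fraction field 0x9000000000000 (binary 1001 after the implicit leading 1).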
Interchange formats:

For a binary interchange format of width k bits, with k ≥ 128 and k a multiple of 32, the exponent field width is w = round(4 × log2(k)) - 13, and the precision is p = k - w. The standard gives this formula only for k ≥ 128: binary16, binary32, and binary64 have fixed exponent widths of 5, 8, and 11 bits (the formula would give 3 and 7 for the first two).

Decimal floating point:

Professor Kahan's view: use decimal floating point to avoid human error of this kind: after double d = 0.1;, d is not actually equal to 0.1, because 0.1 has no exact binary representation. IBM's view: use decimal floating point in economic, financial, and other human-facing programs. However, without hardware support, decimal floating point implemented in software was estimated to be 100 to 1000 times slower than binary floating point implemented in hardware. Because decimal floating point was adopted into IEEE 754r, IBM planned to implement a decimal FPU in its next-generation POWER processors.

Summary appendix charts: [figures not reproduced in this text version]
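The exponent-width formula for binary interchange formats, w = round(4 × log2(k)) - 13 with p = k - w, is easy to evaluate directly. A minimal sketch (the class InterchangeWidths and its methods are hypothetical names): it rejects k < 128 or k not a multiple of 32, where the standard fixes the widths instead, and computes log2(k) as Math.log(k) / Math.log(2), which is accurate enough here because none of the values of 4 × log2(k) for these k lies near a .5 rounding boundary.

```java
public class InterchangeWidths {
    // Exponent field width w for a k-bit binary interchange format (k >= 128, multiple of 32).
    static int exponentWidth(int k) {
        if (k < 128 || k % 32 != 0) {
            throw new IllegalArgumentException("formula applies only to k >= 128, k % 32 == 0");
        }
        double log2k = Math.log(k) / Math.log(2);
        return (int) Math.round(4 * log2k) - 13;
    }

    // Precision p in bits, counting the implicit leading bit: k = 1 + w + (p - 1), so p = k - w.
    static int precision(int k) {
        return k - exponentWidth(k);
    }

    public static void main(String[] args) {
        for (int k = 128; k <= 256; k += 32) {
            System.out.printf("binary%d: w=%d, p=%d%n", k, exponentWidth(k), precision(k));
        }
    }
}
```

For k = 128 this gives w = 15 and p = 113, matching binary128 (112+1 digits) in the basic-format table; k = 256 gives w = 19 and p = 237.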