Decimal Arithmetic Specification, version 1.11
Copyright (c) IBM Corporation, 2003. All rights reserved.
28 Feb 2003
The full specification in the body of this document
defines a decimal floating-point arithmetic which gives exact results and
preserves exponents where possible. If insufficient precision is available for
this, then numbers are handled according to the rules of IEEE 854. The use of
IEEE 854 rules implies that special values (infinities and NaNs) are allowed, as
are subnormal values and the value –0.
For some applications and programming languages (especially those intended
for use by people who are not mathematically sophisticated), it may be
appropriate to provide an arithmetic where infinite, NaN, or subnormal results
are always treated as errors, –0 results are hidden, and other (largely
cosmetic) changes are provided to aid acceptance of results.
The arithmetic defined in ANSI X3.274 is such an arithmetic; this appendix
describes the differences between that subset and the full specification.
Implementations which support this subset explicitly might provide the subset
behavior under the control of a parameter in the context[1] or might provide a different
interface (additional or parameterized methods, for example).
Simplified number set
In the subset arithmetic, a reduced set of number values is supported and (where
appropriate) numbers with positive exponents have their exponent reduced to
zero. Specifically:
- In the to-number conversion, if the coefficient for a finite
number has the value zero, then the sign and the exponent are
both set to 0.
- If the coefficient in a result has the value zero, then the
sign is set to 0 and (unless the operation is rescale) the
exponent is set to 0.[2]
- In the to-number conversion, strings which represent special values
are not permitted. (That is, only finite numbers are accepted.)
- Subnormal numbers are not permitted. If the result from a conversion or
operation would be subnormal then an Underflow error results (see below).
- After any operation and the rounding of its result (unless the operation
is rescale), a result with a positive exponent is converted to an
integer provided that the resulting coefficient would have no more than
precision digits. In other words, in this case a positive exponent is
reduced to 0 by multiplying the coefficient by
$10^{\mathit{exponent}}$ (which has the effect of appending as many zeros
to the coefficient as the value of the exponent).[3]
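The zero and positive-exponent rules above can be illustrated with a minimal Python sketch. It is not part of this specification: it assumes a finite number is held as a (sign, coefficient, exponent) tuple, mirroring the [sign, coefficient, exponent] notation used in this document, and the names subset_finish and num_digits and the rescale flag are hypothetical. It applies only those two rules to a result that has already been rounded; the Underflow check for subnormal results is not modeled.

```python
def num_digits(coefficient: int) -> int:
    """Number of decimal digits in a non-negative integer coefficient."""
    return len(str(coefficient))


def subset_finish(sign: int, coefficient: int, exponent: int,
                  precision: int, rescale: bool = False):
    """Apply the subset's zero and positive-exponent rules to a rounded result.

    Hypothetical helper for illustration only; `rescale` marks results of
    the rescale operation, which keep their exponent.
    """
    if coefficient == 0:
        # A zero result always gets sign 0; its exponent is forced to 0
        # unless the operation was rescale.
        return (0, 0, exponent if rescale else 0)
    if exponent > 0 and not rescale:
        # Convert to an integer by appending `exponent` zeros, but only if
        # the widened coefficient still fits in `precision` digits.
        widened = coefficient * 10 ** exponent
        if num_digits(widened) <= precision:
            return (sign, widened, 0)
    return (sign, coefficient, exponent)


# With precision 9: [0,12,3] becomes the integer 12000, while [0,123456,5]
# would need 11 digits and so keeps its exponent.
print(subset_finish(0, 12, 3, 9))       # (0, 12000, 0)
print(subset_finish(0, 123456, 5, 9))   # (0, 123456, 5)
print(subset_finish(1, 0, -2, 9))       # (0, 0, 0)
```

Making the conversion conditional on the precision means a positive exponent is never traded for a coefficient longer than precision digits, which would itself need rounding.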
Operation differences
In the subset arithmetic, operands are rounded before use if necessary (as in
Numerical Turing[4] and Rexx), the Lost digits condition is added to the context,
the results of some operations are trimmed, the rounding rule after a subtraction
is less conservative, and raising 0 to the power 0 is not treated as an error.
Specifically:
- If the number of decimal digits in the coefficient of an operand to
an operation is greater than the current precision in the context then
the operand is rounded to precision significant digits using the
rounding algorithm described by the context before being used in the
computation. In other words, an automatic ‘convert to shorter’ is applied
before the operation.
- The Lost digits condition is added to the abstract context; it
should be set to 0 in default contexts.
This condition is raised when non-zero digits are discarded before an
operation. This can occur when an operand which has more leading significant
digits in its coefficient than the precision setting is rounded to precision
digits before use.
Note that the Lost digits test does not treat trailing decimal zeros
in the coefficient as significant. For example, if precision had
the value 5, then the operands [0,12345,-5], [0,12345,-2], [0,12345,0],
[1,12345,0], [0,123450000,-4], and [0,1234500000,0]
would not cause an exception (whereas [0,123451,-1] or
[0,1234500001,0] would).
- After a divide or power operation is complete and the result
has been rounded, any insignificant trailing zeros are removed. That is, if
the exponent is not zero and the coefficient is a multiple of a
positive power of ten then the coefficient is divided by that power of
ten and the exponent increased accordingly. If the exponent was
negative it will not be increased above zero.
- After an addition or subtraction operation, the result is rounded to precision
digits if necessary, taking into account any extra (carry) digit on the left
after an addition, but otherwise counting from the position corresponding to
the most significant digit of the operands being added or subtracted (rather
than the most significant digit of the result).
- If both operands to a power operation are zero then the result is 1
(instead of being an error).
- If the right-hand operand to a power operation is negative, the
left-hand operand is used as-is and the final result is inverted. The integer
part of the right-hand operand must fit in precision digits.
- The integer part of the right-hand operand to a rescale operation
must fit in precision digits.
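The operand rounding, Lost digits, and trailing-zero rules in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a reference implementation: numbers are (sign, coefficient, exponent) tuples, rounding is round-half-up (the only mode the subset requires), and the names LostDigits, round_operand, and trim_trailing_zeros are hypothetical. When the Lost digits trap is disabled the sketch records the condition in a caller-supplied set of flags; the Inexact and Rounded conditions are not tracked.

```python
class LostDigits(Exception):
    """Hypothetical exception standing in for a trapped Lost digits condition."""


def num_digits(coefficient: int) -> int:
    return len(str(coefficient))


def round_operand(sign, coefficient, exponent, precision, flags,
                  trap_lost_digits=True):
    """Round an operand to `precision` digits (round-half-up) before use."""
    excess = num_digits(coefficient) - precision
    if excess <= 0:
        return (sign, coefficient, exponent)  # already short enough
    divisor = 10 ** excess
    kept, discarded = divmod(coefficient, divisor)
    if discarded != 0:
        # Only non-zero discarded digits count; trailing zeros are not "lost".
        if trap_lost_digits:
            raise LostDigits((sign, coefficient, exponent))
        flags.add("Lost digits")
    if discarded * 2 >= divisor:  # round half up
        kept += 1
        if num_digits(kept) > precision:  # e.g. 99999 + 1 -> 100000
            kept //= 10
            excess += 1
    return (sign, kept, exponent + excess)


def trim_trailing_zeros(sign, coefficient, exponent):
    """Remove insignificant trailing zeros from a divide or power result.

    A negative exponent is never increased above zero.
    """
    while coefficient != 0 and coefficient % 10 == 0 and exponent != 0:
        coefficient //= 10
        exponent += 1
    return (sign, coefficient, exponent)


# The operands from the Lost digits example, with precision 5.
flags = set()
for operand in [(0, 12345, -5), (0, 12345, -2), (0, 12345, 0),
                (1, 12345, 0), (0, 123450000, -4), (0, 1234500000, 0)]:
    print(round_operand(*operand, 5, flags))  # no exception raised
for operand in [(0, 123451, -1), (0, 1234500001, 0)]:
    try:
        round_operand(*operand, 5, flags)
    except LostDigits:
        print("Lost digits:", operand)

# A quotient such as 1.20, i.e. [0,120,-2], is trimmed to [0,12,-1] (1.2).
print(trim_trailing_zeros(0, 120, -2))  # (0, 12, -1)
```

Because trailing zeros may be discarded without raising the condition, the check compares the discarded remainder with zero rather than comparing digit counts.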
Exceptional condition and rounding mode rules
In the subset arithmetic, exceptional conditions other than the
informational conditions (Lost digits, Inexact, Rounded, and Subnormal) must be
treated as errors, and results after these errors are undefined. Special values
and subnormal numbers, therefore, are not part of the arithmetic.
In the subset, only the Lost digits trap enabler is required. Inexact,
Rounded, and Subnormal trap enablers are optional, and the others are (in
effect) always set. Similarly, the status bits in the context are
optional.
Only the round-half-up rounding mode is required.
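Where a host language already implements the full arithmetic, the condition and rounding rules of the subset can be roughly approximated through context settings. The following Python sketch assumes the standard decimal module, which implements the full arithmetic and has no subset mode; the name subset_like and the precision of 9 are arbitrary choices for illustration. It is only an approximation: Python's decimal has no Lost digits condition, does not round operands before use, accepts special values in string conversions, and does not treat every subnormal result as an Underflow error.

```python
from decimal import (Context, Decimal, ROUND_HALF_UP,
                     DivisionByZero, InvalidOperation, Overflow, Underflow)

# Approximate subset rules: round-half-up, and the non-informational
# conditions trapped so they raise instead of producing a special value.
# Inexact and Rounded stay untrapped, so they only set flags.
subset_like = Context(
    prec=9,
    rounding=ROUND_HALF_UP,
    traps=[InvalidOperation, DivisionByZero, Overflow, Underflow],
)

# Ordinary results are simply rounded (half up) to 9 digits.
print(subset_like.divide(Decimal(2), Decimal(3)))  # 0.666666667
print(subset_like.create_decimal("0.1234567885"))  # 0.123456789

try:
    subset_like.divide(Decimal(1), Decimal(0))
except DivisionByZero:
    print("1 / 0 is an error, as in the subset")

try:
    subset_like.power(Decimal(0), Decimal(0))
except InvalidOperation:
    # The full arithmetic treats 0 ** 0 as invalid; the subset returns 1.
    print("0 ** 0 is an error here, unlike the subset")
```

The 0 ** 0 case shows one place where a full-arithmetic implementation needs extra handling before it can serve as the subset.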
Footnotes:
[1] The decNumber package, for example, provides the subset
behavior if the extended bit is set to 0.
[2] This rule, together with the to-number definition,
ensures that numbers with values such as –0 or 0.0000 will not
result from general operations in the subset arithmetic. This allows a
concrete representation for the subset to comprise simply two integers in
two's complement form.
[3] The rule preserves integers as specified by ANSI X3.274,
and in particular ensures that the results of the divide and
divide-integer operations are identical when the result is an exact
integer.
[4] See: T. E. Hull, A. Abrham, M. S. Cohen, A. F. X. Curley,
C. B. Hall, D. A. Penny, and J. T. M. Sawchuk, "Numerical Turing",
SIGNUM Newsletter, vol. 20, no. 3, pp. 26-34, ACM, May 1985.