Decimal Arithmetic Specification, version 1.08
Copyright (c) IBM Corporation, 2003. All rights reserved.
8 Jan 2003
There are three components to the model: numbers, operations, and context.
This specification defines these components in the abstract. It neither defines the way in which operations are expressed (which might vary depending on the computer language or other interface being used),[1] nor does it define the concrete representation (specific layout in storage, or in a processor's register, for example) of numbers or context.
The remainder of this section describes the abstract model for each component.
Numbers may be finite numbers (numbers whose value can be represented exactly) or they may be special values (infinities and other values which are not finite numbers).
Notes:
For a subnormal result, the minimum value of the exponent becomes –Elimit–(precision–1), called Etiny, where precision is the working precision, as described below. The result will be rounded, if necessary, to ensure that the exponent is no smaller than Etiny. If, during this rounding, the result becomes inexact, then the Underflow condition is raised. A subnormal result does not necessarily raise Underflow, therefore, but is always indicated by the Subnormal condition (even if, after rounding, its value is 0).
When a number underflows to zero during a calculation, its exponent will be Etiny. The maximum value of the exponent is unaffected.
Note that the minimum value of the exponent for subnormal numbers is the same as the minimum value of exponent which can arise during operations which do not result in subnormal numbers, which occurs in the case where clength = precision.
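The subnormal rules above can be observed with Python's decimal module, which follows a later revision of this specification. This is only a sketch: the tiny context (precision 5, exponent limits of ±10) is an illustrative assumption, and Python sets Emin directly rather than deriving it from Elimit.

```python
from decimal import Context, Decimal, Subnormal, Underflow

# Hypothetical small context so the limits are easy to read; all traps off.
ctx = Context(prec=5, Emin=-10, Emax=10, traps=[])

# Etiny = Emin - (precision - 1)
etiny = ctx.Etiny()

# An exact subnormal result: Subnormal is flagged, Underflow is not.
ctx.clear_flags()
exact_subnormal = ctx.plus(Decimal("1E-12"))
exact_flags = (bool(ctx.flags[Subnormal]), bool(ctx.flags[Underflow]))

# A result too small even for Etiny: it is rounded to zero, which is
# inexact, so Underflow is flagged as well and the exponent is Etiny.
ctx.clear_flags()
underflowed = ctx.multiply(Decimal("1E-10"), Decimal("1E-10"))
underflow_flags = (bool(ctx.flags[Subnormal]), bool(ctx.flags[Underflow]))

print(etiny, exact_subnormal, exact_flags)
print(repr(underflowed), underflow_flags)
```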
Similarly, duples are used to indicate the special values. These have the form [sign, special-value], where the sign is indicated as before, and the special-value is one of inf, qNaN, or sNaN, representing infinity, quiet NaN, or signaling NaN, respectively.
So, for example, the triad [0,2708,-2] represents the number 27.08, the triad [1,1953,0] represents the integer -1953, the duple [1,inf] represents the number –∞, and the duple [0,qNaN] represents a quiet NaN.
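As a rough illustration of the notation, the triads and duples above can be mapped onto Python's decimal.Decimal. The helper names from_triad and from_duple are hypothetical and are not part of this specification or of the Python library.

```python
from decimal import Decimal

def from_triad(sign, coefficient, exponent):
    """Build a Decimal from [sign, coefficient, exponent] (hypothetical helper)."""
    digits = tuple(int(d) for d in str(coefficient))
    return Decimal((sign, digits, exponent))

def from_duple(sign, special):
    """Build a Decimal from [sign, special-value] (hypothetical helper)."""
    names = {"inf": "Infinity", "qNaN": "NaN", "sNaN": "sNaN"}
    return Decimal(("-" if sign else "") + names[special])

print(from_triad(0, 2708, -2))   # the number 27.08
print(from_triad(1, 1953, 0))    # the integer -1953
print(from_duple(1, "inf"))      # negative infinity
print(from_duple(0, "qNaN"))     # a quiet NaN
```

The reverse mapping is available through Decimal.as_tuple(), which returns the sign, the coefficient digits, and the exponent.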
For example, in an object-oriented language, the addition operation might be effected by a method called add, whereas in a calculator application it might be effected by clicking on a button icon. In other uses, an infix ‘+’ symbol might be used to indicate addition. And in all cases, the operation might be carried out in software, hardware, or some combination of these.
Similarly, operations which are distinct in the specification need not be mapped one-to-one to distinct operations in the implementation – it is only necessary that all the core operations are available. For example, conversions to a string could be handled by a single method, with variations determined from context or additional arguments.
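To make the point concrete, here is a small sketch using Python's decimal module, where the same addition can be expressed either as a method on a context or as an infix operator. The operand values and the precision of 9 are arbitrary choices for illustration.

```python
from decimal import Context, Decimal, localcontext

ctx = Context(prec=9)
a, b = Decimal("1.1"), Decimal("2.2")

# Addition expressed as a method call on an explicit context.
via_method = ctx.add(a, b)

# The same operation expressed with an infix '+', using the ambient context.
with localcontext(ctx):
    via_infix = a + b

print(via_method, via_infix)
```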
The precision is an integer which must be positive (greater than 0). It sets the maximum number of significant digits that can result from an arithmetic operation.
In the abstract, there is no upper bound on the precision (although a specific precision must always be provided). In practice there may need to be some upper limit to it (for example, the length of the maximum coefficient supported by a concrete representation). This limit must be expressed as an integral number of decimal digits.
Similarly, there may be a lower bound on the setting of precision, which may be the same as the upper bound (for example, if it is implied by the length of the maximum coefficient supported by a concrete representation). This limit must also be expressed as an integral number of decimal digits.
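The effect of the precision setting can be sketched with Python's decimal module, which is one implementation with a variable precision and a practical upper limit (exposed as MAX_PREC). The precisions 5 and 9 below are arbitrary illustrative values.

```python
from decimal import Context, Decimal, MAX_PREC

one, three = Decimal(1), Decimal(3)

# The same division under two different working precisions.
narrow = Context(prec=5).divide(one, three)
wide = Context(prec=9).divide(one, three)

# A precision of zero is rejected, since precision must be positive.
try:
    Context(prec=0)
    zero_rejected = False
except ValueError:
    zero_rejected = True

print(narrow, wide, zero_rejected, MAX_PREC)
```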
An implementation must designate a precision to be known as single precision (see IEEE 854 §3.2.1). This must be greater than 5 (see IEEE 854 §3.1) and within the range of implemented precisions. It is recommended that it be at least 9.[7]
An implementation may also designate a precision to be known as double precision, which must be within the range of implemented precisions (see IEEE 854 §3.2.2). If a double precision is designated, then the following constraints apply:
The rounding setting is a named value which indicates the algorithm to be used when rounding is necessary. Rounding is applied when a result coefficient has more significant digits than the value of precision; in this case the result coefficient is shortened to precision digits and may then be incremented by one (which may require a further shortening), depending on the rounding algorithm selected and the remaining digits of the original coefficient. The exponent is adjusted to compensate for any shortening.
The following rounding algorithms are defined and must be supported:[9]
round-down: (Truncate.) The discarded digits are ignored; the result is unchanged.
round-half-up: If the discarded digits represent greater than or equal to half (0.5) of the value of a one in the next left position then the result should be incremented by 1 (rounded up). Otherwise the discarded digits are ignored.
round-half-even: If the discarded digits represent greater than half (0.5) the value of a one in the next left position then the result should be incremented by 1 (rounded up). If they represent less than half, then the result is not adjusted (that is, the discarded digits are ignored).
Otherwise (they represent exactly half) the result is unaltered if its rightmost digit is even, or incremented by 1 (rounded up) if its rightmost digit is odd (to make an even digit).
round-ceiling: (Round toward +∞.) If all of the discarded digits are zero or if the sign is 1 the result is unchanged. Otherwise, the result should be incremented by 1 (rounded up). If this would cause overflow then the result will be [0,inf].
round-floor: (Round toward –∞.) If all of the discarded digits are zero or if the sign is 0 the result is unchanged. Otherwise, the sign is 1 and the coefficient should be incremented by 1. If this would cause overflow then the result will be [1,inf].
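The five algorithms can be compared side by side with Python's decimal module, whose ROUND_* constants correspond to the specification's round-down, round-half-up, round-half-even, round-ceiling, and round-floor. This is a sketch only: the helper round_to, the MODES table, and the sample operands are illustrative assumptions.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP)

# Specification name -> Python constant (mapping chosen for this sketch).
MODES = {
    "round-down": ROUND_DOWN,
    "round-half-up": ROUND_HALF_UP,
    "round-half-even": ROUND_HALF_EVEN,
    "round-ceiling": ROUND_CEILING,
    "round-floor": ROUND_FLOOR,
}

def round_to(value, precision, mode):
    """Round a decimal string to `precision` digits (hypothetical helper)."""
    ctx = Context(prec=precision, rounding=MODES[mode])
    # Unary plus applies the context's precision and rounding to the operand.
    return str(ctx.plus(Decimal(value)))

# Each operand has one discarded digit that is exactly half.
results = {
    mode: (round_to("1.2345", 4, mode), round_to("-1.2345", 4, mode))
    for mode in MODES
}
for mode, pair in results.items():
    print(mode, pair)
```

With a discarded digit of exactly half, the sign matters only for round-ceiling and round-floor, and round-half-even differs from round-half-up only because the retained rightmost digit (4) is already even.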
The exceptional conditions are grouped into signals, which can be controlled individually. The context contains a flag (which is either 0 or 1) and a trap-enabler (which also is either 0 or 1) for each signal.
For each of the signals, the corresponding flag is set to 1 when the signal occurs. It is only reset to 0 by explicit user action.
For each of the signals, the corresponding trap-enabler indicates which action is to be taken when the signal occurs (see IEEE 854 §7). If 0, a defined result is supplied, and execution continues (for example, an overflow is perhaps converted to a positive or negative infinity). If 1, then execution of the operation is ended or paused and control passes to a ‘trap handler’, which will have access to the defined result.
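The interplay of flags and trap-enablers can be sketched with Python's decimal module, where a trap is effected by raising an exception. That mechanism is a choice made by that library and is not something this specification defines.

```python
from decimal import Context, Decimal, DivisionByZero

# All trap-enablers set to 0: a defined result is supplied and execution continues.
ctx = Context(prec=9, traps=[])
quotient = ctx.divide(Decimal(1), Decimal(0))
flag_after = bool(ctx.flags[DivisionByZero])

# The flag stays set across later operations that do not raise the signal.
ctx.divide(Decimal(1), Decimal(3))
flag_sticky = bool(ctx.flags[DivisionByZero])

# It is reset only by explicit action.
ctx.clear_flags()
flag_cleared = bool(ctx.flags[DivisionByZero])

# Trap-enabler set to 1: the operation is ended and control passes to a handler.
ctx.traps[DivisionByZero] = True
try:
    ctx.divide(Decimal(1), Decimal(0))
    trapped = False
except DivisionByZero:
    trapped = True

print(quotient, flag_after, flag_sticky, flag_cleared, trapped)
```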
The signals are:
division-by-zero: raised when a non-zero dividend is divided by zero
inexact: raised when a result is not exact (one or more non-zero coefficient digits were discarded during rounding)
invalid-operation: raised when a result would be undefined or impossible
lost-digits: this signal cannot occur, and is therefore optional, in an implementation where the lower bound for precision is equal to the maximum length of the coefficient.
overflow: raised when the exponent of a result is too large to be represented
rounded: raised when a result has been rounded (that is, some zero or non-zero coefficient digits were discarded)
subnormal: raised when a result is subnormal (its adjusted exponent is less than Emin), before any rounding
underflow: raised when a result is both subnormal and inexact.
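The difference between a result that is merely rounded and one that is also inexact can be sketched with Python's decimal module. The helper signals_for and the precision of 3 are assumptions made for this illustration.

```python
from decimal import Context, Decimal, Inexact, Rounded

def signals_for(text, precision=3):
    """Return (result, rounded?, inexact?) for one operand (hypothetical helper)."""
    ctx = Context(prec=precision, traps=[])
    result = ctx.plus(Decimal(text))
    return str(result), bool(ctx.flags[Rounded]), bool(ctx.flags[Inexact])

fits = signals_for("1.23")            # nothing discarded
zeros_dropped = signals_for("1.2300") # only zero digits discarded
digits_lost = signals_for("1.234")    # a non-zero digit discarded

print(fits, zeros_dropped, digits_lost)
```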
This specification does not define the means by which flags and traps are reset or altered, respectively, or the means by which traps are effected.[10]
Notes:
It is recommended that if a double precision is designated then a third extended double default context be provided, with the same settings as the extended default context except that the precision is set to the double precision.
[1] Indeed, some variations of operations could be selected by using context settings outside the scope of this specification.
[2] That is, the maximum value of the coefficient will be an integral power of ten, less one – for example, 99999999999999999999.
[3] See IEEE 854 §3.1.
[4] This rule, a requirement for both ANSI X3.274 and IEEE 854, constrains the number of values which would overflow or underflow when inverted (divided into 1).
[5] Typically, in a concrete representation, certain out-of-range values of the exponent are used to indicate the special values, and the coefficient is used to carry additional diagnostic information for quiet NaNs.
[6] IEEE 854 defines subnormal numbers as numbers whose absolute value is non-zero and is closer to zero than ten to the power of Emin. This definition includes zeros with tiny exponents.
[7] This is the ‘narrowest basic precision’ described in IEEE 854 §3.2.1. Strictly speaking, single precision should be the narrowest precision supported; however it is assumed that when precision is fully variable the intent of IEEE 854 is that the designation applies to the narrowest default precision – the programmer is permitted to specify a narrower precision explicitly.
[8] This constraint is very slightly tighter than that defined by IEEE 854, which specifies that Elimit for double be greater than or equal to 8 × Elimit for single, plus 7. There is a preference for human-oriented limits, so it is suggested that the Elimit for single be one tenth of, or one digit shorter than, the Elimit for double.
[9] The term ‘round to nearest’ is not used because it is ambiguous. round-half-up is the usual round-to-nearest algorithm used in European countries, in international financial dealings, and in the USA for tax calculations. round-half-even is often used for other applications in the USA, where it is usually called ‘round to nearest’ and is sometimes called ‘banker's rounding’.
[10] IEEE 854 suggests that there be a mechanism allowing traps to return a substitute result to the operation that raised the exception, but this may not be possible in some environments (including some object-oriented environments).