Decimal Arithmetic Specification, version 1.08
Copyright (c) IBM Corporation, 2003. All rights reserved.
8 Jan 2003
This section describes the arithmetic operations on
numbers, including subnormal numbers, negative zeros, and special values (see
also IEEE 854 §6).
Arithmetic operation notation
In this section, a simplified notation is used to illustrate
arithmetic operations: a number is shown as the string that would result from
using the to-scientific-string operation. Single quotes are used to
indicate that a number converted from an abstract representation is implied.
Also, operations are indicated as functions (taking either one or two
operands), and the sequence ==> means ‘results in’. Hence:
add('12', '7.00') ==> '19.00'
means that the result of the add operation with the operands
[0,12,0] and [0,700,-2] is [0,1900,-2].
Finally, in this example and in the examples below, the context is assumed to
have precision set to 9, rounding set to round-half-up, and
all trap-enablers set to 0.
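As an informal illustration of this notation, the sketch below uses Python's decimal module. That module is only assumed here as a convenient stand-in (it follows a later revision of this arithmetic and is not part of this specification); the variable names are illustrative.

```python
from decimal import Decimal, ExtendedContext, ROUND_HALF_UP

# Context assumed for the examples: precision 9, round-half-up,
# all trap-enablers 0 (ExtendedContext has precision 9 and no traps).
ctx = ExtendedContext.copy()
ctx.rounding = ROUND_HALF_UP

a = Decimal((0, (1, 2), 0))       # [0,12,0],   shown as '12'
b = Decimal((0, (7, 0, 0), -2))   # [0,700,-2], shown as '7.00'
result = ctx.add(a, b)

print(str(a), str(b), str(result))   # 12 7.00 19.00
print(tuple(result.as_tuple()))      # (0, (1, 9, 0, 0), -2), i.e. [0,1900,-2]
```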
Arithmetic operation rules
The following general rules apply to all arithmetic operations.
- Every operation on finite numbers is carried out (as described under the
individual operations below) as though an exact mathematical result is
computed, using integer arithmetic on the coefficient where possible.
If
the coefficient of the theoretical exact result has no more than
precision digits, then (unless there is an underflow or overflow) it is
used for the result without change. Otherwise (it has more than
precision digits) it is rounded (shortened) to exactly precision
digits, using the current rounding algorithm, and the exponent
is increased by the number of digits removed.
If the value of the adjusted
exponent of the result is less than Emin, then an
exceptional condition (subnormal) results. In this case, the calculated
coefficient and exponent form the result, unless the value of the
exponent is less than Etiny, in which case the
exponent will be set to Etiny, the coefficient will be
rounded (possibly to zero) to match the adjustment of the exponent, and the
sign is unchanged. If this rounding gives an inexact result then the Underflow
Exceptional condition is raised.
If the value of the adjusted
exponent of the result is larger than Emax,
then an exceptional condition (overflow) results. In this case, the result is
as defined under the Overflow
Exceptional condition, and may be infinite. It will have the same sign as
the theoretical result.[1]
- Arithmetic using the special value infinity follows the usual
rules, where [1,inf] is less than every finite number and
[0,inf] is greater than every finite number. Under these rules, an
infinite result is always exact. Certain uses of infinity raise an Invalid
operation condition.
- Signaling NaNs always raise the Invalid operation condition when
used as an operand to an arithmetic operation. The result in this case is
[0,qNaN].
- The result of any arithmetic operation which has an operand which is a NaN
(a quiet NaN, or a signaling NaN when the
invalid-operation trap enabler is 0) is [0,qNaN]. In this
case, the signs of the operands are ignored (the following rules do not
apply).
- The sign of the result of a multiplication or division will be 1
only if the operands have different signs and neither is a NaN.
- The sign of the result of an addition or subtraction will be 1 only
if the result is less than zero and neither operand is a NaN, except for the
special cases below where the result is a negative 0.
- A result which is a negative zero ([1,0,n]) can occur
under the following conditions only:
- a result is rounded to zero, and the value before rounding had a
sign of 1.
- the operation is an addition of [1,0,0] to [1,0,0], or
a subtraction of [0,0,0] from [1,0,0]
- the operation is an addition of operands with opposite signs (or is a
subtraction of operands with the same sign), the result has a
coefficient of 0, and the rounding is round-floor.
- the operation is a multiplication or division and the result has a
coefficient of 0 and the signs of the operands are different.
- the operation is power, the left-hand operand is
[1,0,0], and the right-hand operand is odd (and positive).
- the operation is rescale or round-to-integer, the
left-hand operand is negative, and the magnitude of the result is zero. In
the case of rescale the final exponent may also be non-zero.
- the operation is square-root and the operand is [1,0,0].
Examples involving special values:
add('Infinity', '1') ==> 'Infinity'
add('NaN', '1') ==> 'NaN'
subtract('1', 'Infinity') ==> '-Infinity'
multiply('-1', 'Infinity') ==> '-Infinity'
subtract('-0', '0') ==> '-0'
multiply('-1', '0') ==> '-0'
divide('-1', 'Infinity') ==> '-0'
divide('1', '0') ==> 'Infinity'
divide('1', '-0') ==> '-Infinity'
divide('-1', '0') ==> '-Infinity'
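The rounding step of the first general rule can be sketched as follows for round-half-up, following the procedure outlined in footnote 1. The function name and the plain (coefficient, exponent) pair are illustrative assumptions; subnormal and overflow handling is deliberately left out.

```python
def round_half_up(coefficient, exponent, precision):
    """Shorten a non-negative integer coefficient to `precision` digits."""
    digits = str(coefficient)
    removed = len(digits) - precision
    if removed <= 0:
        return coefficient, exponent        # fits: used without change
    kept = int(digits[:precision])          # truncate to precision digits
    exponent += removed                     # exponent grows by digits removed
    if int(digits[precision]) >= 5:         # inspect first truncated digit
        kept += 1
        if len(str(kept)) > precision:      # e.g. 999999999 + 1
            kept //= 10
            exponent += 1
    return kept, exponent


print(round_half_up(428135971041, 0, 9))    # (428135971, 3)
print(round_half_up(9999999995, 0, 9))      # (100000000, 2)
```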
Notes:
- Operands may have more than precision digits and are not rounded
before use.
- Quiet NaNs are permitted to propagate diagnostic information pertaining to
the origin of the NaN (see IEEE 854 §6.2). Any such diagnostic information,
and the means by which it is propagated, is outside the scope of this
specification.
- The rules above imply that the compare operation can return a quiet
NaN as a result, which indicates an ‘unordered’ comparison (see IEEE 854
§5.7).
- An implementation may use the compare operation ‘under the covers’
to implement a closed set of comparison operations (greater than, equal,
etc.) if desired. In this case, the additional constraints detailed in
IEEE 854 §5.7 will apply; that is, a comparison (such as ‘greater than’) which
does not explicitly allow for an ‘unordered’ result yet would require an
unordered result will give rise to an Invalid
operation condition.
- If a result is rounded, remains finite, and is not subnormal, its
coefficient will have exactly precision digits (except after the
rescale, round-to-integer, or square-root operations, as
described below). That is, only unrounded or subnormal coefficients can have
fewer than precision digits.
- Trailing zeros are not removed after operations. That is, results are
unnormalized.
abs takes one operand. If the operand is
negative, the result is the same as using the minus
operation on the operand. Otherwise, the result is the same as using the plus
operation on the operand.
Examples:
abs('2.1') ==> '2.1'
abs('-100') ==> '100'
abs('101.5') ==> '101.5'
abs('-101.5') ==> '101.5'
add and subtract both take two operands.
If either operand is a special value then the general rules apply.
Otherwise, the operands are added (after inverting the sign used for
the second operand if the operation is a subtraction), as follows:
- The coefficient of the result is computed by adding or subtracting
the aligned coefficients of the two operands. The aligned coefficients are
computed by comparing the exponents of the operands:
- If they have the same exponent, the aligned coefficients are the same as
the original coefficients.
- Otherwise the aligned coefficient of the number with the larger exponent
is its original coefficient multiplied by 10^n, where
n is the absolute difference between the exponents, and the aligned
coefficient of the other operand is the same as its original coefficient.
If the signs of the operands differ then the smaller aligned
coefficient is subtracted from the larger; otherwise they are added.
- The exponent of the result is the minimum of the exponents of the
two operands.
- The sign of the result is determined as follows:
- If the result is non-zero then the sign of the result is the sign of the
operand having the larger absolute value.
- Otherwise, the sign of a zero result is 0 unless either both
operands were negative or the signs of the operands were different and the
rounding is round-floor.
The result is
then rounded to precision digits if necessary, counting from the most
significant digit of the result.
Examples:
add('12', '7.00') ==> '19.00'
add('1E+2', '1E+4') ==> '1.01E+4'
subtract('1.3', '1.07') ==> '0.23'
subtract('1.3', '1.30') ==> '0.00'
subtract('1.3', '2.07') ==> '-0.77'
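The alignment and sign rules above can be sketched on [sign, coefficient, exponent] triples as follows. This is a minimal illustration for finite operands under round-half-up only; the function names are assumptions, and the round-floor case for zero results is not modelled.

```python
def round_half_up(coeff, exp, precision):
    """Shorten coeff to `precision` digits, rounding half up."""
    excess = len(str(coeff)) - precision
    if excess > 0:
        coeff, dropped = divmod(coeff, 10 ** excess)
        exp += excess
        if 2 * dropped >= 10 ** excess:
            coeff += 1
            if len(str(coeff)) > precision:
                coeff //= 10
                exp += 1
    return coeff, exp


def add(a, b, precision=9, subtract=False):
    (s1, c1, e1), (s2, c2, e2) = a, b
    if subtract:
        s2 ^= 1                      # invert the sign of the second operand
    exp = min(e1, e2)                # exponent of the result
    c1 *= 10 ** (e1 - exp)           # aligned coefficients
    c2 *= 10 ** (e2 - exp)
    if s1 == s2:
        coeff, sign = c1 + c2, s1
    else:
        coeff = abs(c1 - c2)         # smaller subtracted from larger
        sign = s1 if c1 > c2 else s2
    if coeff == 0:
        # zero result: sign 0 unless both operands were negative
        sign = 1 if (s1 == 1 and s2 == 1) else 0
    coeff, exp = round_half_up(coeff, exp, precision)
    return sign, coeff, exp


print(add((0, 12, 0), (0, 700, -2)))                    # (0, 1900, -2)
print(add((0, 13, -1), (0, 207, -2), subtract=True))    # (1, 77, -2)
```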
compare takes two operands and
compares their values numerically. If either operand is a special value
then the general rules apply.
Otherwise, the operands are compared as follows.
If the signs of the operands differ, a value representing each operand
('-1' if the operand is less than zero, '0' if the operand is
zero or negative zero, or '1' if the operand is greater than zero) is
used in place of that operand for the comparison instead of the actual
operand.[2]
The comparison is then effected by subtracting the second operand from the
first and then returning a value according to the result of the subtraction:
'-1' if the result is less than zero, '0' if the result is
zero or negative zero, or '1' if the result is greater than zero.
An implementation may use this operation ‘under the covers’ to implement a
closed set of comparison operations (greater than, equal, etc.) if
desired. It need not, in this case, expose the compare operation itself.
Examples:
compare('2.1', '3') ==> '-1'
compare('2.1', '2.1') ==> '0'
compare('2.1', '2.10') ==> '0'
compare('3', '2.1') ==> '1'
compare('2.1', '-3') ==> '1'
compare('-3', '2.1') ==> '-1'
Note that the result of a compare is always exact and unrounded.
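A minimal sketch of the comparison rule on [sign, coefficient, exponent] triples follows, for finite operands only. The helper names are illustrative assumptions, and the result is returned as a Python integer rather than as a number in the notation used above.

```python
def signum(sign, coeff):
    """-1, 0 or 1 for a finite number (zero and negative zero give 0)."""
    if coeff == 0:
        return 0
    return -1 if sign else 1


def compare(a, b):
    (s1, c1, e1), (s2, c2, e2) = a, b
    if s1 != s2:
        # Signs differ: compare stand-in values instead of the operands,
        # so widely separated exponents never need to be aligned.
        v1, v2 = signum(s1, c1), signum(s2, c2)
    else:
        exp = min(e1, e2)
        v1 = (-c1 if s1 else c1) * 10 ** (e1 - exp)
        v2 = (-c2 if s2 else c2) * 10 ** (e2 - exp)
    diff = v1 - v2                   # subtract the second from the first
    return (diff > 0) - (diff < 0)   # -1, 0 or 1


print(compare((0, 21, -1), (0, 210, -2)))   # 0
print(compare((0, 21, -1), (1, 3, 0)))      # 1
```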
divide takes two operands. If either
operand is a special value then the general rules apply.
Otherwise, if the divisor is zero then either the Division undefined
condition is raised (if the dividend is zero) and the result is NaN, or the
Division by zero condition is raised and the result is an Infinity with a sign
which is the exclusive or of the signs of the operands.
Otherwise, a ‘long division’ is effected, with the division being complete
when either precision digits have been accumulated or the remainder from
a subtraction in the division is zero, as follows:
- An integer variable, adjust, is initialized to 0.
- If the dividend is non-zero, the coefficient of the result is
computed as follows (using working copies of the operand coefficients, as
necessary):
- The operand coefficients are adjusted so that the coefficient of the
dividend is greater than or equal to the coefficient of the divisor and is
also less than ten times the coefficient of the divisor, thus:
- While the coefficient of the dividend is less than the coefficient of
the divisor it is multiplied by 10 and adjust is incremented by
1.
- While the coefficient of the dividend is greater than or equal to ten
times the coefficient of the divisor the coefficient of the divisor is
multiplied by 10 and adjust is decremented by 1.
- The result coefficient is initialized to 0.
- The following steps are then repeated until the division is complete:
- While the coefficient of the divisor is smaller than or equal to the
coefficient of the dividend the former is subtracted from the latter and
the coefficient of the result is incremented by 1.
- If the coefficient of the dividend is now 0 and adjust is
greater than or equal to 0, or if the coefficient of the result has
precision digits, the division is complete.
Otherwise, the
coefficients of the result and the dividend are multiplied by 10 and
adjust is incremented by 1.
- Any remainder (the final coefficient of the dividend) is recorded and
taken into account for rounding.[3]
Otherwise (the
dividend is zero), the coefficient of the result is zero and
adjust is set as described in step 1 above, calculated as if the
dividend coefficient were 1. (It will then have a value which is one less than
the length of the coefficient of the divisor.)
- The exponent of the result is computed by subtracting the sum of
the original exponent of the divisor and the value of adjust at the
end of the coefficient calculation from the original exponent of the dividend.
- The sign of the result is the exclusive or of the signs of
the operands.
The result is then rounded to precision
digits, if necessary, according to the rounding algorithm and taking into
account the remainder from the division.
Examples:
divide('1', '3' ) ==> '0.333333333'
divide('2', '3' ) ==> '0.666666667'
divide('5', '2' ) ==> '2.5'
divide('1', '10' ) ==> '0.1'
divide('12', '12') ==> '1'
divide('8.00', '2') ==> '4.00'
divide('2.400', '2.0') ==> '1.20'
divide('1000', '100') ==> '10'
divide('1000', '1') ==> '1000'
divide('2.40E+6', '2') ==> '1.20E+6'
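The long-division steps above can be traced with the following sketch on [sign, coefficient, exponent] triples. It is an illustration under stated assumptions, not a reference implementation: it covers only finite operands with a non-zero dividend and divisor, and the final rounding is round-half-up using the recorded remainder.

```python
def divide(dividend, divisor, precision=9):
    (s1, c1, e1), (s2, c2, e2) = dividend, divisor
    assert c1 != 0 and c2 != 0, "sketch covers non-zero finite operands only"
    adjust = 0
    while c1 < c2:                   # scale the dividend up
        c1 *= 10
        adjust += 1
    while c1 >= 10 * c2:             # scale the divisor up
        c2 *= 10
        adjust -= 1
    result = 0
    while True:
        while c2 <= c1:              # repeated subtraction gives one digit
            c1 -= c2
            result += 1
        if (c1 == 0 and adjust >= 0) or len(str(result)) == precision:
            break                    # the division is complete
        result *= 10
        c1 *= 10
        adjust += 1
    exp = e1 - (e2 + adjust)
    if 2 * c1 >= c2:                 # remainder is at least half the divisor
        result += 1
        if len(str(result)) > precision:
            result //= 10
            exp += 1
    return s1 ^ s2, result, exp


print(divide((0, 1, 0), (0, 3, 0)))       # (0, 333333333, -9)
print(divide((0, 2400, -3), (0, 20, -1))) # (0, 120, -2), i.e. '1.20'
```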
divide-integer takes two
operands; it divides two numbers and returns the integer part of the result. If
either operand is a special value then the general rules apply.
Otherwise, the result returned is defined to be that which would result from
repeatedly subtracting the divisor from the dividend while the dividend is
larger than the divisor. During this subtraction, the absolute values of both
the dividend and the divisor are used: the sign of the final result is the same
as that which would result if normal division were used.
In other words, if the operands x and y were given to the
divide-integer and remainder operations, resulting in i and
r respectively, then the identity
x = i×y + r
holds.
The exponent of the result must be 0. Hence, if the result cannot be
expressed exactly within precision digits, the operation is in error and
will fail – that is, the result cannot have more digits than the value of
precision in effect for the operation, and will not be rounded. For
example, divide-integer('10000000000', '3') requires ten digits to
express the result exactly ('3333333333') and would therefore fail if
precision were in the range 1 through 9.
Notes:
- The divide-integer operation may not give the same result as truncating
normal division (which could be affected by rounding).
- The divide-integer and remainder operations are defined so that they may
be calculated as a by-product of the standard division operation (described
above). The division process is ended as soon as the integer result is
available; the residue of the dividend is the remainder.
- The divide and divide-integer operation on the same operands give results
of the same numerical value if no error occurs and there is no residue from
the divide-integer operation.
Examples:
divide-integer('2', '3') ==> '0'
divide-integer('10', '3') ==> '3'
divide-integer('1', '0.3') ==> '3'
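As a rough sketch of the definition above, the integer quotient and the residue can be computed together on aligned coefficients. Python's divmod stands in for the literal repeated subtraction; the function name, the exception type, and the reuse of the dividend's sign for a zero remainder are assumptions, and rounding of the remainder is omitted.

```python
def divide_integer_and_remainder(x, y, precision=9):
    """Return (quotient, remainder) as [sign, coefficient, exponent] triples."""
    (s1, c1, e1), (s2, c2, e2) = x, y
    assert c2 != 0, "sketch covers a non-zero divisor only"
    exp = min(e1, e2)
    a = c1 * 10 ** (e1 - exp)        # absolute values, aligned
    b = c2 * 10 ** (e2 - exp)
    i, r = divmod(a, b)              # same result as repeated subtraction
    if len(str(i)) > precision:
        raise ArithmeticError("integer result needs more than precision digits")
    quotient = (s1 ^ s2, i, 0)       # the exponent of the result must be 0
    remainder = (s1, r, exp)         # sign follows the dividend
    return quotient, remainder


print(divide_integer_and_remainder((0, 1, 0), (0, 3, -1)))
# ((0, 3, 0), (0, 1, -1)), i.e. '3' with residue '0.1'
```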
max takes two operands, compares their
values numerically, and returns the maximum. If either operand is a NaN then the
general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand is
chosen as the result. Otherwise the maximum (closer to positive infinity) of the
two operands is chosen as the result. In either case, the result is the same as
using the plus
operation on the chosen operand.
Examples:
max('3', '2') ==> '3'
max('-10', '3') ==> '3'
max('1.0', '1') ==> '1.0'
min takes two operands, compares their
values numerically, and returns the minimum. If either operand is a NaN then the
general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand is
chosen as the result. Otherwise the minimum (closer to negative infinity) of the
two operands is chosen as the result. In either case, the result is the same as
using the plus
operation on the chosen operand.
Examples:
min('3', '2') ==> '2'
min('-10', '3') ==> '-10'
min('1.0', '1') ==> '1.0'
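The tie-breaking rule for max and min (the left-hand operand is chosen when the operands are numerically equal) can be sketched as below. Python's decimal module is used only for its compare and plus operations; the names spec_max and spec_min are illustrative assumptions, and NaN operands are not handled.

```python
from decimal import Decimal, ExtendedContext, ROUND_HALF_UP

ctx = ExtendedContext.copy()         # precision 9, no traps enabled
ctx.rounding = ROUND_HALF_UP


def spec_max(a, b):
    # On a tie (compare gives 0) the left-hand operand is chosen.
    chosen = a if ctx.compare(a, b) >= 0 else b
    return ctx.plus(chosen)


def spec_min(a, b):
    chosen = a if ctx.compare(a, b) <= 0 else b
    return ctx.plus(chosen)


print(spec_max(Decimal('1.0'), Decimal('1')))   # 1.0
print(spec_min(Decimal('-10'), Decimal('3')))   # -10
```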
minus and plus both take one operand, and
correspond to the prefix minus and plus operators in programming languages.
The operations are evaluated using the same rules as add and
subtract; the operations plus(a) and minus(b) (where
a and b refer to any numbers) are calculated as the operations
add('0', a) and subtract('0', b) respectively, where the
'0' has the same exponent as the operand.
Examples:
plus('1.3') ==> '1.3'
plus('-1.3') ==> '-1.3'
minus('1.3') ==> '-1.3'
minus('-1.3') ==> '1.3'
multiply takes two operands. If
either operand is a special value then the general rules apply.
Otherwise, the operands are multiplied together (‘long multiplication’),
resulting in a number which may be as long as the sum of the lengths of the two
operands, as follows:
- The coefficient of the result, before rounding, is computed by
multiplying together the coefficients of the operands.
- The exponent of the result, before rounding, is the sum of the
exponents of the two operands.
- The sign of the result is the exclusive or of the signs of
the operands.
The result is then rounded to precision
digits if necessary, counting from the most significant digit of the result.
Examples:
multiply('1.20', '3') ==> '3.60'
multiply('7', '3') ==> '21'
multiply('0.9', '0.8') ==> '0.72'
multiply('0.9', '-0') ==> '-0.0'
multiply('654321', '654321') ==> '4.28135971E+11'
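The three rules above translate directly into a short sketch on [sign, coefficient, exponent] triples. It assumes finite operands and round-half-up; the function name is illustrative.

```python
def multiply(a, b, precision=9):
    (s1, c1, e1), (s2, c2, e2) = a, b
    coeff = c1 * c2                  # product of the coefficients
    exp = e1 + e2                    # sum of the exponents
    sign = s1 ^ s2                   # exclusive or of the signs
    excess = len(str(coeff)) - precision
    if excess > 0:                   # round to precision digits, half up
        coeff, dropped = divmod(coeff, 10 ** excess)
        exp += excess
        if 2 * dropped >= 10 ** excess:
            coeff += 1
            if len(str(coeff)) > precision:
                coeff //= 10
                exp += 1
    return sign, coeff, exp


print(multiply((0, 120, -2), (0, 3, 0)))        # (0, 360, -2)
print(multiply((0, 654321, 0), (0, 654321, 0))) # (0, 428135971, 3)
```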
normalize takes one operand. It
has the same semantics as the plus operation, except that the final
result is reduced to its simplest form, with all trailing zeros removed.
That is, while the coefficient is non-zero and a multiple of ten the
coefficient is divided by ten and the exponent is incremented by
1. Alternatively, if the coefficient is zero the exponent is set
to 0. In all cases the sign is unchanged.
Examples:
normalize('2.1') ==> '2.1'
normalize('-2.0') ==> '-2'
normalize('1.200') ==> '1.2'
normalize('-120') ==> '-1.2E+2'
normalize('120.00') ==> '1.2E+2'
normalize('0.00') ==> '0'
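The trailing-zero removal can be sketched as follows on a [sign, coefficient, exponent] triple. The initial plus step (which may round the operand) is omitted here, so the sketch assumes the operand already fits within precision digits.

```python
def normalize(sign, coeff, exp):
    if coeff == 0:
        return sign, 0, 0            # zero: exponent set to 0, sign unchanged
    while coeff % 10 == 0:           # non-zero multiple of ten
        coeff //= 10
        exp += 1
    return sign, coeff, exp


print(normalize(1, 20, -1))      # (1, 2, 0),  i.e. '-2'
print(normalize(0, 12000, -2))   # (0, 12, 1), i.e. '1.2E+2'
```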
remainder takes two operands; it
returns the remainder from integer division. If either operand is a special
value then the general rules apply.
Otherwise, the result is the residue of the dividend after the operation of
calculating integer division as described for divide-integer, rounded to
precision digits if necessary. The sign of the result, if non-zero, is
the same as that of the original dividend.
This operation will fail under the same conditions as integer division (that
is, if integer division on the same two operands would fail, the remainder
cannot be calculated).
Examples:
remainder('2.1', '3') ==> '2.1'
remainder('10', '3') ==> '1'
remainder('-10', '3') ==> '-1'
remainder('10.2', '1') ==> '0.2'
remainder('10', '0.3') ==> '0.1'
remainder('3.6', '1.3') ==> '1.0'
Notes:
- The divide-integer and remainder operations are defined so that they may
be calculated as a by-product of the standard division operation (described
above). The division process is ended as soon as the integer result is
available; the residue of the dividend is the remainder.
- The remainder operation differs from the remainder operation defined in
IEEE 854 (the remainder-near operator), in that it gives the same
results for numbers whose values are equal to integers as would the usual
remainder operator on integers.
For example, the result of the operation
remainder('10', '6') as defined here is '4', and
remainder('10.0', '6') would give '4.0' (as would
remainder('10', '6.0') or remainder('10.0', '6.0')). The
IEEE 854 remainder operation would, however, give the result '-2'
because its integer division step chooses the closest integer, not the one
nearer zero.
remainder-near takes two
operands. If either operand is a special value then the general rules
apply.
Otherwise, if the operands are given by x and y, then the
result is defined to be x – y × n, where n
is the integer nearest the exact value of x ÷ y (if two
integers are equally near then the even one is chosen). If the result is equal
to 0 then its sign will be the sign of x. (See IEEE 854 §5.1.)
This operation will fail under the same conditions as integer division (that
is, if integer division on the same two operands would fail, the remainder
cannot be calculated).[4]
Examples:
remainder-near('2.1', '3') ==> '-0.9'
remainder-near('10', '6') ==> '-2'
remainder-near('10', '3') ==> '1'
remainder-near('-10', '3') ==> '-1'
remainder-near('10.2', '1') ==> '0.2'
remainder-near('10', '0.3') ==> '0.1'
remainder-near('3.6', '1.3') ==> '-0.3'
Notes:
- The remainder-near operation differs from the remainder
operation in that it does not give the same results for numbers whose values
are equal to integers as would the usual remainder operator on integers. For
example, the operation remainder('10', '6') gives the result
'4', and remainder('10.0', '6') gives '4.0' (as
would the operations remainder('10', '6.0') or remainder('10.0',
'6.0')). However, remainder-near('10', '6') gives the result
'-2' because its integer division step chooses the closest integer,
not the one nearer zero.
- The result of this operation is always exact.
- This operation is sometimes known as ‘IEEE remainder’.
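The definition x – y × n can be checked numerically with exact rational arithmetic. The sketch below uses Python's fractions module as an assumption of convenience; it yields only the value of the result (not its sign-coefficient-exponent form) and does not model the failure condition.

```python
from fractions import Fraction


def remainder_near(x, y):
    x, y = Fraction(x), Fraction(y)
    n = round(x / y)                 # nearest integer; ties go to the even one
    return x - y * n


print(remainder_near('10', '6'))     # -2
print(remainder_near('3.6', '1.3'))  # -3/10
```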
rescale takes two operands. If
either operand is a special value then the general rules apply (and
infinities are unchanged), except that if the right-hand operand is infinite, an
Invalid
operation condition is raised, and the result is [0,qNaN].
Otherwise, it returns the number which is equal in value (except for any
rounding) and sign to the first (left-hand) operand and which has an
exponent set to the value of the second (right-hand) operand.
The right-hand operand must be a whole number whose integer part (after any
exponent has been applied) is no more than Emax and no less than
Etiny, and whose fractional part (if any) is all zeros.
The coefficient of the result is derived from that of the left-hand
operand. It may be rounded using the current rounding setting (if the
exponent is being increased), multiplied by a positive power of ten (if
the exponent is being decreased), or is unchanged (if the exponent
is already equal to the right-hand operand).
Unlike other operations, if the length of the coefficient after the
rescaling would be greater than precision then an Overflow condition
results. This guarantees that, unless there is an error condition, the
exponent of the result of a rescale is always the value specified by the
right-hand operand.
Examples:
rescale('2.17', '-3') ==> '2.170'
rescale('2.17', '-2') ==> '2.17'
rescale('2.17', '-1') ==> '2.2'
rescale('2.17', '0') ==> '2'
rescale('2.17', '1') ==> '0E+1'
rescale('2', 'Infinity') ==> 'NaN'
rescale('-0.1', '0') ==> '-0'
rescale('-0', '5') ==> '-0E+5'
rescale('+35236450.6', '-2') ==> 'Infinity'
rescale('-35236450.6', '-2') ==> '-Infinity'
rescale('217', '-1') ==> '217.0'
rescale('217', '0') ==> '217'
rescale('217', '1') ==> '2.2E+2'
rescale('217', '2') ==> '2E+2'
Note that in the penultimate example the number is
[0,22,1], leading to the string in scientific notation as shown.
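A minimal sketch of the coefficient handling for rescale follows, on [sign, coefficient, exponent] triples with round-half-up. The function name is illustrative; special values and the range check on the right-hand operand are omitted, and a Python OverflowError is raised where the text specifies an Overflow condition.

```python
def rescale(operand, new_exp, precision=9):
    sign, coeff, exp = operand
    shift = new_exp - exp
    if shift > 0:                    # exponent increased: round off digits
        coeff, dropped = divmod(coeff, 10 ** shift)
        if 2 * dropped >= 10 ** shift:
            coeff += 1
    elif shift < 0:                  # exponent decreased: append zeros
        coeff *= 10 ** (-shift)
    if len(str(coeff)) > precision:
        raise OverflowError("coefficient longer than precision")
    return sign, coeff, new_exp


print(rescale((0, 217, -2), -1))   # (0, 22, -1), i.e. '2.2'
print(rescale((0, 217, 0), 1))     # (0, 22, 1),  i.e. '2.2E+2'
```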
round-to-integer takes one
operand. Its result is the same as using the rescale operation using the
given operand as the left-hand-operand and 0 as the right-hand-operand.[5]
Examples:
round-to-integer('2.1') ==> '2'
round-to-integer('100') ==> '100'
round-to-integer('100.0') ==> '100'
round-to-integer('101.5') ==> '102'
round-to-integer('-101.5') ==> '-102'
round-to-integer('10E+5') ==> '1000000'
Note: IEEE 854 refers to §4 for this operation, but then implies that
round-half-even rounding should always be used (whereas §4 specifically
allows directed rounding). It is assumed that it was not intended to exclude
directed rounding.
square-root takes one operand.
If the operand is a special value then the general rules apply.
Otherwise, the operand must be greater than or equal to 0. If the value of
the operand is –0 then the result is [1,0,0].
Otherwise, the result is the exact square root of the operand, rounded
according to the setting of precision using the round-half-even
algorithm, and then normalized (as though by the normalize operation).
Examples:
square-root('0') ==> '0'
square-root('-0') ==> '-0'
square-root('0.39') ==> '0.6244998'
square-root('1.00') ==> '1'
square-root('7') ==> '2.64575131'
square-root('10') ==> '3.16227766'
Notes:
- The rounding setting in the context is not used; this means that
the algorithm described in Properly Rounded Variable Precision Square
Root by T. E. Hull and A. Abrham (ACM Transactions on Mathematical
Software, Vol 11 #3, pp229-237, ACM, September 1985) may be used for this
operation.
- A subnormal result is only possible if the working precision is greater
than Emax+1.
- The result of this operation is normalized because an unnormalized result
with an integer coefficient cannot always be defined (e.g.,
square-root('4.0')); this normalization may cause the Rounded
condition.
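The effect of rounding followed by normalization can be seen in the sketch below. It relies on Python's decimal module, whose sqrt is assumed here to round half-even independently of the context rounding setting and not to normalize its result, so the normalize step is applied explicitly. The function name is illustrative.

```python
from decimal import Decimal, ExtendedContext

ctx = ExtendedContext.copy()         # precision 9, no traps enabled


def square_root(x):
    # Round the exact root to precision digits, then strip trailing zeros.
    return ctx.normalize(ctx.sqrt(x))


for text in ('0.39', '1.00', '7', '10'):
    print(text, '==>', square_root(Decimal(text)))
```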
The following operation is under review. It will
probably either be removed or be changed to simply state that the
result must be within one ulp. The definition in this section is included
as it defines the results for the power testcase group and
for the reference implementation.
power takes two operands, and raises a number (the left-hand operand)
to a whole number power (the right-hand operand). If either operand is a
special value then the general rules apply, except as stated below.
Otherwise, the right-hand operand must be a whole number whose integer part
(after any exponent has been applied) has no more than 9 digits and whose
fractional part (if any) is all zeros before any rounding. The operand may be
positive, negative, or zero; if negative, the absolute value of the power is
used, and the left-hand operand is inverted (divided into 1) before use.
For calculating the power, the number (left-hand operand) is in theory
multiplied by itself for the number of times expressed by the power.
In practice (see the note below for the reasons), the power is calculated by
the process of left-to-right binary reduction. For power(x, n):
‘n’ is converted to binary, and a temporary accumulator is set to 1. If
‘n’ has the value 0 then the initial calculation is complete. Otherwise
each bit (starting at the first non-zero bit) is inspected from left to right.
If the current bit is 1 then the accumulator is multiplied by ‘x’. If
all bits have now been inspected then the initial calculation is complete,
otherwise the accumulator is squared by multiplication and the next bit is
inspected.
The multiplications and initial division are done under the normal arithmetic
operation and rounding rules, using the context supplied for the operation,
except that the multiplications (and the division, if needed) are carried out
using an increased precision of precision+elength+1 digits.
Here, elength is the length in decimal digits of the integer part
(coefficient) of the whole number ‘n’ (i.e., excluding any sign,
decimal part, decimal point, or insignificant leading zeros).[6]
If the increased precision needed for the intermediate calculations exceeds
the capabilities of the implementation then an Invalid operation
condition is raised.
If, when raising to a negative power, an underflow occurs during the division
into 1, the operation is not halted at that point but continues.[7]
In addition:
- If both operands are zero, an Invalid
operation condition results.
- If the right-hand operand is infinite, an Invalid
operation condition is raised, the result is [0,qNaN], and the
following rules do not apply.
- If the left-hand operand is infinite, the result will be infinite if the
right-hand side is positive, 1 if the right-hand side is zero, and zero if the
right-hand side is negative. The sign of the result will be 0 if the
right-hand-side is even (or zero), or will be the same as the sign of
the left-hand-side if the right-hand-side is odd.
- If the operation overflows or underflows, the sign of the result
will be 0 if the right-hand-side is even, or will be the same as the
sign of the left-hand-side if the right-hand-side is odd.
Examples:
power('2', '3') ==> '8'
power('2', '-3') ==> '0.125'
power('1.7', '8') ==> '69.7575744'
power('Infinity', '-2') ==> '0'
power('Infinity', '-1') ==> '0'
power('Infinity', '0') ==> '1'
power('Infinity', '1') ==> 'Infinity'
power('Infinity', '2') ==> 'Infinity'
power('-Infinity', '-2') ==> '0'
power('-Infinity', '-1') ==> '-0'
power('-Infinity', '0') ==> '1'
power('-Infinity', '1') ==> '-Infinity'
power('-Infinity', '2') ==> 'Infinity'
power('0', '0') ==> 'NaN'
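The left-to-right binary reduction and the increased working precision can be sketched as follows for finite operands. Python's decimal contexts are used here only as an assumed vehicle for the multiplications and the division; the function name is illustrative, and the special-value and zero-operand rules listed above are not modelled.

```python
from decimal import Decimal, Context, ROUND_HALF_UP


def power(x, n, precision=9):
    """Raise Decimal x to the integer power n by binary reduction."""
    elength = len(str(abs(n)))       # decimal digits in the whole number n
    work = Context(prec=precision + elength + 1, rounding=ROUND_HALF_UP)
    final = Context(prec=precision, rounding=ROUND_HALF_UP)
    if n < 0:
        x = work.divide(Decimal(1), x)   # invert before use
        n = -n
    acc = Decimal(1)
    if n != 0:
        bits = bin(n)[2:]            # starts at the first non-zero bit
        for position, bit in enumerate(bits):
            if bit == '1':
                acc = work.multiply(acc, x)
            if position < len(bits) - 1:
                acc = work.multiply(acc, acc)   # square, then next bit
    return final.plus(acc)           # final rounding to precision digits


print(power(Decimal('2'), -3))       # 0.125
print(power(Decimal('1.7'), 8))      # 69.7575744
```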
Notes:
- The result of the power operator is negative if (and only if) the
left-hand operand is negative and the right-hand operand is odd.
- A particular algorithm for calculating powers is described, since it is
efficient (though not optimal) and considerably reduces the number of actual
multiplications performed. It therefore gives better performance than the
simpler definition of repeated multiplication. Since results can occasionally
differ from those of repeated multiplication, the algorithm must be defined
here so that different implementations will give identical results for the
same operation on the same values. Other algorithms for this (and other)
operations may always be used, so long as they give identical results to those
described here.
- Mathematical and transcendental functions are outside the scope of this
specification. However, implementations are encouraged to provide a power
operator which will accept a non-integral right-hand operand when the
left-hand operand is non-negative. In this case it is not required that the
more general function return identical results to the operation described
above.
Footnotes:
[1] In practice, it is only necessary to work with
intermediate results of up to twice the current precision. Some rounding
settings may require some inspection of possible remainders or additional
digits (for example, to determine whether a result is exactly 0.5 in the
next position), though their actual values would not be required. For
round-half-up, rounding can be effected by truncating the result to
precision (and adding the count of truncated digits to the
exponent). The first truncated digit is then inspected, and if it
has the value 5 through 9 the result is incremented by 1. This could cause
the result to again exceed precision digits, in which case it is
divided by 10 and the exponent is incremented by 1.
[2] This rule removes the possibility of an arithmetic
overflow during a numeric comparison.
[3] In practice, only two bits need to be noted, indicating
whether the remainder was 0, or was exactly half of the final coefficient
of the divisor, or was in one of the two ranges above or below the
half-way point.
[4] This is a deviation from IEEE 854, necessary to assure
realistic execution times when the operands have a wide range of
exponents.
[5] This operation is defined in order to provide the Round
Floating-Point Number to Integral Value operator described in IEEE 854
§5.5.
[6] The precision specified for the intermediate calculations
ensures that the final result will differ by at most 1, in the least
significant position, from the ‘true’ result (given that the operands are
expressed precisely under the current setting of digits). Half of
this maximum error comes from the intermediate calculation, and half from
the final rounding.
[7] It can only be halted early if the result becomes zero.