diff options
author | Dan Amelang <dan@amelang.net> | 2006-12-09 18:54:47 -0800 |
---|---|---|
committer | Dan Amelang <dan@amelang.net> | 2006-12-09 21:05:20 -0800 |
commit | fea60c7283172be5efb42332a96fe322466bd6ed (patch) | |
tree | d7113a277fcd0ed4ca212c46eb2918eb1ba3ac90 | |
parent | cc75159587a4479951da354cfa282d81c74b0377 (diff) |
Change _cairo_lround to correctly handle edge cases previously missed
A nice side effect of this new approach is that the valid input range
was expanded back to (INT_MIN, INT_MAX]. No performance regressions observed.
Also included is documentation about the internal mysteries of _cairo_lround,
as previously promised.
-rw-r--r-- | src/cairo.c | 205 |
1 files changed, 179 insertions, 26 deletions
diff --git a/src/cairo.c b/src/cairo.c index 30672d7d5..b8747771d 100644 --- a/src/cairo.c +++ b/src/cairo.c @@ -3199,7 +3199,7 @@ _cairo_restrict_value (double *value, double min, double max) /* This function is identical to the C99 function lround(), except that it * performs arithmetic rounding (instead of away-from-zero rounding) and - * has a valid input range of [INT_MIN / 4, INT_MAX / 4] instead of + * has a valid input range of (INT_MIN, INT_MAX] instead of * [INT_MIN, INT_MAX]. It is much faster on both x86 and FPU-less systems * than other commonly used methods for rounding (lround, round, rint, lrint * or float (d + 0.5)). @@ -3216,41 +3216,194 @@ _cairo_restrict_value (double *value, double min, double max) * This function doesn't perform any floating-point calculations, and thus * avoids this penalty. */ -/* XXX needs inline comments explaining the internal magic - */ int _cairo_lround (double d) { + uint32_t top, shift_amount, output; union { - uint32_t ui32[2]; double d; + uint64_t ui64; + uint32_t ui32[2]; } u; - uint32_t exponent, most_significant_word, least_significant_word; - int32_t integer_result; u.d = d; -#ifdef FLOAT_WORDS_BIGENDIAN - most_significant_word = u.ui32[0]; - least_significant_word = u.ui32[1]; -#else - most_significant_word = u.ui32[1]; - least_significant_word = u.ui32[0]; + /* If the integer word order doesn't match the float word order, we swap + * the words of the input double. This is needed because we will be + * treating the whole double as a 64-bit unsigned integer. Notice that we + * use WORDS_BIGENDIAN to detect the integer word order, which isn't + * exactly correct because WORDS_BIGENDIAN refers to byte order, not word + * order. Thus, we are making the assumption that the byte order is the + * same as the integer word order which, on the modern machines that we + * care about, is OK. + */ +#if ( defined(FLOAT_WORDS_BIGENDIAN) && !defined(WORDS_BIGENDIAN)) || \ + (!defined(FLOAT_WORDS_BIGENDIAN) && defined(WORDS_BIGENDIAN)) + { + uint32_t temp = u.ui32[0]; + u.ui32[0] = u.ui32[1]; + u.ui32[1] = temp; + } #endif - exponent = 1052 - ((most_significant_word >> 20) & 0x7FF); - integer_result = ((most_significant_word & 0xFFFFF) | 0x100000) << 10; - integer_result |= (least_significant_word >> 22); - - if (most_significant_word & 0x80000000) - integer_result = -integer_result; - - integer_result >>= exponent; - - if (exponent > 30) - integer_result = 0; - - integer_result = (integer_result + 1) >> 1; +#ifdef WORDS_BIGENDIAN + #define MSW (0) /* Most Significant Word */ + #define LSW (1) /* Least Significant Word */ +#else + #define MSW (1) + #define LSW (0) +#endif - return integer_result; + /* By shifting the most significant word of the input double to the + * right 20 places, we get the very "top" of the double where the exponent + * and sign bit lie. + */ + top = u.ui32[MSW] >> 20; + + /* Here, we calculate how much we have to shift the mantissa to normalize + * it to an integer value. We extract the exponent "top" by masking out the + * sign bit, then we calculate the shift amount by subtracting the exponent + * from the bias. Notice that the correct bias for 64-bit doubles is + * actually 1075, but we use 1053 instead for two reasons: + * + * 1) To perform rounding later on, we will first need the target + * value in a 31.1 fixed-point format. Thus, the bias needs to be one + * less: (1075 - 1: 1074). + * + * 2) To avoid shifting the mantissa as a full 64-bit integer (which is + * costly on certain architectures), we break the shift into two parts. + * First, the upper and lower parts of the mantissa are shifted + * individually by a constant amount that all valid inputs will require + * at the very least. This amount is chosen to be 21, because this will + * allow the two parts of the mantissa to later be combined into a + * single 32-bit representation, on which the remainder of the shift + * will be performed. Thus, we decrease the bias by an additional 21: + * (1074 - 21: 1053). + */ + shift_amount = 1053 - (top & 0x7FF); + + /* We are done with the exponent portion in "top", so here we shift it off + * the end. + */ + top >>= 11; + + /* Before we perform any operations on the mantissa, we need to OR in + * the implicit 1 at the top (see the IEEE-754 spec). We needn't mask + * off the sign bit nor the exponent bits because these higher bits won't + * make a bit of difference in the rest of our calculations. + */ + u.ui32[MSW] |= 0x100000; + + /* If the input double is negative, we have to decrease the mantissa + * by a hair. This is an important part of performing arithmetic rounding, + * as negative numbers must round towards positive infinity in the + * halfwase case of -x.5. Since "top" contains only the sign bit at this + * point, we can just decrease the mantissa by the value of "top". + */ + u.ui64 -= top; + + /* By decrementing "top", we create a bitmask with a value of either + * 0x0 (if the input was negative) or 0xFFFFFFFF (if the input was positive + * and thus the unsigned subtraction underflowed) that we'll use later. + */ + top--; + + /* Here, we shift the mantissa by the constant value as described above. + * We can emulate a 64-bit shift right by 21 through shifting the top 32 + * bits left 11 places and ORing in the bottom 32 bits shifted 21 places + * to the right. Both parts of the mantissa are now packed into a single + * 32-bit integer. Although we severely truncate the lower part in the + * process, we still have enough significant bits to perform the conversion + * without error (for all valid inputs). + */ + output = (u.ui32[MSW] << 11) | (u.ui32[LSW] >> 21); + + /* Next, we perform the shift that converts the X.Y fixed-point number + * currently found in "output" to the desired 31.1 fixed-point format + * needed for the following rounding step. It is important to consider + * all possible values for "shift_amount" at this point: + * + * - {shift_amount < 0} Since shift_amount is an unsigned integer, it + * really can't have a value less than zero. But, if the shift_amount + * calculation above caused underflow (which would happen with + * input > INT_MAX or input <= INT_MIN) then shift_amount will now be + * a very large number, and so this shift will result in complete + * garbage. But that's OK, as the input was out of our range, so our + * output is undefined. + * + * - {shift_amount > 31} If the magnitude of the input was very small + * (i.e. |input| << 1.0), shift_amount will have a value greater than + * 31. Thus, this shift will also result in garbage. After performing + * the shift, we will zero-out "output" if this is the case. + * + * - {0 <= shift_amount < 32} In this case, the shift will properly convert + * the mantissa into a 31.1 fixed-point number. + */ + output >>= shift_amount; + + /* This is where we perform rounding with the 31.1 fixed-point number. + * Since what we're after is arithmetic rounding, we simply add the single + * fractional bit into the integer part of "output", and just keep the + * integer part. + */ + output = (output >> 1) + (output & 1); + + /* Here, we zero-out the result if the magnitude if the input was very small + * (as explained in the section above). Notice that all input out of the + * valid range is also caught by this condition, which means we produce 0 + * for all invalid input, which is a nice side effect. + * + * The most straightforward way to do this would be: + * + * if (shift_amount > 31) + * output = 0; + * + * But we can use a little trick to avoid the potential branch. The + * expression (shift_amount > 31) will be either 1 or 0, which when + * decremented will be either 0x0 or 0xFFFFFFFF (unsigned underflow), + * which can be used to conditionally mask away all the bits in "output" + * (in the 0x0 case), effectively zeroing it out. Certain, compilers would + * have done this for us automatically. + */ + output &= ((shift_amount > 31) - 1); + + /* If the input double was a negative number, then we have to negate our + * output. The most straightforward way to do this would be: + * + * if (!top) + * output = -output; + * + * as "top" at this point is either 0x0 (if the input was negative) or + * 0xFFFFFFFF (if the input was positive). But, we can use a trick to + * avoid the branch. Observe that the following snippet of code has the + * same effect as the reference snippet above: + * + * if (!top) + * output = 0 - output; + * else + * output = output - 0; + * + * Armed with the bitmask found in "top", we can condense the two statements + * into the following: + * + * output = (output & top) - (output & ~top); + * + * where, in the case that the input double was negative, "top" will be 0, + * and the statement will be equivalent to: + * + * output = (0) - (output); + * + * and if the input double was positive, "top" will be 0xFFFFFFFF, and the + * statement will be equivalent to: + * + * output = (output) - (0); + * + * Which, as pointed out earlier, is equivalent to the original reference + * snippet. + */ + output = (output & top) - (output & ~top); + + return output; +#undef MSW +#undef LSW } |