Age | Commit message (Collapse) | Author | Files | Lines |
|
This benchmark renders one of the radial gradients used in the
swfdec-youtube cairo trace 500 times and reports the average time it
took.
V2: Update .gitignore
|
|
The source, mask and destination buffers are initialised to 0xCC just after
they are allocated. Between each benchmark, there are a pair of memcpys,
from the destination buffer to the source buffer and back again (there are
no explanatory comments, but presumably this is an effort to flush the
caches). However, it has an unintended consequence, which is to change the
contents of the buffers on entry to subsequent benchmarks. This means it is
not a fair test: for example, with over_n_8888 (featured in the following
patches) it reports L2 and even M tests as being faster than the L1 test,
because after the L1 test, the source buffer is filled with fully opaque
pixels, for which over_n_8888 has a shortcut.
The fix here is simply to reverse the order of the memcpys, so src and
destination are both filled with 0xCC on entry to all tests.
|
|
The check-formats programs reveals that the 8 bit pipeline cannot meet
the current 0.004 acceptable deviation specified in utils.c, so we
have to increase it. Some of the failing pixels were captured in
pixel-test, which with this commit now passes.
== a4r4g4b4 DISJOINT_XOR a8r8g8b8 ==
The DISJOINT_XOR operator applied to an a4r4g4b4 source pixel of
0xd0c0 and a destination pixel of 0x5300ea00 results in the exact
value:
fa = (1 - da) / sa = (1 - 0x53 / 255.0) / (0xd / 15.0) = 0.7782
fb = (1 - sa) / da = (1 - 0xd / 15.0) / (0x53 / 255.0) = 0.4096
r = fa * (0xc / 15.0) + fb * (0xea / 255.0) = 0.99853
But when computing in 8 bits, we get:
fa8 = ((255 - 0x53) * 255 + 0xdd / 2) / 0xdd = 0xc6
fb8 = ((255 - 0xdd) * 255 + 0x53 / 3) / 0x53 = 0x68
r8 = (fa8 * 0xcc + 127) / 255 + (fb8 * 0xea + 127) / 255 = 0xfd
and
0xfd / 255.0 = 0.9921568627450981
for a deviation of 0.00637118610187, which we then have to consider
acceptable given the current implementation.
By switching to computing the result with
r = (fa * s + fb * d + 127) / 255
rather than
r = (fa * s + 127) / 255 + (fb * d + 127) / 255
the deviation would be only 0.00244961747442, so at some point it may
be worth doing either this, or switching to floating point for
operators that involve divisions.
Note that the conversion from 4 bits to 8 bits does not cause any
error in this case because both rounding and bit replication produces
an exact result when the number of from-bits divide the number of
to-bits.
== a8r8g8b8 OVER r5g6b5 ==
When OVER compositing the a8r8g8b8 pixel 0x0f00c300 with the x14r6g6b6
pixel 0x03c0, the true floating point value of the resulting green
channel is:
0xc3 / 255.0 + (1.0 - 0x0f / 255.0) * (0x0f / 63.0) = 0.9887955
but when compositing 8 bit values, where the 6-bit green channel is
converted to 8 bit through bit replication, the 8-bit result is:
0xc3 + ((255 - 0x0f) * 0x3c + 127) / 255 = 251
which corresponds to a real value of 0.984314. The difference from the
true value is 0.004482 which is bigger than the acceptable deviation
of 0.004. So, if we were to compute all the CONJOINT/DISJOINT
operators in floating point, or otherwise make them more accurate, the
acceptable deviation could be set at 0.0045.
If we were doing the 6-bit conversion with rounding:
(x / 63.0 * 255.0 + 0.5)
instead of bit replication, the deviation in this particular case
would be only 0.0005, so we may want to consider this at some
point.
|
|
This test program contains a table of individual operator/pixel
combinations. For each pixel combination, images of various sizes are
filled with the pixels and then composited. The result is then
verified against the output of do_composite(). If the result doesn't
match, detailed error information is printed.
The initial 14 pixel combinations currently all fail.
|
|
The check-formats.c test depends on the exact format of the strings
returned from these functions, so add a test here.
a1-trap-test isn't the ideal place, but it seems like overkill to add
a new test just for these trivial checks.
|
|
Given an operator and two formats, this program will composite and
check all pixels where the red and blue channels are 0. That is, if
the two formats are a8r8g8b8 and a4r4g4b4, all source pixels matching
the mask
0xff00ff00
are composited with the given operator against all destination pixels
matching the mask
0xf0f0
and the result is then verified against the do_composite() function
that was moved to utils.c earlier.
This program reveals that a number of operators and format
combinations are not computed to within the precision currently
accepted by pixel_checker_t. For example:
check-formats over a8r8g8b8 r5g6b5 | grep failed | wc -l
30
reveals that there are 30 pixel combinations where OVER produces
insufficiently precise results for the a8r8g8b8 and r5g6b5 formats.
|
|
This function returns the a, r, g, and b masks corresponding to the
pixel checker's format.
|
|
This function takes a pixel in the format corresponding to the pixel
checker, and converts to a color_t.
|
|
So that it can be used in other tests.
|
|
In c2cb303d33ec11390b93cabd90f0f9, return_if_fail()s were added to
prevent the trapezoid rasterizers from being called with non-alpha
formats. However, stress-test actually does call the rasterizers with
non-alpha formats, but because _pixman_log_error() is disabled in
versions with an odd minor number, the errors never materialized.
Fix this by changing the argument to random format to an enum of three
values DONT_CARE, PREFER_ALPHA, or REQUIRE_ALPHA, and then in the
switch that calls the trapezoid rasterizers, pass the appropriate
value for the function in question.
|
|
In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which
are usually configured this way. For other CPUs, this should only add a
constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.
The problems were caused by cachelines becoming permanently evicted from
the cache, because the code that was intended to pull them back in again on
each iteration assumed too long a cache line (for the L1 test) or failed to
read memory beyond the first pixel row (for the L2 test). Also, the reloading
of the source buffer was unnecessary.
These issues were identified by Siarhei in this post:
http://lists.freedesktop.org/archives/pixman/2013-January/002543.html
|
|
Old functions pixman_transform_point() and pixman_transform_point_3d()
now become just wrappers for pixman_transform_point_31_16() and
pixman_transform_point_31_16_3d(). Eventually their uses should be
completely eliminated in the pixman code and replaced with their
extended range counterparts. This is needed in order to be able
to correctly handle any matrices and parameters that may come
to pixman from the code responsible for XRender implementation.
|
|
This test uses __float128 data type when it is available
for implementing a "perfect" reference implementation. The
output from from pixman_transform_point_31_16() and
pixman_transform_point_31_16_affine() is compared with the
reference implementation to make sure that the rounding
errors may only show up in a single least significant bit.
The platforms and compilers, which do not support __float128
data type, can rely on crc32 checksum for the pseudorandom
transform results.
|
|
This adds two extra tests, src_n_8 and src_8_8, which I have been
using to benchmark my ARMv6 changes.
I'd also like to propose that it requires an exact test name as the
executable's argument, as achieved by this strstr to strcmp change.
Without this, it is impossible to only benchmark (for example)
add_8_8, add_n_8 or src_n_8, due to those also being substrings of
many other test names.
|
|
With the operator_name() and format_name() functions there is no
longer any reason for composite.c to have its own table of format and
operator names.
|
|
This function returns the name of the given format code, which is
useful for printing out debug information. The function is written as
a switch without a default value so that the compiler will warn if new
formats are added in the future. The fake formats used in the fast
path tables are also recognized.
The function is used in alpha_map.c, where it replaces an existing
format_name() function, and in blitters-test.c, affine-test.c, and
scaling-test.c.
|
|
This function returns the name of the given operator, which is useful
for printing out debug information. The function is done as a switch
without a default value so that the compiler will warn if new
operators are added in the future.
The function is used in affine-test.c, scaling-test.c, and
blitters-test.c.
|
|
INCLUDES has been deprecated starting with automake 1.13. Convert all
occurrences with the recommended AM_CPPFLAGS replacement.
|
|
|
|
The entry points add_trapezoids(), rasterize_trapezoid() and
composite_trapezoid() are exercised with random trapezoids.
This uncovers crashes with stress-test seeds 0x17ee and 0x313c.
|
|
This is useful in demo programs to display the alpha channel.
|
|
Just use SSE2 intrinsics to do unaligned memory accesses as
a workaround for this gcc bug related to vector extensions.
|
|
They are the same as 'prng_rand_n' and 'prng_rand'
|
|
Wallclock time for running pixman "make check" (compile time not included):
----------------------------+----------------+-----------------------------+
| old PRNG (LCG) | new PRNG (Bob Jenkins) |
Processor type +----------------+------------+----------------+
| gcc 4.5 | gcc 4.5 | gcc 4.7 (simd) |
----------------------------+----------------+------------+----------------+
quad Intel Core i7 @2.8GHz | 0m49.494s | 0m43.722s | 0m37.560s |
dual ARM Cortex-A15 @1.7GHz | 5m8.465s | 4m37.375s | 3m45.819s |
IBM Cell PPU @3.2GHz | 23m0.821s | 20m38.316s | 16m37.513s |
----------------------------+----------------+------------+----------------+
But some tests got a particularly large boost. For example benchmarking and
profiling blitters-test on Core i7:
=== before ===
$ time ./blitters-test
real 0m10.907s
user 0m55.650s
sys 0m0.000s
70.45% blitters-test blitters-test [.] create_random_image
15.81% blitters-test blitters-test [.] compute_crc32_for_image_internal
2.26% blitters-test blitters-test [.] _pixman_implementation_lookup_composite
1.07% blitters-test libc-2.15.so [.] _int_free
0.89% blitters-test libc-2.15.so [.] malloc_consolidate
0.87% blitters-test libc-2.15.so [.] _int_malloc
0.75% blitters-test blitters-test [.] combine_conjoint_general_u
0.61% blitters-test blitters-test [.] combine_disjoint_general_u
0.40% blitters-test blitters-test [.] test_composite
0.31% blitters-test libc-2.15.so [.] _int_memalign
0.31% blitters-test blitters-test [.] _pixman_bits_image_setup_accessors
0.28% blitters-test libc-2.15.so [.] malloc
=== after ===
$ time ./blitters-test
real 0m3.655s
user 0m20.550s
sys 0m0.000s
41.77% blitters-test.n blitters-test.new [.] compute_crc32_for_image_internal
15.77% blitters-test.n blitters-test.new [.] prng_randmemset_r
6.15% blitters-test.n blitters-test.new [.] _pixman_implementation_lookup_composite
3.09% blitters-test.n libc-2.15.so [.] _int_free
2.68% blitters-test.n libc-2.15.so [.] malloc_consolidate
2.39% blitters-test.n libc-2.15.so [.] _int_malloc
2.27% blitters-test.n blitters-test.new [.] create_random_image
2.22% blitters-test.n blitters-test.new [.] combine_conjoint_general_u
1.52% blitters-test.n blitters-test.new [.] combine_disjoint_general_u
1.40% blitters-test.n blitters-test.new [.] test_composite
1.02% blitters-test.n blitters-test.new [.] prng_srand_r
1.00% blitters-test.n blitters-test.new [.] _pixman_image_validate
0.96% blitters-test.n blitters-test.new [.] _pixman_bits_image_setup_accessors
0.90% blitters-test.n libc-2.15.so [.] malloc
|
|
The 'lcg' prefix is going to be misleading if we replace
PRNG algorithm.
|
|
This adds a fast SIMD-optimized variant of a small noncryptographic
PRNG originally developed by Bob Jenkins:
http://www.burtleburtle.net/bob/rand/smallprng.html
The generated pseudorandom data is good enough to pass "Big Crush"
tests from TestU01 (http://en.wikipedia.org/wiki/TestU01).
SIMD code uses http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
which is a GCC specific extension. There is also a slower alternative
code path, which should work with any C compiler.
The performance of filling buffer with random data:
Intel Core i7 @2.8GHz (SSE2) : ~5.9 GB/s
ARM Cortex-A15 @1.7GHz (NEON) : ~2.2 GB/s
IBM Cell PPU @3.2GHz (Altivec) : ~1.7 GB/s
|
|
Also dropped redundant volatile keyword because any object
can be accessed via char* pointer without breaking aliasing
rules. The compilers are able to optimize this function to either
constant 0 or 1.
|
|
After two fixed-point numbers are multiplied, the result is shifted
into place, but up until now pixman has simply discarded the low-order
bits instead of rounding to the closest number.
Fix that by adding 0x8000 (or 0x2 in one place) before shifting and
update the test checksums to match.
|
|
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
|
These modifications fix lots of compiler warnings for systems where
sizeof(unsigned long) != sizeof(void *).
This is especially true for MinGW-w64 (64 bit Windows).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
|
pixman_composite_trapezoids() is supposed to composite across the
entire destination, but it actually only composites across the extent
of the trapezoids. For operators such as ADD or OVER this doesn't
matter since a zero source has no effect on the destination. But for
operators such as SRC or IN, it does matter.
So for such operators where a zero source has an effect, don't clip to
the trap extents.
|
|
This test runs the new floating point combiners on random input with
divide-by-zero exceptions turned on.
With the floating point combiners the only thing we guarantee is that
divide-by-zero exceptions are not generated, so change
enable_fp_exceptions() to only enable those, and rename accordingly.
|
|
Comment out some formats in blitters-test that are going to rely on
floating point in some upcoming patches.
|
|
In preparation for an upcoming change of the wide pipe to use floating
point, comment out some formats in glyph-test that are going to be
using floating point and update the CRC32 value to match.
|
|
Otherwise the test fails on big-endian.
Tested-by: Matt Turner <mattst88@gmail.com>
|
|
This test demonstrates a bug where a certain transformation matrix can
result in an infinite loop. It was extracted as a standalone version
of "affine-test 212944861".
If given the option -nf, the test program will not call fail_after()
and therefore potentially run forever.
|
|
Printing out the translation and scale is a bit misleading because the
actual transformation matrix can be modified in various other ways.
Instead simply print the whole transformation matrix that is actually
used.
|
|
This program exercises a bug in pixman-image.c where "-1" and "1" were
used instead of the correct "- pixman_fixed_1" and "pixman_fixed_1".
With the fast implementation enabled:
% ./rotate-test
rotate test failed! (checksum=35A01AAB, expected 03A24D51)
Without it:
% env PIXMAN_DISABLE=fast ./rotate-test
pixman: Disabled fast implementation
rotate test passed (checksum=03A24D51)
V2: The first version didn't have lcg_srand (testnum) in test_transform().
|
|
In general, the component alpha version of an operator is supposed to
do this:
- multiply source with mask in all channels
- multiply mask with source alpha in all channels
- compute the regular operator in all channels using the
mask value whenever source alpha is called for
The first two steps are usually accomplished with the function
combine_mask_ca(), but for operators where source alpha is not used,
such as SRC, ADD and OUT, the simpler function
combine_mask_value_ca(), which doesn't compute the new mask values,
can be used.
However, the PDF blend modes generally *do* make use of source alpha,
so they can't use combine_mask_value_ca() as they do now. They have to
use combine_mask_ca().
This patch fixes this in combine_multiply_ca() and the CA combiners
generated by PDF_SEPARABLE_BLEND_MODE.
|
|
Update the CRC values based on what the general implementation
reports. This reveals a bug in the fast implementation:
% env PIXMAN_DISABLE="mmx sse2" ./test/scaling-test
pixman: Disabled mmx implementation
pixman: Disabled sse2 implementation
scaling test failed! (checksum=AA722B06, expected 03A23E0C)
vs.
% env PIXMAN_DISABLE="mmx sse2 fast" ./test/scaling-test
pixman: Disabled fast implementation
pixman: Disabled mmx implementation
pixman: Disabled sse2 implementation
scaling test passed (checksum=03A23E0C)
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Handle cross-directory dependencies using PHONY targets and clean up
some redundancies.
|
|
These functions are operating on double precision values, so use pow()
instead of powf().
|
|
The sRGB conversion has to be done every time the limits are being
computed. Without this fix, pixel_checker_get_min/max() will produce
the wrong results when called from somewhere other than
pixel_checker_check().
|
|
glyph-test would sometimes set a solid image as an alpha map, which is
not allowed. When this happened and the debug spew was enabled,
messages like this one would be generated:
*** BUG ***
In pixman_image_set_alpha_map: The expression
!alpha_map || alpha_map->type == BITS was false
Set a breakpoint on '_pixman_log_error' to debug
Fix this by not passing the ALLOW_SOLID flag to create_image() when
the resulting is to be used as an alpha map.
|
|
The rectangles in the clip region set in set_general_properties()
would sometimes overflow, which would lead to messages like these:
*** BUG ***
In pixman_region32_union_rect: Invalid rectangle passed
Set a breakpoint on '_pixman_log_error' to debug
when the micro version number of pixman is even.
Fix this by detecting the overflow and clamping such that the x2/y2
coordinates are less than INT32_MAX.
|
|
Composite checks random combinations of operations that now also have
sRGB sources, masks and destinations, and stress-test validates the
read/write primitives.
|
|
The initialization work is already performed correctly in image_init().
|
|
stress-test current almost never composites anything because the clip
rectangles and transformations are such that either
_pixman_compute_composite_region32() or analyze_extent() will return
FALSE.
Fix this by:
- making log_rand() return smaller numbers so that the clip rectangles
are more likely to be within the destination image
- adding rand_x() and rand_y() functions that pick positions within an
image and using them for positioning alpha maps and source/mask
positions.
- making it less likely that clip regions are used in general
These changes make the test take longer, so speed it up a little by
making most images smaller and by reducing the maximum convolution
filter from 17x19 to 3x4.
With these changes, stress-test reveals a crash in iteration 0xd39
where fast_composite_tiled_repeat() creates an indexed image without a
palette.
|
|
Macro BILINEAR_INTERPOLATION_BITS in pixman-private.h selects
the number of fractional bits used for bilinear interpolation.
scaling-test and affine-test have checksums for 4-bit, 7-bit
and 8-bit configurations.
|
|
Scale factor is selected to be nearly 1x, so that the MPix/s results
can be directly compared with the results of non-scaled compositing
operations.
|