~sandmann/pixman

Age	Commit message	Author	Files	Lines
2010-12-08	Add a stress-test program.	Søren Sandmann Pedersen	3	-0/+871
	This test program tries to use as many rarely-used features as possible, including alpha maps, accessor functions, oddly-sized images, strange transformations, conical gradients, etc. The hope is to provoke crashes or irregular behavior in pixman.
2010-12-08	Make the argument to fence_malloc() an int64_t	Søren Sandmann Pedersen	2	-3/+6
	That way we can detect if someone attempts to allocate a negative size and abort instead of just returning NULL and segfaulting later.
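The reasoning behind the signed argument can be sketched as follows. This is a minimal illustration, not the code in test/utils.c: `length_is_valid` and `checked_malloc` are hypothetical names, and plain `malloc` stands in for the fenced allocator.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero when a signed length can be used as a size. */
static int
length_is_valid (int64_t len)
{
    return len >= 0 && (uint64_t) len <= SIZE_MAX;
}

/* Hypothetical allocator taking a signed length, so that a negative
 * value is caught here instead of being converted to a huge size_t
 * that makes the allocation fail and the caller crash later. */
static void *
checked_malloc (int64_t len)
{
    if (!length_is_valid (len))
    {
        fprintf (stderr, "checked_malloc: invalid length %lld\n",
                 (long long) len);
        abort ();
    }

    return malloc ((size_t) len);
}
```

With a `size_t` parameter, a caller that computes a negative length would silently pass a very large unsigned value; the signed parameter keeps the sign visible at the point where it can be checked.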
2010-12-08	test/utils.c: Initialize palette->rgba to 0.	Søren Sandmann Pedersen	1	-0/+2
	That way it can be used with palettes that are not statically allocated, without causing valgrind issues.
2010-12-08	test: Move palette initialization to utils.[ch]	Søren Sandmann Pedersen	3	-55/+59

2010-12-08	Extend gradient-crash-test	Søren Sandmann Pedersen	2	-42/+87
	Test the gradients with various transformations, and test cases where the gradients are specified with two identical points.
2010-12-08	Add enable_fp_exceptions() function in utils.[ch]	Søren Sandmann Pedersen	3	-0/+37
	This function enables floating point traps if possible.
2010-12-08	test: Make composite test use some existing macros instead of defining its own	Søren Sandmann Pedersen	2	-30/+28
	Also move the ARRAY_LENGTH macro into utils.h so it can be used elsewhere.
2010-12-07	Fix for potential unaligned memory accesses	Siarhei Siamashka	1	-3/+3
	The temporary scanline buffer allocated on stack was declared as uint8_t array. As a result, the compiler was free to select any arbitrary alignment for it (even though there is typically no reason to use really weird alignments here and the stack is normally at least 4 bytes aligned on most platforms). Having improper alignment is non-portable and can impact performance or even make the code misbehave depending on the target platform. Using uint64_t type for this array should ensure that any possible memory accesses done by pixman code are going to be handled correctly (pixman-combine64.c can access this buffer via uint64_t * pointer). Some alignment related problem was reported in: http://lists.freedesktop.org/archives/pixman/2010-November/000747.html
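A minimal sketch of the idea, assuming a C11 compiler for `_Alignof`. The buffer size and the function names are hypothetical and are not taken from the pixman sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCANLINE_BYTES 256 /* hypothetical buffer size */

/* Nonzero when p is suitably aligned for uint64_t access. */
static int
is_aligned_for_u64 (const void *p)
{
    return ((uintptr_t) p % _Alignof (uint64_t)) == 0;
}

/* Fills a stack buffer bytewise, then reads it back as uint64_t.
 * Declaring the buffer as a uint64_t array guarantees the alignment
 * that the wide reads need; a uint8_t array carries no such guarantee. */
static uint64_t
first_word_of_scanline (int *aligned)
{
    uint64_t scanline[SCANLINE_BYTES / sizeof (uint64_t)];
    uint8_t *bytes = (uint8_t *) scanline;

    memset (bytes, 0x01, sizeof (scanline));
    *aligned = is_aligned_for_u64 (scanline);

    return scanline[0];
}
```

Byte-wise access through `uint8_t *` remains valid for a buffer declared this way, so code that treats the scanline as raw bytes does not need to change.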
2010-12-07	ARM: added 'neon_src_rpixbuf_8888' fast path	Siarhei Siamashka	2	-0/+62
	With this optimization added, pixman assisted conversion from non-premultiplied to premultiplied alpha format is now fully NEON optimized (both with and without R/B color components swapping in the process).
2010-12-03	ARM: added 'neon_composite_in_n_8' fast path	Siarhei Siamashka	2	-0/+55

2010-12-03	ARM: added flags parameter to some asm fast path wrapper macros	Siarhei Siamashka	3	-21/+24
	Not all types of operations can be skipped when having transparent solid source or transparent solid mask. Add an extra flags parameter for providing this information to the wrappers.
2010-12-03	ARM: added 'neon_composite_add_8888_n_8888' fast path	Siarhei Siamashka	2	-0/+30

2010-12-03	ARM: added 'neon_composite_add_n_8_8888' fast path	Siarhei Siamashka	2	-0/+33

2010-12-03	ARM: better NEON instructions scheduling for add_8888_8888_8888	Siarhei Siamashka	1	-18/+34
	Provides a minor performance improvement by using pipelining and hiding instruction latencies. Also do not clobber d0-d3 registers (source image pixels) while doing calculations in order to allow the use of the same macro for add_n_8_8888 fast path later.
	Benchmark from ARM Cortex-A8 @500MHz:
	== before ==
	add_8888_8888_8888 = L1: 95.94 L2: 42.27 M: 25.60 (121.09%) HT: 14.54 VT: 13.13 R: 12.77 RT: 4.49 (48Kops/s)
	add_8888_8_8888 = L1: 104.51 L2: 57.81 M: 36.06 (106.62%) HT: 19.24 VT: 16.45 R: 14.71 RT: 4.80 (51Kops/s)
	== after ==
	add_8888_8888_8888 = L1: 106.66 L2: 47.82 M: 27.32 (129.30%) HT: 15.44 VT: 13.96 R: 12.86 RT: 4.48 (48Kops/s)
	add_8888_8_8888 = L1: 107.72 L2: 61.02 M: 38.26 (113.16%) HT: 19.48 VT: 16.72 R: 14.82 RT: 4.80 (51Kops/s)
2010-12-03	ARM: added 'neon_composite_add_8888_8_8888' fast path	Siarhei Siamashka	2	-0/+21

2010-12-03	ARM: added 'neon_composite_over_0565_n_0565' fast path	Siarhei Siamashka	2	-0/+32

2010-12-03	ARM: reuse common NEON code for over_{n_8|8888_n|8888_8}_0565	Siarhei Siamashka	1	-36/+25
	Renamed supplementary macros from 'over_n_8_0565' to 'over_8888_8_0565', because they can actually support all variants of this operation: over_8888_8_0565/over_n_8_0565/over_8888_n_0565. Also 'over_8888_8_0565' now uses more optimized common code instead of its own variant, improving performance a bit. Even though this operation is still memory bandwidth limited, scaled variants of these fast paths may put more stress on CPU later.
	Benchmarked on ARM Cortex-A8 @500MHz:
	== before ==
	over_8888_8_0565 = L1: 67.10 L2: 53.82 M: 44.70 (105.17%) HT: 18.73 VT: 16.91 R: 14.25 RT: 4.80 (52Kops/s)
	== after ==
	over_8888_8_0565 = L1: 77.83 L2: 58.14 M: 44.82 (105.52%) HT: 20.58 VT: 17.44 R: 15.05 RT: 4.88 (52Kops/s)
2010-12-03	ARM: added 'neon_composite_over_8888_n_0565' fast path	Siarhei Siamashka	2	-0/+32

2010-12-03	ARM: better NEON instructions scheduling for over_n_8_0565	Siarhei Siamashka	1	-43/+77
	Code rearranged to get better instruction scheduling for ARM Cortex-A8/A9. Now it is ~30% faster for the pixel data in L1 cache and makes better use of memory bandwidth when running at lower clock frequencies (e.g. 500MHz). Also register d24 (pixels from the mask image) is now not clobbered by supplementary macros, which allows reusing them for the other variants of compositing operations later.
	Benchmark from ARM Cortex-A8 @500MHz:
	== before ==
	over_n_8_0565 = L1: 63.90 L2: 63.15 M: 60.97 ( 73.53%) HT: 28.89 VT: 24.14 R: 21.33 RT: 6.78 ( 67Kops/s)
	== after ==
	over_n_8_0565 = L1: 82.64 L2: 75.19 M: 71.52 ( 84.14%) HT: 30.49 VT: 25.56 R: 22.36 RT: 6.89 ( 68Kops/s)
2010-12-03	ARM: introduced 'fetch_mask_pixblock' macro to simplify code	Siarhei Siamashka	2	-13/+18
	This macro hides the implementation details of pixels fetching for the mask image just like 'fetch_src_pixblock' does for the source image. This provides more possibilities for reusing the same code blocks in different compositing functions. This patch does not introduce any functional changes and the resulting code in the compiled object file is exactly the same.
2010-12-03	ARM: added 'neon_composite_over_n_8_8' fast path	Siarhei Siamashka	2	-0/+71

2010-11-23	C fast path for a1 fill operation	Siarhei Siamashka	2	-3/+91
	Can be used as one of the solutions to fix bug https://bugs.freedesktop.org/show_bug.cgi?id=31604
2010-11-21	Sun's copyrights belong to Oracle now	Alan Coopersmith	2	-2/+2
	Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2010-11-19	Fix argument quoting for AC_INIT.	Cyril Brulebois	1	-1/+1
	One gets rid of this accordingly:
	| autoreconf -vfi
	| autoreconf: Entering directory `.'
	| autoreconf: configure.ac: not using Gettext
	| autoreconf: running: aclocal --force
	| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"
	| autoreconf: configure.ac: tracing
	| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"
	Signed-off-by: Cyril Brulebois <kibi@debian.org>
2010-11-16	Post-release version bump to 0.21.3	Søren Sandmann Pedersen	1	-1/+1

2010-11-16	Pre-release version bump	Søren Sandmann Pedersen	1	-1/+1

2010-11-16	Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images	Søren Sandmann Pedersen	1	-33/+144
	There are versions for all combinations of x8r8g8b8/a8r8g8b8 and pad/reflect/none/normal repeat modes. The bulk of each function is an inline function that takes a format and a repeat mode as parameters.
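The pattern of one inline function parameterized by repeat mode can be sketched as below. This is an illustration under assumptions, not the pixman implementation: `repeat_mode_t`, `mod_positive` and `apply_repeat` are hypothetical names, and only the coordinate handling is shown.

```c
/* Hypothetical stand-ins for the four repeat modes. */
typedef enum
{
    REPEAT_NONE,
    REPEAT_NORMAL,
    REPEAT_PAD,
    REPEAT_REFLECT
} repeat_mode_t;

/* Modulo with a result in [0, n), also for negative a. */
static inline int
mod_positive (int a, int n)
{
    int r = a % n;

    return r < 0 ? r + n : r;
}

/* Maps *c into [0, size) according to mode. Returns 0 only for
 * REPEAT_NONE when the coordinate lies outside the image, in which
 * case the caller would treat the pixel as transparent. */
static inline int
apply_repeat (repeat_mode_t mode, int *c, int size)
{
    switch (mode)
    {
    case REPEAT_NORMAL:
        *c = mod_positive (*c, size);
        return 1;

    case REPEAT_PAD:
        if (*c < 0)
            *c = 0;
        else if (*c >= size)
            *c = size - 1;
        return 1;

    case REPEAT_REFLECT:
        *c = mod_positive (*c, 2 * size);
        if (*c >= size)
            *c = 2 * size - *c - 1;
        return 1;

    case REPEAT_NONE:
    default:
        return *c >= 0 && *c < size;
    }
}
```

When each generated fetcher calls such a function with a constant mode, an optimizing compiler can usually drop the unused branches, which is the motivation for passing the mode as a parameter to an inline function.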
2010-11-12	Improve conical gradients opacity check	Andrea Canciani	1	-0/+1
	Conical gradients are completely opaque if all of their stops are opaque and the repeat mode is not 'none'.
2010-11-12	Fix opacity check	Andrea Canciani	1	-1/+15
	Radial gradients are "conical", thus they can have some non-opaque parts even if all of their stops are completely opaque. To guarantee that a radial gradient is actually opaque, it needs to also have one of the two circles containing the other one. In this case when extrapolating, the whole plane is completely covered (as explained in the comment in pixman-radial-gradient.c).
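The containment condition can be sketched as a small predicate. This is only an illustration of the geometry described above: `circle_t` and both function names are hypothetical, and the sketch ignores repeat modes and any other conditions the real check may apply.

```c
/* Hypothetical circle representation: centre and radius. */
typedef struct
{
    double x, y, radius;
} circle_t;

/* Nonzero when one circle fully contains the other, i.e. the distance
 * between the centres is at most the difference of the radii.
 * The comparison is done on squared values to avoid a square root. */
static int
one_circle_contains_other (circle_t a, circle_t b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    double dr = a.radius - b.radius;

    return dx * dx + dy * dy <= dr * dr;
}

/* Opaque stops alone are not sufficient; containment is also needed. */
static int
radial_is_opaque (circle_t a, circle_t b, int all_stops_opaque)
{
    return all_stops_opaque && one_circle_contains_other (a, b);
}
```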
2010-11-12	Remove unused stop_range field	Andrea Canciani	2	-3/+0

2010-11-10	ARM: optimization for scaled src_0565_0565 with nearest filter	Siarhei Siamashka	2	-0/+77
	The performance improvement is only in the ballpark of 5% when compared against C code built with a reasonably good compiler (gcc 4.5.1). But gcc 4.4 produces approximately 30% slower code here, so assembly optimization makes sense to avoid dependency on the compiler quality and/or optimization options.
	Benchmark from ARM11:
	== before ==
	op=1, src_fmt=10020565, dst_fmt=10020565, speed=34.86 MPix/s
	== after ==
	op=1, src_fmt=10020565, dst_fmt=10020565, speed=36.62 MPix/s
	Benchmark from ARM Cortex-A8:
	== before ==
	op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s
	== after ==
	op=1, src_fmt=10020565, dst_fmt=10020565, speed=94.91 MPix/s
2010-11-10	ARM: NEON optimization for scaled src_0565_8888 with nearest filter	Siarhei Siamashka	2	-0/+20
	Benchmark from ARM Cortex-A8 @720MHz:
	== before ==
	op=1, src_fmt=10020565, dst_fmt=20028888, speed=8.99 MPix/s
	== after ==
	op=1, src_fmt=10020565, dst_fmt=20028888, speed=76.98 MPix/s
	== unscaled ==
	op=1, src_fmt=10020565, dst_fmt=20028888, speed=137.78 MPix/s
2010-11-10	ARM: NEON optimization for scaled src_8888_0565 with nearest filter	Siarhei Siamashka	2	-0/+17
	Benchmark from ARM Cortex-A8 @720MHz:
	== before ==
	op=1, src_fmt=20028888, dst_fmt=10020565, speed=42.51 MPix/s
	== after ==
	op=1, src_fmt=20028888, dst_fmt=10020565, speed=55.61 MPix/s
	== unscaled ==
	op=1, src_fmt=20028888, dst_fmt=10020565, speed=117.99 MPix/s
2010-11-10	ARM: NEON optimization for scaled over_8888_0565 with nearest filter	Siarhei Siamashka	2	-0/+19
	Benchmark from ARM Cortex-A8 @720MHz:
	== before ==
	op=3, src_fmt=20028888, dst_fmt=10020565, speed=10.29 MPix/s
	== after ==
	op=3, src_fmt=20028888, dst_fmt=10020565, speed=36.36 MPix/s
	== unscaled ==
	op=3, src_fmt=20028888, dst_fmt=10020565, speed=79.40 MPix/s
2010-11-10	ARM: NEON optimization for scaled over_8888_8888 with nearest filter	Siarhei Siamashka	2	-0/+20
	Benchmark from ARM Cortex-A8 @720MHz:
	== before ==
	op=3, src_fmt=20028888, dst_fmt=20028888, speed=12.73 MPix/s
	== after ==
	op=3, src_fmt=20028888, dst_fmt=20028888, speed=28.75 MPix/s
	== unscaled ==
	op=3, src_fmt=20028888, dst_fmt=20028888, speed=53.03 MPix/s
2010-11-10	ARM: performance tuning of NEON nearest scaled pixel fetcher	Siarhei Siamashka	1	-6/+27
	Interleaving the use of NEON registers helps to avoid some stalls in NEON pipeline and provides a small performance improvement.
2010-11-10	ARM: macro template in C code to simplify using scaled fast paths	Siarhei Siamashka	1	-0/+40
	This template can be used to instantiate scaled fast path functions by providing main loop code and calling NEON assembly optimized scanline processing functions from it. Another macro can be used to simplify adding entries to fast path tables.
2010-11-10	ARM: nearest scaling support for NEON scanline compositing functions	Siarhei Siamashka	1	-13/+163
	Now it is possible to generate scanline processing functions for the case when the source image is scaled with NEAREST filter. Only 16bpp and 32bpp pixel formats are supported for now. But the others can be also added later when needed. All the existing NEON fast path functions should be quite easy to reuse for implementing fast paths which can work with scaled source images.
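For orientation, a plain C equivalent of a nearest-scaled scanline fetch for a 32bpp source could look like the sketch below. It assumes 16.16 fixed-point source coordinates; the function name and signature are hypothetical and do not correspond to the NEON assembly interface.

```c
#include <stdint.h>

/* Nearest-neighbour scanline fetch with 16.16 fixed-point stepping.
 *
 * vx     - source x coordinate of the first destination pixel
 * unit_x - source step per destination pixel
 *
 * Both are assumed non-negative, and the caller must guarantee that
 * every (vx >> 16) visited is a valid index into src. */
static void
scaled_nearest_scanline_8888 (uint32_t       *dst,
                              const uint32_t *src,
                              int             width,
                              int32_t         vx,
                              int32_t         unit_x)
{
    while (width-- > 0)
    {
        *dst++ = src[vx >> 16]; /* integer part selects the source pixel */
        vx += unit_x;
    }
}
```

A step of 0x8000 (0.5 in 16.16) reads each source pixel twice, doubling the scanline width; a step of 0x20000 would skip every other source pixel.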
2010-11-10	ARM: NEON: source image pixel fetcher can be overridden now	Siarhei Siamashka	2	-33/+52
	Added a special macro 'pixld_src' which is now responsible for fetching pixels from the source image. Right now it just passes all its arguments directly to 'pixld' macro, but it can be used in the future to provide a special pixel fetcher for implementing nearest scaling. The 'pixld_src' has a lot of arguments which define its behavior. But for each particular fast path implementation, we already know NEON registers allocation and how many pixels are processed in a single block. That's why a higher level macro 'fetch_src_pixblock' is also introduced (it's easier to use because it has no arguments) and used everywhere in 'pixman-arm-neon-asm.S' instead of VLD instructions. This patch does not introduce any functional changes and the resulting code in the compiled object file is exactly the same.
2010-11-10	ARM: fix 'vld1.8'->'vld1.32' typo in add_8888_8888 NEON fast path	Siarhei Siamashka	1	-3/+3
	This was mostly harmless and had no effect on little endian systems. But wrong vector element size is at least inconsistent and also can theoretically cause problems on big endian ARM systems.
2010-11-05	Do CPU features detection from 'constructor' function when compiled with gcc	Siarhei Siamashka	2	-3/+39
	The 'constructor' attribute, supported since gcc 2.7, allows having a constructor function for library initialization. This eliminates an extra branch for each composite operation and also helps to avoid complaints from race condition detection tools like helgrind. The other compilers may or may not support this attribute properly. Ideally, the compilers should fail to compile the code with unknown attribute, so the configure check should do the right job. But in reality the problems are surely possible. Fortunately such problems should be quite easy to find because NULL pointer dereference should happen almost immediately if the constructor fails to run.
	clang 2.7: supports __attribute__((constructor)) properly and pretends to be gcc
	tcc 0.9.25: ignores __attribute__((constructor)), but does not pretend to be gcc
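A minimal sketch of the technique, assuming a gcc-compatible compiler. All names here (`composite_fn_t`, `chosen_impl`, `detect_cpu_features`, `do_composite`) are hypothetical and the "detection" is a placeholder; the fallback branch shows the lazy check that the constructor makes unnecessary.

```c
#include <stddef.h>

/* Hypothetical implementation type and a placeholder implementation. */
typedef int (*composite_fn_t) (int);

static int
generic_composite (int x)
{
    return x + 1;
}

static composite_fn_t chosen_impl = NULL;

/* Placeholder for real CPU feature detection. */
static void
detect_cpu_features (void)
{
    chosen_impl = generic_composite;
}

#if defined(__GNUC__)
/* Runs once before main() (or at library load time). */
__attribute__((constructor))
static void
init_at_load (void)
{
    detect_cpu_features ();
}
#endif

static int
do_composite (int x)
{
#if !defined(__GNUC__)
    /* Without constructor support, every call pays for this branch. */
    if (!chosen_impl)
        detect_cpu_features ();
#endif

    return chosen_impl (x);
}
```

If a compiler defines `__GNUC__` but silently ignores the attribute, `chosen_impl` stays NULL and the first call crashes, which is the easy-to-find failure mode the commit message describes.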
2010-11-04	Delete the source_image_t struct.	Søren Sandmann Pedersen	4	-44/+33
	It serves no purpose anymore now that the source_class_t field is gone.
2010-11-04	[mmx] Mark some of the output variables as early-clobber.	Søren Sandmann Pedersen	1	-2/+2
	GCC assumes that input variables in inline assembly are fully consumed before any output variable is written. This means it may allocate the variables in the same register unless the output variables are marked as early-clobber.
	From Jeremy Huddleston:
	I noticed a problem building pixman with clang and reported it to the clang developers. They responded back with a comment about the inline asm in pixman-mmx.c and suggested a fix:
	"""
	Incidentally, Jeremy, in the asm that reads
	__asm__ (
	"movq %7, %0\n"
	"movq %7, %1\n"
	"movq %7, %2\n"
	"movq %7, %3\n"
	"movq %7, %4\n"
	"movq %7, %5\n"
	"movq %7, %6\n"
	: "=y" (v1), "=y" (v2), "=y" (v3), "=y" (v4), "=y" (v5), "=y" (v6), "=y" (v7)
	: "y" (vfill));
	all the output operands except the last one should be marked as earlyclobber ("=&y"). This is working by accident with gcc.
	"""
	Cc: jeremyhu@apple.com
	Reviewed-by: Matt Turner <mattst88@gmail.com>
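The same issue can be reproduced in miniature with general-purpose registers. The sketch below is not the pixman-mmx.c code: `copy_to_two` is a hypothetical function, the asm is limited to x86 with a gcc-compatible compiler, and a plain C fallback is used elsewhere.

```c
/* Copies `in` into two outputs using two separate instructions. */
static void
copy_to_two (unsigned in, unsigned *a, unsigned *b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned out0, out1;

    /* out0 is written by the first instruction while the input (%2) is
     * still needed by the second one, so it is marked early-clobber
     * ("=&r") to keep the compiler from giving it the input's register.
     * out1 is written by the last instruction, after which the input is
     * no longer read, so a plain "=r" is sufficient. */
    __asm__ ("mov %2, %0\n\t"
             "mov %2, %1\n\t"
             : "=&r" (out0), "=r" (out1)
             : "r" (in));

    *a = out0;
    *b = out1;
#else
    /* Portable fallback for other compilers and architectures. */
    *a = in;
    *b = in;
#endif
}
```

Without the `&` modifier the compiler would be allowed to place `out0` and `in` in the same register, and the code would then only work by accident of register allocation.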
2010-11-04	Remove workaround for a bug in the 1.6 X server.	Søren Sandmann Pedersen	6	-303/+28
	There used to be a bug in the X server where it would rely on out-of-bounds accesses when it was asked to composite with a window as the source. It would create a pixman image pointing to some bogus position in memory, but then set a clip region to the position where the actual bits were. Due to a bug in old versions of pixman, where it would not clip against the image bounds when a clip region was set, this would actually work. So when the pixman bug was fixed, a workaround was added to allow certain out-of-bound accesses. However, the 1.6 X server is so old now that we can remove this workaround. This does mean that if you update pixman to 0.22 or later, you will need to use a 1.7 X server or later.
2010-11-02	Fixed broken configure check for __thread support	Siarhei Siamashka	1	-2/+1
	Somehow the patch from [1] was not applied correctly; this commit fixes that.
	[1] http://lists.cairographics.org/archives/cairo/2010-September/020826.html
2010-11-01	COPYING: Stop saying that a modification is currently under discussion.	Søren Sandmann Pedersen	1	-39/+40
	Also put the copyright text into a C comment for easier cut and paste.
2010-10-27	Version bump 0.21.1.	Søren Sandmann Pedersen	1	-1/+1
	The previous bump to 0.20.1 was a mistake; it belongs on the 0.20 branch.
2010-10-27	Post-release version bump to 0.20.1	Søren Sandmann Pedersen	1	-1/+1

2010-10-27	Pre-release version bump to 0.20.0	Søren Sandmann Pedersen	1	-2/+2

2010-10-27	Added check to find pthread on Haiku.	Scott McCreary	1	-1/+2