~mattst88/pixman - Unnamed repository; edit this file to name it for gitweb.

Age	Commit message	Author	Files	Lines
2013-01-30	mmx: faster bilinear interpolation (get rid of XOR instruction) [no-xor-bilinear]	Matt Turner	1	-7/+9

2013-01-28	build: Support building Loongson code for 2e, 2f, 3a	Matt Turner	7	-26/+244
	Since binutils refuses to link objects that are compiled with different -march flags, pixman-mmx.c is compiled with varying -march flags into separate shared objects, which are dlopened at runtime. AC_LINK_IFELSE is used to confirm that linking works, since for example an object built with -march=loongson2e cannot be linked with libc.so built with -march=loongson2f. I expect binary distributions' libcs to be built with generic flags, and in such case all three loongson march values can be built. If libc is built with a particular -march=loongson* flag, the linking test will fail and only the -march value matching the C library will be built. If only one -march value is built, avoid dlopen and simply build the code into libpixman-1 like before. Unfortunately, two internal pixman symbols are needed by pixman-mmx.c: _pixman_image_get_solid _pixman_implementation_create They are annotated with PIXMAN_EXPORT, but only in the dlopen case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
2013-01-27	demo/scale: Add a spin button to set the number of subsample bits	Søren Sandmann Pedersen	2	-1/+36
	For large upscalings the level of subsampling for the filter has a quite visible effect, so make it settable in the UI so that people can experiment with various values.
2013-01-27	Use pixman_transform_point_31_16() from pixman_transform_point()	Siarhei Siamashka	2	-55/+27
	Old functions pixman_transform_point() and pixman_transform_point_3d() now become just wrappers for pixman_transform_point_31_16() and pixman_transform_point_31_16_3d(). Eventually their uses should be completely eliminated in the pixman code and replaced with their extended range counterparts. This is needed in order to be able to correctly handle any matrices and parameters that may come to pixman from the code responsible for XRender implementation.
2013-01-27	test: Added matrix-test for testing projective transform accuracy	Siarhei Siamashka	2	-0/+187
	This test uses the __float128 data type, when it is available, to implement a "perfect" reference implementation. The output from pixman_transform_point_31_16() and pixman_transform_point_31_16_affine() is compared with the reference implementation to make sure that rounding errors can only show up in the single least significant bit. Platforms and compilers that do not support the __float128 data type can rely on a crc32 checksum of the pseudorandom transform results.
2013-01-27	configure.ac: Added detection for __float128 support	Siarhei Siamashka	1	-0/+16
	GCC supports a 128-bit floating point data type on some platforms (including but not limited to x86 and x86-64). This may be useful for tests that need perfectly accurate reference implementations of certain algorithms.
2013-01-27	Add higher precision "pixman_transform_point_*" functions	Siarhei Siamashka	2	-0/+353
	The following new functions are added: pixman_transform_point_31_16_3d() - Calculates the product of a matrix and a vector. pixman_transform_point_31_16() - Calculates the product of a matrix and a vector, then converts the resulting homogeneous vector [x, y, z] to the cartesian variant [x', y', 1], where x' = x / z and y' = y / z. pixman_transform_point_31_16_affine() - A faster sibling of the other two functions, which assumes an affine transformation, where the bottom row of the matrix is [0, 0, 1] and the last element of the input vector is set to 1. These functions transform a point with 31.16 fixed point coordinates from the destination space to a point with 48.16 fixed point coordinates in the source space. The results are accurate and rounding errors may only show up in the least significant bit. No overflows are possible for affine transformations as long as the input data is provided in 31.16 format. In the case of projective transformations, some output values may not be representable in the 48.16 fixed point format. In this case the results are clamped to the maximum or minimum 48.16 values (so that the caller can at least handle NONE and PAD repeats correctly).
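The relationship between the plain matrix-vector product and the homogeneous-to-cartesian step can be sketched as follows. This is a floating-point illustration only: the types and function names (sketch_matrix_t, sketch_vector_t, sketch_transform_point_3d, sketch_transform_point) are hypothetical, and the sketch does not reproduce the 31.16/48.16 fixed point arithmetic, rounding, or clamping of the real functions.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; these are not pixman's structures. */
typedef struct { double m[3][3]; } sketch_matrix_t;
typedef struct { double v[3]; } sketch_vector_t;

/* result = t * in: a plain matrix-vector product with no normalization. */
void
sketch_transform_point_3d (const sketch_matrix_t *t,
                           const sketch_vector_t *in,
                           sketch_vector_t       *result)
{
    sketch_vector_t tmp;
    int i, j;

    for (i = 0; i < 3; i++)
    {
        tmp.v[i] = 0.0;
        for (j = 0; j < 3; j++)
            tmp.v[i] += t->m[i][j] * in->v[j];
    }
    *result = tmp; /* copy last, so that in and result may alias */
}

/* Product followed by the conversion [x, y, z] -> [x / z, y / z, 1].
 * Returns false when z is zero and no cartesian point exists. */
bool
sketch_transform_point (const sketch_matrix_t *t,
                        const sketch_vector_t *in,
                        sketch_vector_t       *result)
{
    sketch_vector_t tmp;

    sketch_transform_point_3d (t, in, &tmp);
    if (tmp.v[2] == 0.0)
        return false;

    result->v[0] = tmp.v[0] / tmp.v[2];
    result->v[1] = tmp.v[1] / tmp.v[2];
    result->v[2] = 1.0;
    return true;
}
```

For an affine matrix the bottom row is [0, 0, 1], so z stays 1 and the division can be skipped, which is the shortcut the _affine variant is described as taking.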
2013-01-27	Faster fetch for the C variant of r5g6b5 src/dest iterator	Siarhei Siamashka	1	-1/+30
	Processing two pixels at once is used to reduce the number of arithmetic operations. The speedup relative to the generic fetch_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 20.32 MPix/s -> 26.47 MPix/s ARM11 700MHz : 34.95 MPix/s -> 38.22 MPix/s ARM Cortex-A8 1000MHz : 87.44 MPix/s -> 100.92 MPix/s ARM Cortex-A9 1700MHz : 150.95 MPix/s -> 158.13 MPix/s ARM Cortex-A15 1700MHz : 148.91 MPix/s -> 155.42 MPix/s IBM Cell PPU 3200MHz : 75.29 MPix/s -> 98.33 MPix/s Intel Core i7 2800MHz : 257.02 MPix/s -> 376.93 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).
2013-01-27	Faster write-back for the C variant of r5g6b5 dest iterator	Siarhei Siamashka	1	-3/+35
	Unrolling loops improves performance, so just use it here. Also, GCC can't properly optimize this code for RISC processors and allocate the 0x1F001F constant in a register. Because this constant is too large to be represented as an immediate operand in instructions, GCC inserts some redundant arithmetic. This problem can be worked around by explicitly using a variable for the 0x1F001F constant and initializing it by a read from another volatile variable. In this case GCC is forced to allocate a register for it, because it is no longer seen as a constant. The speedup relative to the generic store_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 33.22 MPix/s -> 43.42 MPix/s ARM11 700MHz : 50.16 MPix/s -> 78.23 MPix/s ARM Cortex-A8 1000MHz : 117.75 MPix/s -> 196.34 MPix/s ARM Cortex-A9 1700MHz : 177.04 MPix/s -> 320.32 MPix/s ARM Cortex-A15 1700MHz : 231.44 MPix/s -> 261.64 MPix/s IBM Cell PPU 3200MHz : 130.25 MPix/s -> 145.61 MPix/s Intel Core i7 2800MHz : 502.21 MPix/s -> 721.73 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).
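A minimal sketch of the volatile-read workaround described above, assuming a simple non-unrolled loop. The names sketch_volatile_x1F001F and sketch_store_scanline_r5g6b5 are hypothetical, and the actual patch also unrolls the loop, which is omitted here.

```c
#include <stdint.h>

/* Reading the mask from a volatile object hides its value from the
 * compiler, so it must be loaded into a register once instead of being
 * rebuilt from immediates inside the loop. */
static volatile uint32_t sketch_volatile_x1F001F = 0x1F001F;

/* Convert n a8r8g8b8 pixels to r5g6b5 and store them to dst. */
void
sketch_store_scanline_r5g6b5 (uint16_t *dst, const uint32_t *src, int n)
{
    /* Single read from the volatile; x1F001F is now an ordinary variable. */
    const uint32_t x1F001F = sketch_volatile_x1F001F;
    int i;

    for (i = 0; i < n; i++)
    {
        uint32_t s = src[i];
        uint32_t a = (s >> 3) & x1F001F;  /* top 5 bits of red and blue */

        a |= a >> 5;                      /* move red down to bits 11-15 */
        a |= (s & 0xFC00) >> 5;           /* top 6 bits of green to bits 5-10 */
        dst[i] = (uint16_t) a;            /* the cast drops the high garbage */
    }
}
```

Whether this helps depends on the compiler and target; on processors that can encode the constant as an immediate, the extra load may buy nothing.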
2013-01-27	Added C variants of r5g6b5 fetch/write-back iterators	Siarhei Siamashka	1	-34/+127
	Adding specialized iterators for r5g6b5 color format allows us to work on fine tuning performance of r5g6b5 fetch/write-back operations in the pixman general "fetch -> combine -> store" pipeline. These iterators also make "src_x888_0565" fast path redundant, so it can be removed.
2013-01-27	Eliminate duplicate copies of channel flags for pixman_image_composite32()	Chris Wilson	1	-21/+18
	Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27	Always return a valid function from lookup_combiner()	Chris Wilson	2	-4/+13
	We should always have at least a C combiner available, so we never expect the search to fail. If it does, emit an error and return a dummy function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27	Always return a valid function from lookup_composite()	Chris Wilson	5	-159/+153
	We never expect to fail to find the appropriate function as the general_composite_rect should always match. So if somehow we fall through the search, emit a _pixman_log_error() and return a dummy function. Note that we remove some conditionals and a level of indentation, hence a large amount of code movement. This also reveals that in a few places we are duplicating stack variables that can be eliminated later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27	sse2: Add fast paths for bilinear source with a solid mask	Chris Wilson	1	-0/+120
	Based on the existing sse2_8888_n_8888 nearest scaling routines. fishbowl on an i5-2500: 60.9s -> 56.9s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27	sse2: Add a fast path for add_n_8_8888	Chris Wilson	1	-0/+99
	This path is being exercised by compositing of trapezoids for clipmasks, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8_8888: reference memcpy speed = 14846.7MB/s (3711.7MP/s for 32bpp fills) before: L1: 681.10 L2: 735.14 M:701.44 ( 28.35%) HT:283.32 VT:213.23 R:208.93 RT: 77.89 ( 793Kops/s) after: L1: 992.91 L2:1017.33 M:982.58 ( 39.88%) HT:458.93 VT:332.32 R:326.13 RT:136.66 (1287Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27	sse2: Add a fast path for add_n_8888	Chris Wilson	1	-0/+65
	This path is being exercised by inplace compositing of trapezoids, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8888: reference memcpy speed = 14918.3MB/s (3729.6MP/s for 32bpp fills) before: L1:1752.44 L2:2259.48 M:2215.73 ( 58.80%) HT:589.49 VT:404.04 R:424.69 RT:134.68 (1182Kops/s) after: L1:3931.21 L2:6132.78 M:3440.17 ( 92.24%) HT:1337.70 VT:1357.64 R:1270.27 RT:359.78 (2161Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-25	Add a version of bilinear_interpolation for precision <=4	Jeff Muizelaar	1	-0/+37
	Having 4 or fewer bits means we can do two components at a time in a single 32 bit register. Here are the results for firefox-fishtank on a Pandaboard with 4.6.3 and PIXMAN_DISABLE="arm-neon" Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 7.841 7.910 0.70% 6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 6.951 6.995 1.11% 6/6
2013-01-25	Tweaks to lowlevel-blt-bench	Ben Avison	1	-1/+3
	This adds two extra tests, src_n_8 and src_8_8, which I have been using to benchmark my ARMv6 changes. I'd also like to propose that it requires an exact test name as the executable's argument, as achieved by this strstr to strcmp change. Without this, it is impossible to only benchmark (for example) add_8_8, add_n_8 or src_n_8, due to those also being substrings of many other test names.
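The difference between the substring match and the exact match can be sketched as follows. The helper names sketch_matches_substring and sketch_matches_exact are hypothetical and are not taken from lowlevel-blt-bench.

```c
#include <stdbool.h>
#include <string.h>

/* Old behaviour: a test runs if the argument appears anywhere in its name. */
bool
sketch_matches_substring (const char *test_name, const char *pattern)
{
    return strstr (test_name, pattern) != NULL;
}

/* Proposed behaviour: a test runs only if the argument equals its name. */
bool
sketch_matches_exact (const char *test_name, const char *pattern)
{
    return strcmp (test_name, pattern) == 0;
}
```

With the substring match, the argument "add_n_8" also selects "add_n_8888" and "add_n_8_8888"; with the exact match it selects only "add_n_8".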
2013-01-23	test: Use operator_name() and format_name() in composite.c	Søren Sandmann Pedersen	1	-120/+101
	With the operator_name() and format_name() functions there is no longer any reason for composite.c to have its own table of format and operator names.
2013-01-23	utils.[ch]: Add new format_name() function	Søren Sandmann Pedersen	6	-23/+103
	This function returns the name of the given format code, which is useful for printing out debug information. The function is written as a switch without a default value so that the compiler will warn if new formats are added in the future. The fake formats used in the fast path tables are also recognized. The function is used in alpha_map.c, where it replaces an existing format_name() function, and in blitters-test.c, affine-test.c, and scaling-test.c.
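The "switch without a default" pattern can be sketched as follows, using a hypothetical three-value enum rather than pixman's real format codes. With GCC, -Wswitch (enabled by -Wall) warns when a switch over an enum has no default label and misses an enumerator, so adding a new value without a matching case is flagged at compile time.

```c
/* Hypothetical enum for illustration; not pixman_format_code_t. */
typedef enum
{
    SKETCH_FORMAT_A8R8G8B8,
    SKETCH_FORMAT_R5G6B5,
    SKETCH_FORMAT_A8
} sketch_format_t;

const char *
sketch_format_name (sketch_format_t format)
{
    /* Deliberately no default label: a default would silence the
     * compiler warning for enumerators that are not handled. */
    switch (format)
    {
    case SKETCH_FORMAT_A8R8G8B8: return "a8r8g8b8";
    case SKETCH_FORMAT_R5G6B5:   return "r5g6b5";
    case SKETCH_FORMAT_A8:       return "a8";
    }

    /* Only reached for values outside the enum. */
    return "<unknown format>";
}
```

The same structure would apply to a function that names operators.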
2013-01-23	test/utils.[ch]: Add new function operator_name()	Søren Sandmann Pedersen	5	-6/+78
	This function returns the name of the given operator, which is useful for printing out debug information. The function is done as a switch without a default value so that the compiler will warn if new operators are added in the future. The function is used in affine-test.c, scaling-test.c, and blitters-test.c.
2013-01-23	README: Add guidelines on how to contribute patches	Søren Sandmann Pedersen	1	-8/+102
	Ben Avison pointed out here: http://lists.freedesktop.org/archives/pixman/2013-January/002485.html that there isn't really any documentation about how to submit patches to pixman. This patch adds some information to the README file. v2: Incorporate some comments from Ben Avison v3: Change gitweb URL to cgit
2013-01-22	Convert INCLUDES to AM_CPPFLAGS	Matt Turner	3	-3/+3
	INCLUDES has been deprecated starting with automake 1.13. Replace all occurrences with the recommended AM_CPPFLAGS.
2013-01-22	Add new demos and tests to .gitignore	Matt Turner	1	-0/+7

2013-01-22	MIPS: DSPr2: Added more fast-paths:	Nemanja Lukic	2	-0/+241
	- over_reverse_n_8888 - in_n_8_8 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_reverse_n_8888 = L1: 19.42 L2: 19.07 M: 15.38 ( 40.80%) HT: 13.35 VT: 13.10 R: 12.92 RT: 8.27 ( 49Kops/s) in_n_8_8 = L1: 21.20 L2: 22.86 M: 21.42 ( 14.21%) HT: 15.97 VT: 15.69 R: 15.47 RT: 8.00 ( 48Kops/s) Optimized: over_reverse_n_8888 = L1: 60.09 L2: 47.87 M: 28.65 ( 76.02%) HT: 23.58 VT: 22.51 R: 21.99 RT: 12.28 ( 60Kops/s) in_n_8_8 = L1: 89.38 L2: 86.07 M: 65.48 ( 43.44%) HT: 44.64 VT: 41.50 R: 40.77 RT: 16.94 ( 66Kops/s)
2013-01-22	MIPS: DSPr2: Added more fast-paths for REVERSE operation:	Nemanja Lukic	2	-0/+118
	- out_reverse_8_0565 - out_reverse_8_8888 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): out_reverse_8_0565 = L1: 14.29 L2: 13.58 M: 12.14 ( 24.16%) HT: 9.23 VT: 9.12 R: 8.84 RT: 4.75 ( 36Kops/s) out_reverse_8_8888 = L1: 27.46 L2: 23.24 M: 17.41 ( 57.73%) HT: 12.61 VT: 12.47 R: 11.79 RT: 5.86 ( 41Kops/s) Optimized: out_reverse_8_0565 = L1: 28.24 L2: 25.64 M: 20.63 ( 41.05%) HT: 16.69 VT: 16.14 R: 15.50 RT: 8.69 ( 52Kops/s) out_reverse_8_8888 = L1: 52.78 L2: 41.44 M: 23.50 ( 77.94%) HT: 18.79 VT: 18.16 R: 16.90 RT: 9.11 ( 53Kops/s)
2013-01-06	pixman-filter.c: Cope with NULL returns from malloc()	Søren Sandmann Pedersen	1	-1/+9
	v2: Don't return a pointer to uninitialized memory when the allocation of horz and vert fails, but allocation of params doesn't.
2013-01-06	Handle solid images in the noop iterator	Søren Sandmann Pedersen	4	-38/+18
	The noop src iterator already has code to handle solid images, but that code never actually runs currently because it is not possible for an image to have both a format code of PIXMAN_solid and a flag of FAST_PATH_BITS_IMAGE. If these two were to be set at the same time, the fast_composite_tiled_repeat() fast path would trigger for solid images (because it triggers for PIXMAN_any formats, which includes PIXMAN_solid), but for solid images we can usually do better than that fast path. So this patch removes _pixman_solid_fill_iter_init() and instead handles such images (along with repeating 1x1 bits images without an alpha map) in pixman-noop.c. When a 1x1R image is involved in the general composite path, before this patch, it would hit this code in repeat() in pixman-inlines.h: while (c >= size) c -= size; while (c < 0) c += size; and those loops could run for a huge number of iterations (proportional to the composite width). For such cases, the performance improvement is really big: ./test/lowlevel-blt-bench -n add_n_8888: Before: add_n_8888 = L1: 3.86 L2: 3.78 M: 1.40 ( 0.06%) HT: 1.43 VT: 1.41 R: 1.41 RT: 1.38 ( 19Kops/s) After: add_n_8888 = L1:1236.86 L2:2468.49 M:1097.88 ( 49.04%) HT:476.49 VT:429.05 R:417.04 RT:155.12 ( 817Kops/s)
2013-01-04	Fix build with automake-1.13	Marko Lindqvist	1	-1/+1
	Automake-1.13 has removed long obsolete AM_CONFIG_HEADER macro ( http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html ) and autoreconf errors out upon seeing it. Attached patch replaces obsolete AM_CONFIG_HEADER with now proper AC_CONFIG_HEADERS.
2013-01-04	Use more appropriate types and remove a magic constant	Siarhei Siamashka	3	-3/+3

2013-01-04	Define SIZE_MAX if it is not provided by the standard C headers	Siarhei Siamashka	1	-0/+4
	C++ compilers do not define SIZE_MAX. It is also not available if the code is compiled by some C compilers: http://lists.freedesktop.org/archives/pixman/2012-August/002196.html
2012-12-20	Rename 'xor' variable to 'filler' (because 'xor' is a C++ keyword)	Siarhei Siamashka	6	-40/+40

2012-12-19	float-combiner.c: Change tests for x == 0.0 to -FLT_MIN < x < FLT_MIN	Søren Sandmann Pedersen	1	-13/+15
	pixman-float-combiner.c currently uses checks like these: if (x == 0.0f) ... else ... / x; to prevent division by 0. In theory this is correct: a division-by-zero exception is only supposed to happen when the floating point divisor is exactly equal to a positive or negative zero. However, in practice, the combination of x87 and gcc optimizations causes issues. The x87 registers are 80 bits wide, which means the initial test: if (x == 0.0f) may be false when x is an 80 bit floating point number, but when x is rounded to a 32 bit single precision number, it becomes equal to 0.0. In principle, gcc should compensate for this quirk of x87, and there are some options such as -ffloat-store, -fexcess-precision=standard, and -std=c99 that will make it do so, but these all have a performance cost. It is also possible to set the FPU to a mode that makes it do all computation with single or double precision, but that would require pixman to save the existing mode before doing anything with floating point and restore it afterwards. Instead, this patch side-steps the issue by replacing exact checks for equality with zero with a new macro that checks whether the value is between -FLT_MIN and FLT_MIN. There is extensive reading material about this issue linked off the infamous gcc bug 323: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
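The range check described above can be sketched as follows. The macro name SKETCH_IS_ZERO and the helper sketch_safe_div are hypothetical; the commit message only says that a new macro tests whether the value lies between -FLT_MIN and FLT_MIN.

```c
#include <float.h>

/* True when f is closer to zero than the smallest normalized float.
 * Note that the argument is evaluated twice. */
#define SKETCH_IS_ZERO(f) (-FLT_MIN < (f) && (f) < FLT_MIN)

/* Divide num by den, returning fallback when den is (nearly) zero. */
float
sketch_safe_div (float num, float den, float fallback)
{
    if (SKETCH_IS_ZERO (den))
        return fallback;

    return num / den;
}
```

A value that is nonzero in an 80 bit register but would round to zero as a single precision float is caught by the range check, whereas an exact comparison against 0.0f would let it through to the division.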
2012-12-18	ARM: make use of UQADD8 instruction even in generic C code paths	Siarhei Siamashka	1	-0/+47
	ARMv6 has UQADD8 instruction, which implements unsigned saturated addition for 8-bit values packed in 32-bit registers. It is very useful for UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would otherwise need a lot of arithmetic operations to simulate this operation). Since most of the major ARM linux distros are built for ARMv7, we are much less dependent on runtime CPU detection and can get practical benefits from conditional compilation here for a lot of users. The results of cairo-perf-trace benchmark on ARM Cortex-A15 with pixman compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon": Speedups ======== image firefox-talos-gfx (29938.22 0.12%) -> (27814.76 0.51%) : 1.08x speedup image firefox-asteroids (23241.11 0.07%) -> (21795.19 0.07%) : 1.07x speedup image firefox-canvas-alpha (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup image poppler (9464.46 1.61%) -> (8991.53 0.14%) : 1.05x speedup
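For reference, the per-byte behaviour of an unsigned saturated addition on four 8-bit values packed in a 32-bit word can be written in portable C as below. The function name sketch_uqadd8 is hypothetical, and the loop is a plain reference for the semantics rather than the macros pixman uses when the instruction is unavailable.

```c
#include <stdint.h>

/* Add the four 8-bit lanes of x and y independently, clamping each
 * lane's sum to 0xFF instead of letting it wrap or carry over. */
uint32_t
sketch_uqadd8 (uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sum = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF);

        if (sum > 0xFF)
            sum = 0xFF;

        result |= sum << shift;
    }

    return result;
}
```

On ARMv6 and later, the whole loop corresponds to a single instruction, which is where the reported speedups come from.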
2012-12-18	Faster conversion from a8r8g8b8 to r5g6b5 in C code	Siarhei Siamashka	1	-3/+7
	This change reduces 3 shifts, 3 ANDs and 2 ORs (total 8 arithmetic operations) to 3 shifts, 2 ANDs and 2 ORs (total 7 arithmetic operations). We get garbage in the high 16 bits of the result, which might need to be cleared when casting to uint16_t (it would bring us back to total 8 arithmetic operations). However, if the result of the a8r8g8b8->r5g6b5 conversion is immediately stored to memory, no extra instructions for clearing these garbage bits are needed. This allows the a8r8g8b8->r5g6b5 conversion code to be compiled into 4 instructions for ARM instead of 5 (assuming a good optimizing compiler), which has no pipeline stalls on ARM11 as an additional bonus. The change in benchmark results for 'lowlevel-blt-bench src_8888_0565' with PIXMAN_DISABLE="arm-simd arm-neon mips-dspr2 mmx sse2" and pixman compiled by gcc-4.7.2: MIPS 74K 480MHz : 40.44 MPix/s -> 40.13 MPix/s ARM11 700MHz : 50.28 MPix/s -> 62.85 MPix/s ARM Cortex-A8 1000MHz : 124.38 MPix/s -> 141.85 MPix/s ARM Cortex-A15 1700MHz : 281.07 MPix/s -> 303.29 MPix/s Intel Core i7 2800MHz : 515.92 MPix/s -> 531.16 MPix/s The same trick was used in xomap (X server for Nokia N800/N810): http://repository.maemo.org/pool/diablo/free/x/xorg-server/xorg-server_1.3.99.0~git20070321-0osso20083801.tar.gz
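The operation counts above can be checked against a sketch like the following. Both function names are hypothetical, and the second function is one way to reach 3 shifts, 2 ANDs and 2 ORs that matches the description; it is not claimed to be the exact code in the patch.

```c
#include <stdint.h>

/* Straightforward version: 3 shifts, 3 ANDs, 2 ORs (8 operations). */
uint16_t
sketch_convert_8888_to_0565_plain (uint32_t s)
{
    return (uint16_t) (((s >> 8) & 0xF800) |   /* red   -> bits 11-15 */
                       ((s >> 5) & 0x07E0) |   /* green -> bits  5-10 */
                       ((s >> 3) & 0x001F));   /* blue  -> bits  0-4  */
}

/* Reduced version: 3 shifts, 2 ANDs, 2 ORs (7 operations). Red and blue
 * are masked together, which leaves a copy of red in bits 16-20 of the
 * 32-bit result (the "garbage in the high 16 bits"). */
uint32_t
sketch_convert_8888_to_0565_fast (uint32_t s)
{
    uint32_t a = (s >> 3) & 0x1F001F;  /* red in bits 16-20, blue in 0-4 */
    uint32_t b = s & 0xFC00;           /* top 6 bits of green */

    a |= a >> 5;                       /* copy red down to bits 11-15 */
    a |= b >> 5;                       /* green to bits 5-10 */
    return a;                          /* low 16 bits hold the r5g6b5 value */
}
```

Storing the result with a 16-bit store, or casting it to uint16_t, discards the stray high bits.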
2012-12-18	Change CONVERT_XXXX_TO_YYYY macros into inline functions	Siarhei Siamashka	7	-62/+87
	It is easier and safer to modify their code if the calculations need some temporary variables. And the temporary variables will be needed soon.
2012-12-18	test: add "src_0565_8888" to lowlevel-blt-bench	Siarhei Siamashka	1	-0/+1

2012-12-13	pixman_composite_trapezoids(): Check for NULL return from create_bits()	Søren Sandmann Pedersen	1	-2/+3
	A check is needed that the creation of the temporary image in pixman_composite_trapezoids() succeeds. Fixes crash in stress-test -s 0x313c on my system.
2012-12-13	pixman_composite_trapezoids: Return early if mask_format is not of TYPE_ALPHA	Søren Sandmann Pedersen	2	-0/+3
	stress-test -s 0x17ee crashes because pixman_composite_trapezoids() is given a mask_format of PIXMAN_c8, which causes it to create a temporary image with that format but without a palette. This causes crashes later. The only mask formats that we actually support are those of TYPE_ALPHA, so this patch adds a return_if_fail() to ensure this. Similarly, although currently it won't crash if given an invalid format, alpha-only formats have always been the only thing that made sense for the pixman_rasterize_edges() functions, so add a return_if_fail() ensuring that the destination format is of type PIXMAN_TYPE_ALPHA.
2012-12-13	Add testing of trapezoids to stress-test	Søren Sandmann Pedersen	1	-25/+135
	The entry points add_trapezoids(), rasterize_trapezoid() and composite_trapezoid() are exercised with random trapezoids. This uncovers crashes with stress-test seeds 0x17ee and 0x313c.
2012-12-11	demos/radial-test: Add checkerboard to display the alpha channel	Søren Sandmann Pedersen	1	-0/+2

2012-12-11	demos/conical-test: Use the draw_checkerboard() utility function	Søren Sandmann Pedersen	1	-36/+2
	Instead of having its own copy.
2012-12-11	test/utils.[ch]: Add utility function to draw a checkerboard	Søren Sandmann Pedersen	2	-0/+59
	This is useful in demo programs to display the alpha channel.
2012-12-11	radial: When comparing t to mindr, use >= rather than >	Søren Sandmann Pedersen	1	-3/+3
	Radial gradients are conceptually rendered as a sequence of circles generated by linearly extrapolating from the two circles given by the gradient specification. Any circles in that sequence that would end up with a negative radius are not drawn, a condition that is enforced by checking that t * dr is bigger than mindr: if (t * dr > mindr) However, it is legitimate for a circle to have radius exactly 0, so the test should use >= rather than >. This gets rid of the dots in demos/radial-test except for when the c2 circle has radius 0 and a repeat mode of either NONE or NORMAL. Both those dots correspond to a t value of 1.0, which is outside the defined interval of [0.0, 1.0) and therefore subject to the repeat algorithm. As a result, in the NONE case, a value of 1.0 turns into transparent black. In the NORMAL case, 1.0 wraps around and becomes 0.0 which is red, unlike 0.99 which is blue. Cc: ranma42@gmail.com
2012-12-11	demos/radial-test: Add zero-radius circles to demonstrate rendering bugs	Søren Sandmann Pedersen	1	-1/+9
	Add two new gradient columns, one where the start circle has radius 0 and one where the end circle has radius 0. All the new gradients except for one are rendered with a bright dot in the middle. In most but not all cases this is incorrect. Cc: ranma42@gmail.com
2012-12-10	test: Workaround unaligned MOVDQA bug (http://gcc.gnu.org/PR55614)	Siarhei Siamashka	1	-0/+12
	Just use SSE2 intrinsics to do unaligned memory accesses as a workaround for this gcc bug related to vector extensions.
2012-12-10	Improve performance of combine_over_u	Siarhei Siamashka	1	-7/+51
	The generic C over_u combiner can be a lot faster with the addition of special shortcuts for 0xFF and 0x00 alpha/mask values. This is already implemented in C and SSE2 fast paths. Profiling the run of cairo-perf-trace benchmarks with PIXMAN_DISABLE environment variable set to "fast mmx sse2" on Intel Core i7: === before === 37.32% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_u 21.37% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888 13.51% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8 2.96% cairo-perf-trac libpixman-1.so.0.29.1 [.] radial_compute_color 2.74% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_a8 2.71% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8 2.17% cairo-perf-trac libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel 1.86% cairo-perf-trac libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate 1.57% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8 0.97% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_in_reverse_u 0.96% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_ca === after === 28.79% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888 18.44% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8 15.54% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_u 3.94% cairo-perf-trac libpixman-1.so.0.29.1 [.] radial_compute_color 3.69% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_a8 3.69% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8 2.94% cairo-perf-trac libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel 2.52% cairo-perf-trac libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate 2.08% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8 1.31% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_in_reverse_u 1.29% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_ca
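The shortcuts for fully opaque and fully transparent source pixels in an OVER combiner can be sketched as follows, assuming premultiplied a8r8g8b8 pixels and no mask. The names sketch_over_channel and sketch_combine_over_u are hypothetical, and the per-channel loop is written for clarity rather than speed.

```c
#include <stdint.h>

/* One channel of OVER: s + d * inv_alpha / 255, rounded and saturated. */
static uint8_t
sketch_over_channel (uint8_t s, uint8_t d, uint8_t inv_alpha)
{
    uint32_t t = (uint32_t) d * inv_alpha + 0x80;

    t = (t + (t >> 8)) >> 8;    /* rounded division by 255 */
    t += s;
    return t > 0xFF ? 0xFF : (uint8_t) t;
}

void
sketch_combine_over_u (uint32_t *dest, const uint32_t *src, int width)
{
    int i, shift;

    for (i = 0; i < width; i++)
    {
        uint32_t s = src[i];
        uint8_t alpha = (uint8_t) (s >> 24);

        if (alpha == 0xFF)
        {
            dest[i] = s;        /* opaque source: plain copy */
        }
        else if (s != 0)
        {
            uint32_t d = dest[i];
            uint32_t result = 0;

            for (shift = 0; shift < 32; shift += 8)
            {
                uint8_t c = sketch_over_channel ((uint8_t) (s >> shift),
                                                 (uint8_t) (d >> shift),
                                                 (uint8_t) (0xFF - alpha));
                result |= (uint32_t) c << shift;
            }
            dest[i] = result;
        }
        /* s == 0: transparent source, destination is left untouched */
    }
}
```

Both shortcuts skip the multiplications entirely, which pays off when sources are mostly opaque or mostly empty.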
2012-12-08	Add fast paths for separable convolution	Søren Sandmann Pedersen	3	-3/+184
	Similar to the fast paths for general affine access, add some fast paths for the separable filter for all combinations of formats x8r8g8b8, a8r8g8b8, r5g6b5, a8 with the four repeat modes. It is easy to see the speedup in the demos/scale program.
2012-12-08	Add demo program for conical gradients	Søren Sandmann Pedersen	2	-0/+136
	This new test is derived from radial-test.c and displays conical gradients at various angles. It also demonstrates how PIXMAN_REPEAT_NORMAL is supposed to work when used with a gradient specification where the first stop is not at 0.0: in this case the gradient is supposed to have a smooth transition from the last stop back to the first stop with no sharp transitions. It also shows that the repeat mode is not ignored for conical gradients as one might be tempted to think.
2012-12-08	Add demos/zone_plate.png	Søren Sandmann Pedersen	1	-0/+0
	The zone plate image is a useful test case for image scalers because it contains all representable frequencies, so any imperfection in resampling filters will show up as Moire patterns. This version is symmetric around the midpoint of the image, so since rotating it is supposed to be a noop, it can also be used to verify that the resampling filters don't shift the image. V2: Run the file through OptiPNG to cut the size in half, as suggested by Siarhei.