Age | Commit message (Collapse) | Author | Files | Lines |
|
So the redundant variables, memory reads/writes and reshuffles
can be safely removed. For example, this makes the inner loop
of the 'vmx_combine_add_u_no_mask' function much simpler.
Before:
7a20:  7d a8 48 ce   lvx v13,r8,r9
7a24:  7d 80 48 ce   lvx v12,r0,r9
7a28:  7d 28 50 ce   lvx v9,r8,r10
7a2c:  7c 20 50 ce   lvx v1,r0,r10
7a30:  39 4a 00 10   addi r10,r10,16
7a34:  10 0d 62 eb   vperm v0,v13,v12,v11
7a38:  10 21 4a 2b   vperm v1,v1,v9,v8
7a3c:  11 2c 6a eb   vperm v9,v12,v13,v11
7a40:  10 21 4a 00   vaddubs v1,v1,v9
7a44:  11 a1 02 ab   vperm v13,v1,v0,v10
7a48:  10 00 0a ab   vperm v0,v0,v1,v10
7a4c:  7d a8 49 ce   stvx v13,r8,r9
7a50:  7c 00 49 ce   stvx v0,r0,r9
7a54:  39 29 00 10   addi r9,r9,16
7a58:  42 00 ff c8   bdnz+ 7a20 <.vmx_combine_add_u_no_mask+0x120>
After:
76c0:  7c 00 48 ce   lvx v0,r0,r9
76c4:  7d a8 48 ce   lvx v13,r8,r9
76c8:  39 29 00 10   addi r9,r9,16
76cc:  7c 20 50 ce   lvx v1,r0,r10
76d0:  10 00 6b 2b   vperm v0,v0,v13,v12
76d4:  10 00 0a 00   vaddubs v0,v0,v1
76d8:  7c 00 51 ce   stvx v0,r0,r10
76dc:  39 4a 00 10   addi r10,r10,16
76e0:  42 00 ff e0   bdnz+ 76c0 <.vmx_combine_add_u_no_mask+0x120>
|
|
The SIMD optimized inner loops in the VMX/Altivec code are trying
to emulate unaligned accesses to the destination buffer. For each
4 pixels (which fit into a 128-bit register) the current
implementation:
1. first performs two aligned reads, which cover the needed data
2. reshuffles bytes to get the needed data in a single vector register
3. does all the necessary calculations
4. reshuffles bytes back to their original location in two registers
5. performs two aligned writes back to the destination buffer
Unfortunately, when the destination buffer is unaligned and the width
is a perfect multiple of 4 pixels, some of these writes may cross the
boundaries of the destination buffer. In a multithreaded environment
this can corrupt data outside the destination buffer if that memory is
concurrently read and written by another thread.
It is the primary suspect for the "make check" failure on power7 hardware:
http://lists.freedesktop.org/archives/pixman/2013-August/002871.html
The valgrind report for blitters-test is full of:
==23085== Invalid write of size 8
==23085== at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
==23085== by 0x100446EF: general_composite_rect (pixman-general.c:214)
==23085== by 0x10002537: test_composite (blitters-test.c:363)
==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
==23085== by 0x10002C17: main (blitters-test.c:397)
==23085== Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
==23085== at 0x4051DA0: memalign (vg_replace_malloc.c:581)
==23085== by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
==23085== by 0x10004CFF: aligned_malloc (utils.c:833)
==23085== by 0x10001DCB: create_random_image (blitters-test.c:47)
==23085== by 0x10002263: test_composite (blitters-test.c:283)
==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
==23085== by 0x10002C17: main (blitters-test.c:397)
This patch addresses the problem by first aligning the destination
buffer at a 16 byte boundary in each combiner function. This trick
is borrowed from the pixman SSE2 code.
|
|
Use a temporary variable s containing the absolute value of the stride
as the upper bound in the inner loops.
V2: Do this for the bpp == 16 case as well
|
|
Commit 4312f077365bf9f59423b1694136089c6da6216b claimed to have made
print_image() work with negative strides, but it didn't actually
work. When the stride was negative, the image buffer would be accessed
as if the stride were positive.
Fix the bug by not changing the stride variable and instead using a
temporary, s, that contains the absolute value of stride.
|
|
The generated fetchers for NEAREST, BILINEAR, and
SEPARABLE_CONVOLUTION filters are fast paths and so they belong in
pixman-fast-path.c
|
|
This iterator is really a fast path, so it belongs in the fast path
implementation.
|
|
Instead of having logic to swap the lines around when one of them
doesn't match, store the two lines in an array and use the least
significant bit of the y coordinate as the index into that
array. Since the two lines always have different least significant
bits, they will never collide.
The effect is that lines corresponding to even y coordinates are
stored in info->lines[0] and lines corresponding to odd y coordinates
are stored in info->lines[1].
|
|
Pixman supports negative strides, but up until now they haven't been
tested outside of stress-test. This commit adds testing of negative
strides to blitters-test, scaling-test, affine-test, rotate-test, and
composite-traps-test.
|
|
The affine-test, blitters-test, and scaling-test all have the ability
to print out the bytes of the destination image. Share this code by
moving it to utils.c.
At the same time make the code work correctly with negative strides.
|
|
By using this function instead of compute_crc32() the alpha masking
code and the call to image_endian_swap() are not duplicated.
|
|
Converting a double precision number to 16.16 fixed point should be
done by multiplying with 65536.0, not 65535.0.
The bug could cause certain filters that would otherwise leave the
image bit-for-bit unchanged under an identity transformation to fail
to do so, but the numbers are close enough that there were no visual
differences.
|
|
The separable convolution filter supports a subsample_bits of 0 which
corresponds to no subsampling at all, so allow this value to be used
in the scale demo.
|
|
This new iterator uses the SSSE3 instructions pmaddubsw and pabsw to
implement a fast iterator for bilinear scaling.
There is a graph here recording the per-pixel time for various
bilinear scaling algorithms as reported by scaling-bench:
http://people.freedesktop.org/~sandmann/ssse3.v2/ssse3.v2.png
As the graph shows, this new iterator is clearly faster than the
existing C iterator, and when used with an SSE2 combiner, it is also
faster than the existing SSE2 fast paths for upscaling, though not for
downscaling.
Another graph:
http://people.freedesktop.org/~sandmann/ssse3.v2/movdqu.png
shows the difference between writing to iter->buffer with movdqa,
movdqu on an aligned buffer, and movdqu on a deliberately unaligned
buffer. Since the differences are very small, the patch here avoids
using movdqa because imposing alignment restrictions on iter->buffer
may interfere with other optimizations, such as writing directly to
the destination image.
The data was measured with scaling-bench on a Sandy Bridge Core
i3-2350M @ 2.3GHz and is available in this directory:
http://people.freedesktop.org/~sandmann/ssse3.v2/
where there is also a Gnumeric spreadsheet ssse3.v2.gnumeric
containing the per-pixel values and the graph.
V2:
- Use uintptr_t instead of unsigned long in the ALIGN macro
- Use _mm_storel_epi64 instead of _mm_cvtsi128_si64 as the latter form
is not available on x86-32.
- Use _mm_storeu_si128() instead of _mm_store_si128() to avoid
imposing alignment requirements on iter->buffer
|
|
This commit adds a new, empty SSSE3 implementation and the associated
build system support.
configure.ac: detect whether the compiler understands SSSE3
intrinsics and set up the required CFLAGS
Makefile.am: Add libpixman-ssse3.la
pixman-x86.c: Add X86_SSSE3 feature flag and detect it in
detect_cpu_features().
pixman-ssse3.c: New file with an empty SSSE3 implementation
V2: Remove SSSE3_LDFLAGS since it isn't necessary unless Solaris
support is added.
|
|
At the moment iter buffers are only guaranteed to be aligned to a 4
byte boundary. SIMD implementations benefit from the buffers being
aligned to 16 bytes, so ensure this is the case.
V2:
- Use uintptr_t instead of unsigned long
- allocate 3 * SCANLINE_BUFFER_LENGTH bytes on the stack rather than just
SCANLINE_BUFFER_LENGTH
- use sizeof (stack_scanline_buffer) instead of SCANLINE_BUFFER_LENGTH
to determine overflow
|
|
The loops are already unrolled, so it was just a matter of packing
4 pixels into a single XMM register and doing aligned 128-bit
writes to memory via MOVDQA instructions for the SRC compositing
operator fast path. For the other fast paths, this XMM register
is also directly routed to further processing instead of doing
extra reshuffling. This replaces "8 PACKSSDW/PACKUSWB + 4 MOVD"
instructions with "3 PACKSSDW/PACKUSWB + 1 MOVDQA" per 4 pixels,
which results in a clear performance improvement.
There are also some other (less important) tweaks:
1. Convert 'pixman_fixed_t' to 'intptr_t' before using it as an
index for addressing memory. The problem is that 'pixman_fixed_t'
is a 32-bit data type and it has to be extended to 64-bit
offsets, which needs extra instructions on 64-bit systems.
2. Recalculate the horizontal interpolation weights only
once per 4 pixels by treating the XMM register as four pairs
of 16-bit values. Each of these 16-bit/16-bit pairs can be
replicated to fill the whole 128-bit register by using PSHUFD
instructions. So we get "3 PADDW/PSRLW + 4 PSHUFD" instructions
per 4 pixels instead of "12 PADDW/PSRLW" per 4 pixels
(or "3 PADDW/PSRLW" per each pixel).
Now a good question is whether replacing "9 PADDW/PSRLW" with
"4 PSHUFD" is a favourable exchange. As it turns out, PSHUFD
instructions are very fast on new Intel processors (including
Atoms), but are rather slow on the first generation of Core2
(Merom) and on the other processors from that time or older.
A good instructions latency/throughput table, covering all the
relevant processors, can be found at:
http://www.agner.org/optimize/instruction_tables.pdf
Enabling this optimization is controlled by the PSHUFD_IS_FAST
define in "pixman-sse2.c".
3. One use of the PSHUFD instruction (_mm_shuffle_epi32 intrinsic) in
the older code has also been replaced by its PUNPCKLQDQ equivalent
(_mm_unpacklo_epi64 intrinsic) in the PSHUFD_IS_FAST=0 configuration.
The PUNPCKLQDQ instruction is usually faster on older processors,
but has some side effects (instead of fully overwriting the
destination register like PSHUFD does, it retains half of the
original value, which may inhibit some compiler optimizations).
Benchmarks with "lowlevel-blt-bench -b src_8888_8888" using GCC 4.8.1 on an
x86-64 system with default optimizations. The results are in MPix/s:
====== Intel Core2 T7300 (2GHz) ======
old: src_8888_8888 = L1: 128.69 L2: 125.07 M: 124.86
over_8888_8888 = L1: 83.19 L2: 81.73 M: 80.63
over_8888_n_8888 = L1: 79.56 L2: 78.61 M: 77.85
over_8888_8_8888 = L1: 77.15 L2: 75.79 M: 74.63
new (PSHUFD_IS_FAST=0): src_8888_8888 = L1: 168.67 L2: 163.26 M: 162.44
over_8888_8888 = L1: 102.91 L2: 100.43 M: 99.01
over_8888_n_8888 = L1: 97.40 L2: 95.64 M: 94.24
over_8888_8_8888 = L1: 98.04 L2: 95.83 M: 94.33
new (PSHUFD_IS_FAST=1): src_8888_8888 = L1: 154.67 L2: 149.16 M: 148.48
over_8888_8888 = L1: 95.97 L2: 93.90 M: 91.85
over_8888_n_8888 = L1: 93.18 L2: 91.47 M: 90.15
over_8888_8_8888 = L1: 95.33 L2: 93.32 M: 91.42
====== Intel Core i7 860 (2.8GHz) ======
old: src_8888_8888 = L1: 323.48 L2: 318.86 M: 314.81
over_8888_8888 = L1: 187.38 L2: 186.74 M: 182.46
new (PSHUFD_IS_FAST=0): src_8888_8888 = L1: 373.06 L2: 370.94 M: 368.32
over_8888_8888 = L1: 217.28 L2: 215.57 M: 211.32
new (PSHUFD_IS_FAST=1): src_8888_8888 = L1: 401.98 L2: 397.65 M: 395.61
over_8888_8888 = L1: 218.89 L2: 217.56 M: 213.48
The most interesting benchmark is "src_8888_8888" (because this code can
be reused for a generic non-separable SSE2 bilinear fetch iterator).
The results show that PSHUFD instructions are bad for the Intel Core2 T7300
(Merom core) and good for the Intel Core i7 860 (Nehalem core). Both of these
processors support SSSE3 instructions though, so they are not the primary
targets for SSE2 code. But without any other more relevant hardware
to test on, PSHUFD_IS_FAST=0 seems to be a reasonable default for SSE2 code
and old processors (until the runtime CPU feature detection becomes
clever enough to recognize different microarchitectures).
(Rebased on top of patch that removes support for 8-bit bilinear
filtering -ssp)
|
|
The calloc call in pixman_image_create_bits may still
rely on copy-on-write (http://en.wikipedia.org/wiki/Copy-on-write).
Explicitly initializing the destination image results in
more predictable behaviour.
V2:
- allocate 16 bytes aligned buffer with aligned stride instead
of delegating this to pixman_image_create_bits
- use memset for the allocated buffer instead of pixman solid fill
- repeat tests 3 times and select best results in order to filter
out even more measurement noise
|
|
The default has been 7-bit for a while now, and the quality
improvement with 8-bit precision is not enough to justify keeping the
code around as a compile-time option.
|
|
Scanline fetchers haven't been used for images other than bits for a
long time, so by making the type reflect this fact, a bit of casting
can be saved in various places.
|
|
Later versions of gcc-4.7.x are capable of generating iwMMXt
instructions properly, but gcc-4.8 contains better support and other
fixes, including iwMMXt in conjunction with hardfp. The existing 4.5
requirement was based on attempts to have OLPC use a patched gcc to
build pixman. Let's just require gcc-4.8.
|
|
No memory is allocated in the error case, so a finalizer is not
necessary, and will cause problems if the data pointer is not
initialized to NULL.
|
|
This new iterator works in a separable way; that is, for each
destination scanline, it scales the two source scanlines involved and
then caches them so that they can be reused for the next destination
scanline.
There are two versions of the code, one that uses 64 bit arithmetic,
and one that uses 32 bit arithmetic only. The latter version is
used on 32 bit systems, where it is expected to be faster.
This scheme saves a substantial amount of arithmetic for larger
scalings; the per-pixel times for various configurations as reported
by scaling-bench are graphed here:
http://people.freedesktop.org/~sandmann/separable.v2/v2.png
The "sse2" graph is current default on x86, "mmx" is with sse2
disabled, "old c" is with sse2 and mmx disabled. The "new 32" and "new
64" graphs show times for the new code. As the graphs show, the 64 bit
version of the new code beats the "old c" for all scaling ratios.
The data was taken on a Sandy Bridge Core i3-2350M CPU @ 2.0 GHz
running in 64 bit mode.
The data used to generate the graph is available in this directory:
http://people.freedesktop.org/~sandmann/separable.v2/
There is also a Gnumeric spreadsheet v2.gnumeric containing the
per-pixel values and the graph.
V2:
- Add error message in the OOM/bad matrix case
- Save some shifts by storing the cached scanlines in AGBR order
- Special cased version that uses 32 bit arithmetic when sizeof(long) <= 4
|
|
Iterators may sometimes need to allocate auxiliary memory. In order to
be able to free this memory, optional iterator finalizers are
required.
|
|
This new benchmark scales a 320 x 240 a8r8g8b8 test image by all
ratios from 0.1, 0.2, ... up to 10.0 and reports the time it took
to do each of the scaling operations, as well as the time spent per
destination pixel.
The times reported for the scaling operations are given in
milliseconds, the times-per-pixel are in nanoseconds.
V2: Format output better
|
|
|
|
|
|
The MSVC compiler is very strict about variable declarations after
statements.
Move all the declarations of each block before any statement in the
same block to fix multiple instances of:
alpha-loop.c(XX) : error C2275: 'pixman_image_t' : illegal use of this
type as an expression
|
|
I got this bug on my system:
lcc: "scale.c", line 374: warning: function "gtk_scale_add_mark" declared
implicitly [-Wimplicit-function-declaration]
gtk_scale_add_mark (GTK_SCALE (widget), 0.0, GTK_POS_LEFT, NULL);
^
CCLD scale
scale.o: In function `app_new':
(.text+0x23e4): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x250c): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x2634): undefined reference to `gtk_scale_add_mark'
make[2]: *** [scale] Error 1
make[2]: Target `all' not remade because of errors.
$ pkg-config --modversion gtk+-2.0
2.12.1
demos/scale.c calls the gtk_scale_add_mark() function, which is only
available in GTK+ 2.16 and later. We need to either support older
GTK+ versions (rewrite scale.c) or simply require a high enough
version of GTK+, like this:
|
|
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matthieu Herrb <matthieu.herrb@laas.fr>
|
|
The SSE2, MMX, and fast implementations all have a copy of the
function iter_init_bits_stride that computes an image buffer and
stride.
Move that function to pixman-utils.c and share it among all the
implementations.
|
|
Now that we are using the new _pixman_implementation_iter_init(), the
old _src/_dest_iter_init() functions are no longer needed, so they can
be deleted, and the corresponding fields in pixman_implementation_t
can be removed.
|
|
A new field, 'iter_info', is added to the implementation struct, and
all the implementations store a pointer to their iterator tables in
it. A new function, _pixman_implementation_iter_init(), is then added
that searches those tables, and the new function is called in
pixman-general.c and pixman-image.c instead of the old
_pixman_implementation_src_init() and _pixman_implementation_dest_init().
|
|
In preparation for sharing all iterator initialization code from all
the implementations, move the general implementation to use a table of
pixman_iter_info_t.
The existing src_iter_init and dest_iter_init functions are
consolidated into one general_iter_init() function that checks the
iter_flags for whether it is dealing with a source or destination
iterator.
Unlike in the other implementations, the general_iter_init() function
stores its own get_scanline() and write_back() functions in the
iterator, so it relies on the initializer being called after
get_scanline and write_back have been copied from the struct to the
iterator.
|
|
Similar to the SSE2 and MMX patches, this commit replaces a table of
fetcher_info_t with a table of pixman_iter_info_t, and similar to the
noop patch, both fast_src_iter_init() and fast_dest_iter_init() are
now doing exactly the same thing, so their code can be shared in a new
function called fast_iter_init_common().
|
|
Similar to the SSE2 commit, information about the iterators is stored
in a table of pixman_iter_info_t.
|
|
Similar to the changes to noop, put all the iterators into a table of
pixman_iter_info_t and then do a generic search of that table during
iterator initialization.
|
|
Instead of having a nest of if statements, store the information about
iterators in a table of a new struct type, pixman_iter_info_t, and
then walk that table when initializing iterators.
The new struct contains a format, a set of image flags, and a set of
iter flags, plus a pixman_iter_get_scanline_t, a
pixman_iter_write_back_t, and a new function type
pixman_iter_initializer_t.
If the iterator matches an entry, it is first initialized with the
given get_scanline and write_back functions, and then the provided
iter_initializer (if present) is run. Running the iter_initializer
after setting get_scanline and write_back allows the initializer to
override those fields if it wishes.
The table contains both source and destination iterators,
distinguished based on the recently-added ITER_SRC and ITER_DEST;
similarly, wide iterators are recognized with the ITER_WIDE
flag. Having both source and destination iterators in the table means
the noop_src_iter_init() and noop_dest_iter_init() functions become
identical, so this patch factors out their code in a new function
noop_iter_init_common() that both calls.
The following patches in this series will change all the
implementations to use an iterator table, and then move the table
search code to pixman-implementation.c.
|
|
We only support alpha maps for BITS images, so it is always safe to
ignore the alpha map for non-BITS images. This makes it possible to
get rid of the check for SOLID images, since it is now subsumed by the
check for FAST_PATH_NO_ALPHA_MAP.
Opaque masks are reduced to NULL images in pixman.c, and those can
also safely be treated as not having an alpha map, so set the
FAST_PATH_NO_ALPHA_MAP bit for those as well.
|
|
This will be useful for putting iterators into tables where they can
be looked up by iterator flags. Without this flag, wide iterators can
only be recognized by the absence of ITER_NARROW, which makes testing
for a match difficult.
|
|
These indicate whether the iterator is for a source or a destination
image. Note that iterator initializers are allowed to rely on one of
these being set, so they cannot be left out the way it is generally
harmless (aside from potential performance degradation) to leave out a
particular fast path flag.
|
|
Similar to c2230fe2aff, simply check against SAMPLES_COVER_CLIP_NEAREST
instead of comparing all the x/y/width/height parameters.
|
|
The Loongson code is compiled with -march=loongson2f to enable the MMI
instructions, but binutils refuses to link object code compiled with
different -march settings, leading to link failures later in the
compile. This avoids that problem by checking if we can link code
compiled for Loongson.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
|
|
I look at that function and can never remember what it does or how it
manages to do it.
|
|
Build fix for platforms without a generated config.h, for example Win32.
|
|
|
|
|
|
|
|
|
|
Essentially all of it is obsolete by now.
|
|
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
rpixbuf = L1: 14.63 L2: 13.55 M: 9.91 ( 79.53%) HT: 8.47 VT: 8.32 R: 8.17 RT: 4.90 ( 33Kops/s)
Optimized:
rpixbuf = L1: 45.69 L2: 37.30 M: 17.24 (138.31%) HT: 15.66 VT: 14.88 R: 13.97 RT: 8.38 ( 44Kops/s)
|