~sandmann/pixman - Unnamed repository; edit this file to name it for gitweb.

Age	Commit message	Author	Files	Lines
2012-06-29	Simplify CPU detection on PPC.	Søren Sandmann Pedersen	1	-75/+38
	Get rid of the initialized and have_vmx static variables in pixman-ppc.c. There is no point to them since CPU detection only happens once per process. On Linux, just read /proc/self/auxv instead of generating the filename with getpid(), and don't bother with the stack buffer. Instead, just read the aux entries one by one.
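The entry-by-entry read of /proc/self/auxv described above can be sketched roughly as follows. This is a minimal illustration, not the code from pixman-ppc.c: every `sketch_`/`SKETCH_` name is hypothetical, the constants are written out here instead of being taken from the system headers, and the result is only meaningful on PowerPC Linux.

```c
#include <stdio.h>

/* Assumed constants for illustration; on Linux they normally come from
 * <elf.h> (AT_NULL, AT_HWCAP) and <asm/cputable.h>
 * (PPC_FEATURE_HAS_ALTIVEC). */
#define SKETCH_AT_NULL                 0UL
#define SKETCH_AT_HWCAP                16UL
#define SKETCH_PPC_FEATURE_HAS_ALTIVEC 0x10000000UL

/* One auxiliary vector entry: a (type, value) pair of native words. */
typedef struct
{
    unsigned long type;
    unsigned long value;
} sketch_auxv_entry_t;

/* Return the AT_HWCAP word of the current process, or 0 when
 * /proc/self/auxv cannot be read (for example on non-Linux systems). */
static unsigned long
sketch_read_hwcap (void)
{
    sketch_auxv_entry_t entry;
    unsigned long hwcap = 0;
    FILE *f = fopen ("/proc/self/auxv", "rb");

    if (!f)
        return 0;

    /* Read the entries one at a time; no large stack buffer is needed. */
    while (fread (&entry, sizeof entry, 1, f) == 1)
    {
        if (entry.type == SKETCH_AT_NULL)
            break;

        if (entry.type == SKETCH_AT_HWCAP)
        {
            hwcap = entry.value;
            break;
        }
    }

    fclose (f);
    return hwcap;
}

/* Only meaningful on PowerPC, where this hwcap bit means AltiVec/VMX. */
static int
sketch_have_vmx (void)
{
    return (sketch_read_hwcap () & SKETCH_PPC_FEATURE_HAS_ALTIVEC) != 0;
}
```

Because the file name is fixed, there is no need for getpid() or for formatting a path at run time.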
2012-06-29	Simplifications to ARM CPU detection	Søren Sandmann Pedersen	1	-157/+87
	Organize pixman-arm.c such that each operating system/compiler exports a detect_cpu_features() function that returns a bitmask with the various features that we are interested in. A new function have_feature() then calls this function, caches the result, and returns whether the given feature is available. The result is that all the pixman_have_arm_<feature> functions become redundant and can be deleted.
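The detect-once-then-cache arrangement described above could look roughly like this. It is a sketch under assumptions: the feature names, the `sketch_` prefix and the call counter are hypothetical, the detector simply pretends that two features are present, and the cache is not thread-safe.

```c
/* Hypothetical feature bits; the real set depends on the architecture. */
typedef enum
{
    SKETCH_FEATURE_A = (1 << 0),
    SKETCH_FEATURE_B = (1 << 1),
    SKETCH_FEATURE_C = (1 << 2)
} sketch_features_t;

/* Counts detector invocations so the caching can be observed. */
static int sketch_detect_calls = 0;

/* Stand-in for the per-OS/per-compiler detector: returns a bitmask.
 * Here it simply reports features A and C as present. */
static sketch_features_t
sketch_detect_cpu_features (void)
{
    sketch_detect_calls++;
    return (sketch_features_t) (SKETCH_FEATURE_A | SKETCH_FEATURE_C);
}

/* Run the detector on first use, cache the bitmask, and report whether
 * all of the requested feature bits are available. */
static int
sketch_have_feature (sketch_features_t feature)
{
    static int initialized = 0;
    static sketch_features_t features;

    if (!initialized)
    {
        features = sketch_detect_cpu_features ();
        initialized = 1;
    }

    return (features & feature) == feature;
}
```

With one generic query function, per-feature wrappers add nothing: a caller can just write `sketch_have_feature (SKETCH_FEATURE_A)`.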
2012-06-29	Cleanups and simplifications in x86 CPU feature detection	Søren Sandmann Pedersen	1	-186/+134
	A new function pixman_cpuid() is added that runs the cpuid instruction and returns the results. On GCC this function uses inline assembly that is written such that it will work on both 32 and 64 bit. Compared to the old code, the only difference is %ebx is saved in %esi instead of on the stack. Saving 32 bit registers on a 64 bit stack is difficult or impossible because in 64 bit mode, the push and pop instructions work on 64 bit registers. On MSVC, the function calls the __cpuid intrinsic.
	There is also a new function called have_cpuid() which detects whether cpuid is available. On x86-64 and MSVC, it simply returns TRUE; on 32-bit x86, it checks whether the 22nd bit of eflags can be modified. On MSVC this does have the consequence that pixman will no longer work on CPUs without cpuid (i.e., older than 486 and some 486 models).
	These two functions together make it possible to write a generic detect_cpu_features() in plain C. This function is then used in a new have_feature() function that checks whether a specific set of feature bits is available.
	Aside from the cleanups and simplifications, the main benefit from this patch is that pixman now can do feature detection on x86-64, so that newer instruction sets such as SSSE3 and SSE4.1 can be used. (And apparently the assumption that x86-64 CPUs always have MMX and SSE2 is no longer correct: Knight's Corner is x86-64, but doesn't have them).
	V2: Rename the constants in the getisax() code, as pointed out by Alan Coopersmith. Also reinstate the result variable and initialize features to 0.
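A plain-C feature detector built on top of a cpuid helper might be sketched as below. Note the substitution: instead of the hand-written inline assembly the commit describes, this sketch uses the `__get_cpuid()` helper from the `<cpuid.h>` header shipped with GCC and Clang, and on other compilers or architectures it reports no features. The `SKETCH_` names are hypothetical; the bit positions are the standard ones for cpuid leaf 1.

```c
#if (defined (__i386__) || defined (__x86_64__)) && defined (__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#endif

/* Hypothetical feature bits for this sketch. */
#define SKETCH_X86_MMX   (1u << 0)
#define SKETCH_X86_SSE   (1u << 1)
#define SKETCH_X86_SSE2  (1u << 2)
#define SKETCH_X86_SSSE3 (1u << 3)
#define SKETCH_X86_SSE41 (1u << 4)

/* Query cpuid leaf 1 and translate the result into the bitmask above.
 * Returns 0 when cpuid is unavailable or not supported by this build. */
static unsigned int
sketch_detect_x86_features (void)
{
    unsigned int features = 0;

#ifdef SKETCH_HAVE_CPUID_H
    unsigned int a, b, c, d;

    /* __get_cpuid() returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid (1, &a, &b, &c, &d))
    {
        if (d & (1u << 23)) features |= SKETCH_X86_MMX;
        if (d & (1u << 25)) features |= SKETCH_X86_SSE;
        if (d & (1u << 26)) features |= SKETCH_X86_SSE2;
        if (c & (1u << 9))  features |= SKETCH_X86_SSSE3;
        if (c & (1u << 19)) features |= SKETCH_X86_SSE41;
    }
#endif

    return features;
}

/* True when every bit in `mask` was reported by the detector. */
static int
sketch_x86_have_feature (unsigned int mask)
{
    return (sketch_detect_x86_features () & mask) == mask;
}
```

Since the detection is done at run time on 64-bit builds too, MMX and SSE2 are checked like any other feature rather than assumed.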
2012-06-29	Simplify MIPS CPU detection	Søren Sandmann Pedersen	1	-35/+9
	There is no reason to have pixman_have_<feature> functions when all they do is call pixman_have_mips_feature(). Instead rename pixman_have_mips_feature() to have_feature() and call it directly from _pixman_mips_get_implementations(). Also on non-Linux, just make have_feature() return FALSE.
2012-06-29	Move the remaining bits of pixman-cpu into pixman-implementation.c	Søren Sandmann Pedersen	3	-80/+51

2012-06-29	Move MIPS specific CPU detection to its own file, pixman-mips.c	Søren Sandmann Pedersen	4	-76/+115

2012-06-29	Move PowerPC specific CPU detection to its own file pixman-ppc.c	Søren Sandmann Pedersen	4	-164/+197

2012-06-29	Move ARM specific CPU detection to a new file pixman-arm.c	Søren Sandmann Pedersen	4	-253/+300
	Similar to the x86 commit, this moves the ARM specific CPU detection to its own file which exports a pixman_arm_get_implementations() function that is supposed to be a noop on non-ARM.
2012-06-29	Move x86 specific CPU detection to a new file pixman-x86.c	Søren Sandmann Pedersen	4	-248/+291
	Extract the x86 specific parts of pixman-cpu.c and put them in their own file called pixman-x86.c which exports one function pixman_x86_get_implementations() that creates the MMX and SSE2 implementations. This file is supposed to be compiled on all architectures, but pixman_x86_get_implementations() should be a noop on non-x86.
2012-06-29	pixman-cpu.c: Rename disabled to _pixman_disabled() and export it	Søren Sandmann Pedersen	2	-11/+13

2012-06-29	Fix distcheck due to custom iwMMXt rules	Matt Turner	1	-0/+1

2012-06-29	sse2: faster bilinear scaling (use _mm_loadl_epi64)	Siarhei Siamashka	1	-8/+7
	Using _mm_loadl_epi64() to load two pixels at once (pairs of top and bottom pixels) is faster than loading each pixel separately and combining them with _mm_set_epi32().
	=== cairo-perf-trace ===
	before: image firefox-fishtank 66.912 66.931 0.13% 3/3
	after:  image firefox-fishtank 57.584 58.349 0.74% 3/3
	=== lowlevel-blt-bench ===
	before: src_8888_8888 = L1: 181.10 L2: 179.14 M:178.08 ( 11.02%) HT:153.22 VT:133.45 R:142.24 RT: 95.32
	after:  src_8888_8888 = L1: 228.68 L2: 225.75 M:223.98 ( 14.23%) HT:185.32 VT:155.06 R:162.73 RT:102.52
	This improvement was suggested by Matt Turner on IRC.
2012-06-29	test: support nearest/bilinear scaling in lowlevel-blt-bench	Siarhei Siamashka	1	-1/+62
	Scale factor is selected to be nearly 1x, so that the MPix/s results can be directly compared with the results of non-scaled compositing operations.
2012-06-29	test: Fix for strict aliasing issue in 'get_random_seed'	Siarhei Siamashka	1	-3/+3
	Gets rid of a gcc warning when compiled with the -fstrict-aliasing option in CFLAGS.
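For context, strict-aliasing warnings of this kind typically come from reading an object through a pointer of an unrelated type. The sketch below shows one common way to avoid that, using memcpy(). It assumes the warning had that cause and is not the actual change made to get_random_seed(); the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the first four bytes of a double as a uint32_t.
 *
 * Writing *(uint32_t *) &d instead would access a double through an
 * incompatible pointer type, which violates the strict aliasing rules.
 * Copying the bytes is well defined. Which half of the double ends up
 * in the result depends on the byte order of the machine. */
static uint32_t
sketch_first_word_of_double (double d)
{
    uint32_t bits;

    memcpy (&bits, &d, sizeof bits);
    return bits;
}
```

Compilers generally turn a small fixed-size memcpy() like this into a plain register move.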
2012-06-20	build: Fix compilation on win32	Andrea Canciani	1	-1/+5
	When compiling using the win32 build system, config.h is neither available nor needed.
	Fixes: pixman-glyph.c(26) : fatal error C1083: Cannot open include file: 'config.h': No such file or directory
2012-06-16	sse2: add src_x888_0565	Matt Turner	1	-0/+83
	Port of 2ddd1c498b to SSE2. Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf Works around the lack of a packusdw instruction by first sign extending the values.
	fast: src_8888_0565 = L1: 681.40 L2: 689.20 M: 644.76 ( 25.51%) HT:404.42 VT:288.04 R:306.07 RT:150.80 (1619Kops/s)
	mmx:  src_8888_0565 = L1:2056.03 L2:1985.44 M:1574.91 ( 61.87%) HT:533.10 VT:376.35 R:416.10 RT:178.79 (1833Kops/s)
	sse2: src_8888_0565 = L1:3793.42 L2:3653.44 M:1878.83 ( 73.94%) HT:535.03 VT:407.96 R:421.46 RT:163.31 (1727Kops/s)
	and for reference, using packusdw:
	sse4: src_8888_0565 = L1:4396.18 L2:4229.25 M:1904.04 ( 75.18%) HT:559.79 VT:427.96 R:440.06 RT:165.71 (1744Kops/s)
	Notice that MMX is faster in the RT case because it can operate on 8 bytes at a time instead of the current 16 bytes for SSE2.
2012-06-13	sse2: enable over_n_0565 for b5g6r5	Matt Turner	1	-0/+1
	Same as b950bb12 for MMX.
2012-06-13	.gitignore: add test/glyph-test	Matt Turner	1	-0/+1

2012-06-13	test: Add missing break in stress-test.c	Søren Sandmann Pedersen	1	-0/+1
	Found by coverity: https://bugzilla.redhat.com/show_bug.cgi?id=756069
2012-06-12	test: fix bisecting issue in fuzzer-find-diff.pl	Siarhei Siamashka	1	-0/+7
	Before bisecting to find the exact test which has failed, we first need to make sure that the first test is fine (the first test is "good" and the whole range is "bad"). Otherwise test 2 gets incorrectly flagged as problematic when we already got a failure on test 1 right from the start.
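The invariant behind this fix can be shown with a small bisection sketch. It is written in C rather than the Perl of fuzzer-find-diff.pl, and it assumes a predicate that reports whether the run of tests 1..n passes. All names are hypothetical, including the two demo predicates.

```c
/* Returns nonzero when running tests 1..n gives the expected result. */
typedef int (*sketch_range_ok_fn) (int n);

/* Find the first failing test in 1..n_tests, or 0 if none fails.
 * The loop relies on the invariant "good passes, bad fails", so test 1
 * must be checked explicitly before it is used as the "good" end. */
static int
sketch_first_bad_test (int n_tests, sketch_range_ok_fn ok)
{
    int good, bad;

    if (ok (n_tests))
        return 0;               /* the whole range is fine */

    if (!ok (1))
        return 1;               /* without this check, 2 would be reported */

    good = 1;
    bad = n_tests;

    while (bad - good > 1)
    {
        int mid = good + (bad - good) / 2;

        if (ok (mid))
            good = mid;
        else
            bad = mid;
    }

    return bad;
}

/* Demo predicates: failure first appears at test 7, or at test 1. */
static int sketch_ok_before_7 (int n) { return n < 7; }
static int sketch_never_ok (int n)    { (void) n; return 0; }
static int sketch_always_ok (int n)   { (void) n; return 1; }
```

If the `ok (1)` check is dropped, the loop starts with a "good" end that is actually bad and converges on test 2, which is the misreport the commit describes.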
2012-06-12	test: OpenMP 2.5 requires signed loop iteration variables	Siarhei Siamashka	2	-10/+8
	Unsigned loop variables are only supported since version 3.0 of the OpenMP specification. Changing loop variables to use the int32_t type fixes pixman build problems with the path64 compiler.
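A loop written to be acceptable to OpenMP 2.5 compilers keeps the iteration variable signed, as in the sketch below. The function is a hypothetical example, not code from the pixman test suite. The pragma is guarded so that the file also builds without OpenMP enabled.

```c
#include <stdint.h>

/* Sum of squares of 0..n-1, computed modulo 2^32, with a parallel loop. */
static uint32_t
sketch_sum_squares (int32_t n)
{
    uint32_t total = 0;
    int32_t i;      /* signed: accepted by OpenMP 2.5 as well as 3.0 */

#ifdef _OPENMP
#pragma omp parallel for reduction(+:total)
#endif
    for (i = 0; i < n; i++)
        total += (uint32_t) i * (uint32_t) i;

    return total;
}
```

Only the loop counter has to be signed; values computed inside the loop body can still use unsigned types.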
2012-06-11	test: Make glyph test pass on big endian	Søren Sandmann Pedersen	1	-3/+7
	The destination buffer was initialized with random uint32_t values, so it started out different on big endian vs. little endian. Fix that by initializing the buffer with random uint8_t values instead.
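The difference comes down to how the random values are stored. A 32-bit store puts the same value in memory with a different byte layout on each kind of machine, while byte stores always produce the same memory image. A minimal sketch of the byte-wise fill follows. The generator and all names are hypothetical stand-ins, not the test suite's lcg_rand functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator, standing in for the test PRNG. */
static uint32_t sketch_lcg_state = 1;

static uint32_t
sketch_lcg_next (void)
{
    sketch_lcg_state = sketch_lcg_state * 1103515245u + 12345u;
    return (sketch_lcg_state >> 16) & 0x7fff;
}

/* Fill a buffer one byte at a time. For a given seed, the resulting
 * memory contents are identical on big and little endian machines,
 * which is not the case when whole uint32_t values are stored. */
static void
sketch_fill_bytes (void *buffer, size_t n_bytes)
{
    uint8_t *p = buffer;
    size_t i;

    for (i = 0; i < n_bytes; i++)
        p[i] = (uint8_t) sketch_lcg_next ();
}
```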
2012-06-11	bits-image: Turn all the fetchers into iterator getters	Søren Sandmann Pedersen	2	-104/+114
	Instead of caching these fetchers in the image structure, and then have the iterator getter call them from there, simply change them to be iterator getters themselves. This avoids an extra indirect function call and lets us get rid of the get_scanline_32/64 fields in pixman_image_t.
2012-06-10	Faster unorm_to_unorm for wide processing.	Antti S. Lankila	1	-4/+27
	Optimizing the unorm_to_unorm functions allows a speedup from:
	src_8888_2x10 = L1: 62.08 L2: 60.73 M: 59.61 ( 4.30%) HT: 46.81 VT: 42.17 R: 43.18 RT: 26.01 (325Kops/s)
	to:
	src_8888_2x10 = L1: 76.94 L2: 78.43 M: 75.87 ( 5.59%) HT: 56.73 VT: 52.39 R: 53.00 RT: 29.29 (363Kops/s)
	on an i7 Q720-based laptop.
	The key of the patch is the observation that unorm_to_unorm's work can more easily be done with a simple multiplication and shift, when the function is applied repeatedly and the parameters are not compile-time constants. For instance, converting from 0xfe to 0xfefe (expanding from 8 bits to 16 bits) can be done by calculating
	    c = c * 0x101
	However, sometimes the result is not a neat replication of all the bits. For instance, going from 10 bits to 16 bits can be done by calculating
	    c = c * 0x401UL >> 4
	where the intermediate result is a 20-bit-wide repetition of the 10-bit pattern, followed by shifting off the unnecessary lowest bits. The patch has the algorithm to calculate the factor and the shift, and converts the code to use it.
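One way to compute such a factor and shift is sketched below. This is an illustration of the idea, not the algorithm from the patch: the type and function names are hypothetical, and it assumes both bit widths are between 1 and 16 so that the intermediate product fits in 32 bits.

```c
#include <stdint.h>

/* Precomputed conversion: result = (c * factor) >> shift. */
typedef struct
{
    uint32_t factor;
    int      shift;
} sketch_unorm_scale_t;

/* Build the factor and shift for converting from_bits-wide values to
 * to_bits-wide values. Assumes 1 <= from_bits, to_bits <= 16. */
static sketch_unorm_scale_t
sketch_unorm_scale_init (int from_bits, int to_bits)
{
    sketch_unorm_scale_t s = { 1, 0 };
    int width = from_bits;

    if (from_bits >= to_bits)
    {
        /* Narrowing: just drop the low bits. */
        s.shift = from_bits - to_bits;
        return s;
    }

    /* Widening: each extra set bit in the factor adds one more copy of
     * the pattern, until the copies cover at least to_bits bits. */
    while (width < to_bits)
    {
        s.factor |= (uint32_t) 1 << width;
        width += from_bits;
    }

    /* Shift off the surplus low bits of the replicated pattern. */
    s.shift = width - to_bits;
    return s;
}

static uint32_t
sketch_unorm_scale_apply (uint32_t c, sketch_unorm_scale_t s)
{
    return (c * s.factor) >> s.shift;
}
```

For 8 to 16 bits this gives a factor of 0x101 and a shift of 0, and for 10 to 16 bits a factor of 0x401 and a shift of 4, matching the two examples in the commit message. The setup runs once per format pair, so each pixel costs only a multiply and a shift.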
2012-06-09	configure.ac: add iwmmxt2 configure flag	Matt Turner	1	-6/+14
	The flag allows the user to select whether pixman-mmx.c is compiled with -march=iwmmxt or -march=iwmmxt2. gcc has scheduling support for the Marvell CPU in the XO 1.75 when building with -march=iwmmxt2.
2012-06-09	autotools: use custom build rule to build iwMMXt code	Matt Turner	2	-6/+14
	gcc has no sane way of enabling iwmmxt code generation, like -msse for SSE, so you have to use -march=iwmmxt{,2}. User CFLAGS are placed after -march=iwmmxt and override the march value, so we have to use a custom build rule to order the CFLAGS such that pixman-mmx.c will be built with the necessary CFLAGS.
2012-06-02	Speed up _pixman_image_get_solid() in common cases	Søren Sandmann Pedersen	1	-6/+27
	Make _pixman_image_get_solid() faster by special-casing the common cases where the image is SOLID or a repeating a8r8g8b8 image. This optimization together with the previous one results in a small but reproducible performance improvement on the xfce4-terminal-a1 cairo trace:
	[ # ] backend test min(s) median(s) stddev. count
	Before: [ 0] image xfce4-terminal-a1 1.221 1.239 1.21% 100/100
	After:  [ 0] image xfce4-terminal-a1 1.170 1.199 1.26% 100/100
	Either optimization by itself is difficult to separate from noise.
2012-06-02	Speed up _pixman_composite_glyphs_no_mask()	Søren Sandmann Pedersen	3	-23/+111
	Bypass much of the overhead of pixman_image_composite32() by only computing the composite region once instead of once per glyph, and by only looking up the composite function whenever the glyph format or flags change. As part of this, pixman_compute_composite_region32() was renamed to _pixman_compute_composite_region32() and exported in pixman-private.h. I couldn't find a trace that would reliably demonstrate that this is actually an improvement by itself (since _pixman_composite_glyphs_no_mask() is called so rarely), but together with the following optimization for solid sources, there is a small but reliable improvement to the xfce4-terminal-a1 cairo trace.
2012-06-02	Speed up pixman_composite_glyphs()	Søren Sandmann Pedersen	3	-32/+149
	When adding glyphs to the mask, bypass most of the overhead of pixman_image_composite32() by:
	- Only looking up the composite function when the glyph changes either format or flags.
	- Only using a white source when the glyph format is different from the mask format.
	- Simply intersecting the glyph rectangle with the destination rectangle instead of doing the full _pixman_composite_region32().
	Performance results:
	[ # ] backend test min(s) median(s) stddev. count
	Before: [ 0] image firefox-talos-gfx 6.570 6.577 0.13% 8/10
	After:  [ 0] image firefox-talos-gfx 4.272 4.289 0.28% 10/10
	V2: Changes to deal with white sources
2012-06-02	test: Add glyph-test	Søren Sandmann Pedersen	2	-0/+332
	This test tests the new glyph cache and compositing API. Much of this test is intended to make sure that clipping and alpha map handling survive any optimizations that may be added to the glyph compositing. V2: Evaluating lcg_rand_n() multiple times in an argument list led to undefined behavior.
2012-06-02	Add support for alpha maps to compute_crc32_for_image().	Søren Sandmann Pedersen	1	-13/+75
	When a destination image I has an alpha map A, the following rules apply:
	- If I has an alpha channel itself, the content of that channel is undefined.
	- If A has RGB channels, the content of those channels is undefined.
	Hence in order to compute the CRC32 for such an image, we have to mask off the alpha channel of the image, and the RGB channels of the alpha map.
	V2: Shifting by 32 is undefined in C
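The V2 note matters when the channel mask is built from a bit count, because a 32-bit-wide channel would need a shift by 32. A sketch of masking off a channel without ever shifting by the full width follows. The helper names are hypothetical and this is not the code in compute_crc32_for_image().

```c
#include <stdint.h>

/* Mask with the lowest n_bits bits set, for 0 <= n_bits <= 32.
 * The n_bits == 32 case is handled separately because shifting a
 * 32-bit value by 32 is undefined behavior in C. */
static uint32_t
sketch_low_mask (int n_bits)
{
    if (n_bits >= 32)
        return 0xffffffffu;

    return ((uint32_t) 1 << n_bits) - 1;
}

/* Clear a channel that is `width` bits wide and starts at bit `offset`,
 * for example the alpha channel of an image that has an alpha map. */
static uint32_t
sketch_clear_channel (uint32_t pixel, int offset, int width)
{
    if (width == 0)
        return pixel;   /* no such channel; also keeps offset below 32 */

    return pixel & ~(sketch_low_mask (width) << offset);
}
```

For an a8r8g8b8 pixel, `sketch_clear_channel (pixel, 24, 8)` removes the alpha bits, and `sketch_clear_channel (pixel, 0, 24)` removes the RGB bits, as would be needed for the alpha map.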
2012-06-02	Move CRC32 computation from blitters-test.c into utils.c	Søren Sandmann Pedersen	3	-31/+46
	This way it can be used in other tests.
2012-06-02	Add pixman_glyph_cache_t API	Søren Sandmann Pedersen	3	-0/+542
	This new API allows entire glyph strings to be composited in one go which reduces overhead compared to multiple calls to pixman_image_composite32().
	The pixman_glyph_cache_t is a hash table that maps two keys (a "font" and a "glyph" key, but they are just keys; there is no distinction between them as far as pixman is concerned) to a glyph. Glyphs in the cache can be composited through two new entry points pixman_glyph_cache_composite_glyphs() and pixman_glyph_cache_composite_glyphs_no_mask().
	A glyph cache may only be inserted into when it is "frozen", which is achieved by calling pixman_glyph_cache_freeze(). When pixman_glyph_cache_thaw() is later called, if the cache has become too crowded, some glyphs (currently the least-recently-used) will automatically be evicted. This means that a user must ensure that all the required glyphs are present in the cache before compositing a string.
	The intended way to use the cache is like this:
	    pixman_glyph_t glyphs[MAX_GLYPHS];

	    pixman_glyph_cache_freeze (cache);

	    for (i = 0; i < n_glyphs; ++i)
	    {
	        const void *g;

	        if (!(g = pixman_glyph_cache_lookup (cache, font_key, glyph_key)))
	        {
	            img = <rasterize glyph as a pixman_image_t>;

	            g = pixman_glyph_cache_insert (cache, font_key, glyph_key,
	                                           glyph_origin_x, glyph_origin_y,
	                                           img);

	            if (!g)
	            {
	                /* Clean up out-of-memory condition */
	                goto oom;
	            }
	        }

	        /* Record the glyph whether it was found or newly inserted */
	        glyphs[i].pos_x = glyph_x_pos;
	        glyphs[i].pos_y = glyph_y_pos;
	        glyphs[i].glyph = g;
	    }

	    pixman_composite_glyphs (op, src, dest, ..., cache, n_glyphs, glyphs);

	    pixman_glyph_cache_thaw (cache);
	V2:
	- Move glyphs to front of the MRU list when they are used. Pointed out by Behdad Esfahbod.
	- Composite glyphs with (white IN glyph) ADD mask in order to support mixed a8 and a8r8g8b8 glyphs. Also pointed out by Behdad.
	- Add pixman_glyph_get_mask_format
2012-06-02	Add doubly linked lists	Søren Sandmann Pedersen	2	-0/+49
	This commit adds some new inline functions to maintain a doubly linked list. The way to use them is to embed a pixman_link_t into the structures that should be linked, and use a pixman_list_t as the head of the list. The new functions are
	    pixman_list_init (pixman_list_t *list);
	    pixman_list_prepend (pixman_list_t *list, pixman_link_t *link);
	    pixman_list_move_to_front (pixman_list_t *list, pixman_link_t *link);
	There is also a new macro:
	    CONTAINER_OF(type, member, data);
	that can be used to get from a pointer to a member to the containing structure. V2: Use the C89 macro offsetof() instead of rolling our own - suggested by Alan Coopersmith.
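The embedded-link pattern described above can be sketched as follows. This is an independent illustration with hypothetical `sketch_` names and a sentinel-based layout chosen for brevity; it is not pixman's implementation. Only the macro's argument order (type, member, data) is taken from the commit message.

```c
#include <stddef.h>

/* A link is embedded in every structure that should be on a list. */
typedef struct sketch_link_t sketch_link_t;
struct sketch_link_t
{
    sketch_link_t *next;
    sketch_link_t *prev;
};

/* The list head is a sentinel link; an empty list points at itself. */
typedef struct
{
    sketch_link_t head;
} sketch_list_t;

/* Get from a pointer to an embedded member back to its container. */
#define SKETCH_CONTAINER_OF(type, member, data)                         \
    ((type *) (((unsigned char *) (data)) - offsetof (type, member)))

static void
sketch_list_init (sketch_list_t *list)
{
    list->head.next = &list->head;
    list->head.prev = &list->head;
}

static void
sketch_list_prepend (sketch_list_t *list, sketch_link_t *link)
{
    link->next = list->head.next;
    link->prev = &list->head;
    list->head.next->prev = link;
    list->head.next = link;
}

static void
sketch_list_unlink (sketch_link_t *link)
{
    link->prev->next = link->next;
    link->next->prev = link->prev;
}

/* Useful for most-recently-used ordering: a touched item goes first. */
static void
sketch_list_move_to_front (sketch_list_t *list, sketch_link_t *link)
{
    sketch_list_unlink (link);
    sketch_list_prepend (list, link);
}

/* Example container with an embedded link. */
typedef struct
{
    int           id;
    sketch_link_t link;
} sketch_item_t;
```

Given a link taken from the list, `SKETCH_CONTAINER_OF (sketch_item_t, link, list.head.next)` yields the item at the front. Embedding the link means that putting an item on a list needs no separate allocation.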
2012-05-30	Make use of image flags in mmx and sse2 iterators	Søren Sandmann Pedersen	2	-20/+8
	Now that we have the full image flags available, the SSE2 and MMX iterators can simply check against SAMPLES_COVER_CLIP_NEAREST (which is computed in pixman_image_composite32()) instead of comparing all the x/y/width/height parameters.
2012-05-30	Pass the full image flags to iterators	Søren Sandmann Pedersen	12	-33/+40
	When pixman_image_composite32() is called some flags are computed that indicate various things about the composite operation that can't be deduced from the image flags themselves. These additional flags are not currently available to iterators. All they can do is read the image flags in image->common.flags. Fix that by passing the info->{src, mask, dest}_flags on to the iterator initialization and store the flags in the iter struct as "image_flags". At the same time rename the iterator flags variable to "iter_flags" to avoid confusion.
2012-05-27	mmx: add missing _mm_empty calls	Matt Turner	1	-0/+5
	Fixes spurious test failures on x86-32.
2012-05-26	mmx: add over_reverse_n_8888	Matt Turner	2	-0/+73
	Loongson:
	over_reverse_n_8888 = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95 VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s)
	over_reverse_n_8888 = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66 VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s)
	image poppler 34.106 35.500 1.48% 6/6
	image poppler 29.598 30.835 1.70% 6/6
	ARM/iwMMXt:
	over_reverse_n_8888 = L1: 15.63 L2: 14.33 M: 10.83 ( 27.55%) HT: 9.78 VT: 9.91 R: 9.49 RT: 6.96 ( 69Kops/s)
	over_reverse_n_8888 = L1: 22.79 L2: 19.40 M: 13.76 ( 34.19%) HT: 11.66 VT: 11.86 R: 11.17 RT: 7.85 ( 75Kops/s)
	image poppler 38.040 38.606 1.10% 6/6
	image poppler 31.686 32.278 0.80% 5/6
2012-05-26	mmx: add add_0565_0565	Matt Turner	1	-0/+86
	Loongson:
	add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
	add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT: 22.84 R: 18.96 RT: 9.79 ( 104Kops/s)
	ARM/iwMMXt:
	add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)
	add_0565_0565 = L1: 31.14 L2: 28.87 M: 22.46 ( 28.60%) HT: 18.61 VT: 17.04 R: 15.21 RT: 9.35 ( 90Kops/s)
2012-05-26	fast: add add_0565_0565 function	Matt Turner	1	-0/+44
	I'll need this code for header and tail alignment loops in MMX, so I might as well implement a fast path here.
2012-05-26	mmx: implement expand_4x565 in terms of expand_4xpacked565	Matt Turner	1	-27/+59
	Loongson:
	over_n_0565 = L1: 38.57 L2: 38.88 M: 30.01 ( 20.97%) HT: 23.60 VT: 23.88 R: 21.95 RT: 11.65 ( 113Kops/s)
	over_n_0565 = L1: 56.28 L2: 55.90 M: 34.20 ( 23.82%) HT: 25.66 VT: 26.60 R: 23.78 RT: 11.80 ( 115Kops/s)
	over_8888_0565 = L1: 35.89 L2: 36.11 M: 21.56 ( 45.47%) HT: 18.33 VT: 17.90 R: 16.27 RT: 9.07 ( 98Kops/s)
	over_8888_0565 = L1: 40.91 L2: 41.06 M: 23.13 ( 48.46%) HT: 19.24 VT: 18.71 R: 16.82 RT: 9.18 ( 99Kops/s)
	over_n_8_0565 = L1: 28.92 L2: 29.12 M: 21.42 ( 30.00%) HT: 18.37 VT: 17.75 R: 16.15 RT: 8.79 ( 91Kops/s)
	over_n_8_0565 = L1: 32.32 L2: 32.13 M: 22.44 ( 31.27%) HT: 19.15 VT: 18.66 R: 16.62 RT: 8.86 ( 92Kops/s)
	over_n_8888_0565_ca = L1: 29.33 L2: 29.22 M: 18.99 ( 66.69%) HT: 16.69 VT: 16.22 R: 14.63 RT: 8.42 ( 88Kops/s)
	over_n_8888_0565_ca = L1: 34.97 L2: 34.14 M: 20.32 ( 71.73%) HT: 17.67 VT: 17.19 R: 15.23 RT: 8.50 ( 89Kops/s)
	ARM/iwMMXt:
	over_n_0565 = L1: 29.70 L2: 30.53 M: 24.47 ( 14.84%) HT: 22.28 VT: 21.72 R: 21.13 RT: 12.58 ( 105Kops/s)
	over_n_0565 = L1: 41.42 L2: 40.00 M: 30.95 ( 19.13%) HT: 27.06 VT: 27.28 R: 23.43 RT: 14.44 ( 114Kops/s)
	over_8888_0565 = L1: 12.73 L2: 11.53 M: 9.07 ( 16.47%) HT: 9.00 VT: 9.25 R: 8.44 RT: 7.27 ( 76Kops/s)
	over_8888_0565 = L1: 23.72 L2: 21.76 M: 15.89 ( 29.51%) HT: 14.36 VT: 14.05 R: 12.44 RT: 8.94 ( 86Kops/s)
	over_n_8_0565 = L1: 6.80 L2: 7.15 M: 6.37 ( 7.90%) HT: 6.58 VT: 6.24 R: 6.49 RT: 5.94 ( 59Kops/s)
	over_n_8_0565 = L1: 12.06 L2: 11.02 M: 10.16 ( 13.43%) HT: 9.57 VT: 8.49 R: 9.10 RT: 6.86 ( 69Kops/s)
	over_n_8888_0565_ca = L1: 7.62 L2: 7.01 M: 6.27 ( 20.52%) HT: 6.00 VT: 6.07 R: 5.68 RT: 5.53 ( 57Kops/s)
	over_n_8888_0565_ca = L1: 13.54 L2: 11.96 M: 9.76 ( 30.66%) HT: 9.72 VT: 8.45 R: 9.37 RT: 6.85 ( 67Kops/s)
2012-05-26	mmx: add and use expand_4xpacked565 function	Matt Turner	2	-6/+59
	Loongson:
	add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT: 9.74 R: 9.39 RT: 6.05 ( 67Kops/s)
	add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s)
	ARM/iwMMXt:
	add_0565_0565 = L1: 11.12 L2: 10.40 M: 8.82 ( 10.65%) HT: 7.98 VT: 7.41 R: 7.57 RT: 5.21 ( 54Kops/s)
	add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)
2012-05-26	Post-release version bump to 0.27.1	Søren Sandmann Pedersen	1	-2/+2

2012-05-26	Pre-release version bump to 0.26.0	Søren Sandmann Pedersen	1	-2/+2

2012-05-25	Fix MSVC compilation	Ingmar Runge	1	-2/+7
	Only up to three SSE intrinsics supported in function declaration.
2012-05-24	test: Composite with solid images instead of using pixman_image_fill_*	Søren Sandmann Pedersen	2	-11/+15
	There are a couple of places where the test suite uses the pixman_image_fill_* functions to initialize images. These functions can fail, and will do so if the "fast" implementation is disabled. So to make sure the test suite passes even using PIXMAN_DISABLE="fast", use pixman_image_composite32() with a solid image instead of pixman_image_fill_*.
2012-05-23	MIPS: DSPr2: Added bilinear over_8888_8_8888 fast path.	Nemanja Lukic	4	-0/+177
	Performance numbers before/after on MIPS-74kc @ 1GHz
	Reference (before):
	cairo-perf-trace:
	[ # ] backend test min(s) median(s) stddev. count
	[ # ] image: pixman 0.25.3
	[ 0] image firefox-fishtank 2289.180 2290.567 0.05% 5/6
	Optimized:
	cairo-perf-trace:
	[ # ] backend test min(s) median(s) stddev. count
	[ # ] image: pixman 0.25.3
	[ 0] image firefox-fishtank 1700.925 1708.314 0.22% 5/6
2012-05-23	MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines	Nemanja Lukic	1	-32/+28
	In the main loop (unrolled by a factor of 2), instead of negating the mask values multiplied by srca, the value of srca was negated and passed as the alpha argument to the UN8x4_MUL_UN8x4_ADD_UN8x4 macro.
	Instead of:
	    ma = ~ma;
	    UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
	the code was doing this:
	    ma = ~srca;
	    UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
	The key is substituting registers s0/s1 (containing the srca value) with t0/t1, which contain the mask values multiplied by srca. Register usage is also improved (fewer registers are saved on the stack for the over_n_8888_8888_ca routine). The bug was introduced in commit d2ee5631 and revealed by the composite test.
2012-05-20	demos: Add parrot.jpg to EXTRA_DIST	Søren Sandmann Pedersen	1	-1/+1
	Pointed out by Cyril Brulebois.
2012-05-15	configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt	Matt Turner	1	-0/+3
	If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test doesn't catch this, and we get linking errors in the build.