~podain/pixman - Private pixman repository

Age	Commit message	Author	Files	Lines
2011-10-18	ARM: NEON: Fix assembly typo error in src_n_8_8888 (HEAD, master)	Taekyun Kim	1	-1/+1
	Binutils 2.21 does not complain about a missing comma between the ARM register and the alignment specifier in vld/vst instructions, but the same typo causes a build error on binutils 2.20.
2011-10-18	ARM: NEON: Standard fast path src_n_8_8	Taekyun Kim	2	-0/+69
	Performance numbers of before/after on cortex-a8 @ 1GHz
	- before: L1: 28.05 L2: 28.26 M: 26.97 (4.48%) HT: 19.79 VT: 19.14 R: 17.61 RT: 9.88 (101Kops/s)
	- after: L1: 1430.28 L2: 1252.10 M: 421.93 (75.48%) HT: 170.16 VT: 138.03 R: 145.86 RT: 35.51 (255Kops/s)
2011-10-18	ARM: NEON: Standard fast path src_n_8_8888	Taekyun Kim	2	-0/+80
	Performance numbers of before/after on cortex-a8 @ 1GHz
	- before: L1: 32.39 L2: 31.79 M: 30.84 (13.77%) HT: 21.58 VT: 19.75 R: 18.83 RT: 10.46 (106Kops/s)
	- after: L1: 516.25 L2: 372.00 M: 193.49 (85.59%) HT: 136.93 VT: 109.10 R: 104.48 RT: 34.77 (253Kops/s)
2011-10-18	ARM: NEON: Instruction scheduling of bilinear over_8888_8_8888	Taekyun Kim	1	-4/+158
	Instructions are reordered to eliminate pipeline stalls and get better memory access.
	Performance of before/after on cortex-a8 @ 1GHz << 2000 x 2000 with scale factor close to 1.x >>
	before: 40.53 Mpix/s
	after: 50.76 Mpix/s
2011-10-18	ARM: NEON: Instruction scheduling of bilinear over_8888_8888	Taekyun Kim	1	-3/+146
	Instructions are reordered to eliminate pipeline stalls and get better memory access.
	Performance of before/after on cortex-a8 @ 1GHz << 2000 x 2000 with scale factor close to 1.x >>
	before: 50.43 Mpix/s
	after: 61.09 Mpix/s
2011-10-18	ARM: NEON: Replace old bilinear scanline generator with new template	Taekyun Kim	1	-192/+292
	Bilinear scanline functions in pixman-arm-neon-asm-bilinear.S can be replaced with new template just by wrapping existing macros.
2011-10-18	ARM: NEON: Bilinear macro template for instruction scheduling	Taekyun Kim	1	-0/+195
	This macro template takes 6 code blocks:
	1. process_last_pixel
	2. process_two_pixels
	3. process_four_pixels
	4. process_pixblock_head
	5. process_pixblock_tail
	6. process_pixblock_tail_head
	process_last_pixel does not need to update the horizontal weight; this is done by the template. The process_two_pixels and process_four_pixels blocks should update the horizontal weight themselves. The head/tail/tail_head blocks make up the unrolled core loop. You can apply instruction scheduling to the tail_head blocks.
	You can also specify the size of the pixel block. Supported sizes are 4 and 8.
	If you want to use a mask, give the BILINEAR_FLAG_USE_MASK flag to the template; you can then use the MASK register. When using the d8~d15 registers, give BILINEAR_FLAG_USE_ALL_NEON_REGS to make sure the registers are properly saved on the stack and later restored.
2011-10-18	ARM: NEON: Some cleanup of bilinear scanline functions	Taekyun Kim	1	-61/+67
	Use STRIDE, and do the initial horizontal weight update before entering the interpolation loop. Add cache preload for mask and dst.
2011-10-11	Post-release version bump to 0.23.7	Søren Sandmann Pedersen	1	-1/+1

2011-10-11	Pre-release version bump to 0.23.6	Søren Sandmann Pedersen	1	-3/+3

2011-10-10	Simple repeat: Extend too short source scanlines into temporary buffer	Taekyun Kim	1	-3/+92
	Too short scanlines can cause repeat handling overhead. Because optimized pixman composite functions usually process a bunch of pixels in a single loop iteration, it might be beneficial to pre-extend source scanlines. The temporary buffers will usually reside in cache, so accessing them should be quite efficient.
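The pre-extension idea can be sketched as follows. This is only an illustration, not the code from this commit: `extend_scanline` and its parameters are hypothetical names, and the sketch assumes 32-bit pixels and a caller-provided temporary buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: tile a short source scanline into a temporary
 * buffer so that later loops see one long scanline. Returns the number
 * of pixels written, which is the largest multiple of src_width that
 * fits into buf_width. */
int
extend_scanline (uint32_t *buf, int buf_width,
                 const uint32_t *src, int src_width)
{
    int filled = 0;

    assert (src_width > 0 && buf_width >= src_width);

    /* Copy whole periods only, so the extended line still repeats
     * cleanly when it is tiled again. */
    while (filled + src_width <= buf_width)
    {
        memcpy (buf + filled, src, (size_t) src_width * sizeof *src);
        filled += src_width;
    }

    return filled;
}
```

With a 2-pixel source and a 7-pixel buffer, this sketch writes 6 pixels and leaves the last buffer slot untouched.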
2011-10-10	Simple repeat fast path	Taekyun Kim	1	-0/+89
	We can implement simple repeat by stitching existing fast path functions. First look up the COVER_CLIP function for the given input, then stitch horizontally using that function.
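A minimal sketch of horizontal stitching, under stated assumptions: `span_func_t`, `copy_span` and `repeat_scanline` are hypothetical names, and a plain copy stands in for whatever COVER_CLIP fast path the lookup would return. The real composite functions take many more arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for a function that only handles spans lying
 * fully inside the source scanline (the COVER_CLIP case). */
typedef void (*span_func_t) (uint32_t *dst, const uint32_t *src, int width);

/* Stand-in for a looked-up fast path: a plain copy. */
void
copy_span (uint32_t *dst, const uint32_t *src, int width)
{
    memcpy (dst, src, (size_t) width * sizeof *dst);
}

/* Fill dst_width pixels by calling func once per covered piece of the
 * repeating source, starting at source offset src_x. */
void
repeat_scanline (uint32_t *dst, int dst_width,
                 const uint32_t *src, int src_width,
                 int src_x, span_func_t func)
{
    int x;

    assert (src_width > 0);

    /* Normalize the start offset into [0, src_width). */
    x = src_x % src_width;
    if (x < 0)
        x += src_width;

    while (dst_width > 0)
    {
        int n = src_width - x;

        if (n > dst_width)
            n = dst_width;

        func (dst, src + x, n);

        dst += n;
        dst_width -= n;
        x = 0; /* every later piece starts at the left edge */
    }
}
```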
2011-10-10	Move _pixman_lookup_composite_function() to pixman-utils.c	Taekyun Kim	3	-116/+127

2011-10-10	Add src, mask, and dest flags to the composite args struct.	Søren Sandmann Pedersen	2	-0/+7
	These flags are useful in the various compositing routines, and the flags stored in the image structs are missing some bits of information that can only be computed when pixman_image_composite() is called.
2011-10-10	Add new fast path flag FAST_PATH_BITS_IMAGE	Taekyun Kim	2	-0/+2
	This fast path flag indicates that the image is a bits image.
2011-10-10	init/fini functions for pixman_image_t	Taekyun Kim	3	-76/+121
	A pixman_image_t itself can live on the stack or on the heap, so segregating init/fini from create/unref can be useful when we want to use a pixman_image_t on the stack or in other memory.
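The init/fini versus create/unref split can be illustrated with a toy type. Everything here is hypothetical (`image_t`, `image_init`, `image_fini`, `image_create`, `image_unref`); it only shows the pattern, not pixman's actual structs or function signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy image type, not pixman_image_t. */
typedef struct
{
    int       ref_count;
    int       width, height;
    uint32_t *bits;
} image_t;

/* Initialize storage the caller already owns (stack, heap, arena).
 * Returns 1 on success, 0 if the pixel buffer cannot be allocated. */
int
image_init (image_t *image, int width, int height)
{
    assert (width > 0 && height > 0);

    image->ref_count = 1;
    image->width = width;
    image->height = height;
    image->bits = calloc ((size_t) width * (size_t) height, sizeof (uint32_t));

    return image->bits != NULL;
}

/* Drop one reference. Returns 1 when the last reference is gone and
 * the resources are released; the struct itself is never freed here. */
int
image_fini (image_t *image)
{
    if (--image->ref_count > 0)
        return 0;

    free (image->bits);
    image->bits = NULL;
    return 1;
}

/* Heap variants are thin wrappers around init/fini. */
image_t *
image_create (int width, int height)
{
    image_t *image = malloc (sizeof *image);

    if (image && !image_init (image, width, height))
    {
        free (image);
        image = NULL;
    }
    return image;
}

void
image_unref (image_t *image)
{
    if (image_fini (image))
        free (image);
}
```

In this sketch a stack-allocated image pairs `image_init` with `image_fini`, while a heap-allocated one pairs `image_create` with `image_unref`.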
2011-10-10	sse2: Bilinear scaled over_8888_8_8888	Taekyun Kim	1	-0/+168

2011-10-10	sse2: Bilinear scaled over_8888_8888	Taekyun Kim	1	-1/+106

2011-10-10	sse2: Macros for assembling bilinear interpolation code fractions	Taekyun Kim	1	-80/+77
	Primitive bilinear interpolation code is reusable to implement other bilinear functions.
	BILINEAR_DECLARE_VARIABLES - Declare variables needed to interpolate src pixels.
	BILINEAR_INTERPOLATE_ONE_PIXEL - Interpolate one pixel and advance to next pixel.
	BILINEAR_SKIP_ONE_PIXEL - Skip interpolation and just advance to next pixel. This is useful for skipping zero mask.
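The roles of the three macros can be sketched in scalar C. This is not the SSE2 code: the struct and function names below are hypothetical stand-ins for the macros, the sketch handles a single 8-bit channel, and the final mask blend is a simple linear mix rather than the OVER operator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state, playing the role of the declared variables. */
typedef struct
{
    const uint8_t *top;    /* upper source row */
    const uint8_t *bottom; /* lower source row */
    int            wt, wb; /* vertical weights, wt + wb == 256 */
    uint32_t       x;      /* source x position, 16.16 fixed point */
    uint32_t       ux;     /* per-pixel step, 16.16 fixed point */
} bilinear_state_t;

/* Interpolate one pixel and advance. The caller must guarantee that
 * index (x >> 16) + 1 is inside both rows. */
uint8_t
bilinear_interpolate_one_pixel (bilinear_state_t *s)
{
    uint32_t xi = s->x >> 16;
    uint32_t wr = (s->x >> 8) & 0xff; /* right weight, 8 bits */
    uint32_t wl = 256 - wr;           /* left weight */
    uint32_t t = s->top[xi] * wl + s->top[xi + 1] * wr;
    uint32_t b = s->bottom[xi] * wl + s->bottom[xi + 1] * wr;

    s->x += s->ux;
    return (uint8_t) ((t * (uint32_t) s->wt + b * (uint32_t) s->wb) >> 16);
}

/* Advance without doing any interpolation work. */
void
bilinear_skip_one_pixel (bilinear_state_t *s)
{
    s->x += s->ux;
}

/* A zero mask leaves dst unchanged, so the interpolation is skipped. */
void
bilinear_masked_scanline (uint8_t *dst, const uint8_t *mask, int width,
                          bilinear_state_t *s)
{
    int i;

    assert (s->wt + s->wb == 256);

    for (i = 0; i < width; i++)
    {
        uint32_t m = mask[i];
        uint32_t pix;

        if (m == 0)
        {
            bilinear_skip_one_pixel (s);
            continue;
        }

        pix = bilinear_interpolate_one_pixel (s);
        dst[i] = (uint8_t) ((pix * m + dst[i] * (255 - m) + 127) / 255);
    }
}
```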
2011-10-06	Correct the minimum gcc version needed for iwmmxt	Matt Turner	1	-1/+1
	Spotted by Søren Sandmann. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-10-06	Make sure iwMMXt is only detected on ARM	Matt Turner	1	-0/+3
	iwMMXt is incorrectly detected on x86 and amd64. This happens because the test uses standard _mm_* intrinsic functions which it compiles with -march=iwmmxt, but when the user has set CFLAGS=-march=k8 for instance, no error is generated from -march=iwmmxt, even though it's not a valid flag on x86/amd64. Passing CFLAGS=-march=native does not override the -march=iwmmxt flag though, which is why it wasn't noticed before. So, just #error out in the test if the __arm__ preprocessor directive isn't defined. Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179 Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-28	Don't include stdint.h in scaling-helpers-test.	Søren Sandmann Pedersen	1	-1/+0
	Fixes bug 41257.
2011-09-28	build: replace @VAR@ with $(VAR) in makefiles	Benjamin Otte	2	-6/+6

2011-09-28	tests: Add PNG_CFLAGS/LIBS to tests	Benjamin Otte	1	-2/+2
	PNG flags were accidentally included by gdk-pixbuf. This has been fixed recently, so we need to make sure to include it ourselves.
2011-09-27	mmx: optimize unaligned 64-bit ARM/iwmmxt loads	Matt Turner	1	-0/+7
	Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27	mmx: compile on ARM for iwmmxt optimizations	Matt Turner	4	-4/+82
	Check in configure for at least gcc-4.6, since gcc-4.7 (and hopefully 4.6) will be the earliest version capable of compiling the _mm_* intrinsics on ARM/iwmmxt. Even for suitable compiler versions I use _mm_srli_si64 which is known to cause unpatched compilers to fail. Select iwmmxt at runtime only after NEON, since we expect the NEON optimizations to be more capable and faster than iwmmxt. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27	mmx: prepare pixman-mmx.c to be compiled for ARM/iwmmxt	Matt Turner	1	-2/+11
	Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27	mmx: fix unaligned accesses	Matt Turner	1	-56/+129
	Simply return *p in the unaligned access functions, since alignment constraints are very relaxed on x86 and this allows us to generate identical code as before. Tested with the test suite, lowlevel-blit-test, and cairo-perf-trace on ARM and Alpha with no unaligned accesses found. Signed-off-by: Matt Turner <mattst88@gmail.com>
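For targets with strict alignment rules, one common portable way to read a 64-bit value from an arbitrary address is to go through `memcpy`. The sketch below shows that general technique only; it is not necessarily what this commit does, and `load_unaligned_u64` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 64 bits from p without assuming any alignment. Compilers
 * usually turn this into a single load where the target allows it. */
uint64_t
load_unaligned_u64 (const void *p)
{
    uint64_t v;

    assert (p != NULL);
    memcpy (&v, p, sizeof v);
    return v;
}
```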
2011-09-27	mmx: wrap x86/MMX inline assembly in ifdef USE_X86_MMX	Matt Turner	1	-4/+4
	Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27	mmx: rename USE_MMX to USE_X86_MMX	Matt Turner	6	-12/+12
	This will make upcoming ARM usage of pixman-mmx.c unambiguous. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26	mmx: convert while (w) to if (w) when possible	Matt Turner	1	-24/+5
	gcc isn't able to see that w is no greater than 1, so it generates unnecessary loop instructions with while (w). Signed-off-by: Matt Turner <mattst88@gmail.com>
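The pattern can be sketched as follows, assuming a main loop that handles two pixels per iteration. `fill_pairs` is a hypothetical function, not code from pixman-mmx.c.

```c
#include <assert.h>
#include <stdint.h>

/* Fill w pixels with value, two at a time. */
void
fill_pairs (uint32_t *dst, int w, uint32_t value)
{
    assert (w >= 0);

    while (w >= 2)
    {
        dst[0] = value;
        dst[1] = value;
        dst += 2;
        w -= 2;
    }

    /* Here w is 0 or 1, so a plain branch is enough; "while (w)" with a
     * decrement would do the same work but keeps loop overhead if the
     * compiler cannot prove the bound. */
    if (w)
        *dst = value;
}
```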
2011-09-26	mmx: fix formats in commented code	Matt Turner	1	-2/+2
	b8r8g8 apparently stopped being supported sometime after this code was commented out. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26	lowlevel-blt: add over_x888_8_8888	Matt Turner	1	-0/+1
	Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-21	BILINEAR->NEAREST filter optimization for simple rotation and translation	Siarhei Siamashka	1	-1/+38
	Simple rotation and translation are the additional cases when BILINEAR filter can be safely reduced to NEAREST.
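A simplified reading of which matrices qualify, as a sketch: an affine 16.16 fixed-point transform whose linear part is a rotation by a multiple of 90 degrees (or an axis flip) and whose translation is a whole number of pixels maps pixel centers onto pixel centers, so bilinear weights collapse to a single source pixel. The types and the function `bilinear_reduces_to_nearest` are hypothetical, and the check in the actual patch may be stated differently.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;
#define FIXED_1 ((fixed_16_16_t) 0x10000)

/* Hypothetical 3x3 transform in 16.16 fixed point. */
typedef struct
{
    fixed_16_16_t m[3][3];
} transform_t;

static int
is_unit (fixed_16_16_t v)
{
    return v == FIXED_1 || v == -FIXED_1;
}

static int
is_integer (fixed_16_16_t v)
{
    return ((uint32_t) v & 0xffffu) == 0;
}

int
bilinear_reduces_to_nearest (const transform_t *t)
{
    assert (t != 0);

    /* Must be affine. */
    if (t->m[2][0] != 0 || t->m[2][1] != 0 || t->m[2][2] != FIXED_1)
        return 0;

    /* Translation must be a whole number of pixels. */
    if (!is_integer (t->m[0][2]) || !is_integer (t->m[1][2]))
        return 0;

    /* 0 or 180 degrees, or an axis flip. */
    if (is_unit (t->m[0][0]) && is_unit (t->m[1][1]) &&
        t->m[0][1] == 0 && t->m[1][0] == 0)
        return 1;

    /* 90 or 270 degrees, or a transpose. */
    if (t->m[0][0] == 0 && t->m[1][1] == 0 &&
        is_unit (t->m[0][1]) && is_unit (t->m[1][0]))
        return 1;

    return 0;
}
```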
2011-09-21	Strength-reduce BILINEAR filter to NEAREST filter for identity transforms	Søren Sandmann Pedersen	5	-38/+62
	An image with a bilinear filter and an identity transform is equivalent to one with a nearest filter, so there is no reason the standard fast paths shouldn't be usable.
	But because a BILINEAR filter samples a 2x2 pixel block in the source image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the source area is the entire image, because some compositing operations might then read pixels outside the image.
	This patch fixes the problem by splitting the FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip covers the samples taking into account NEAREST/BILINEAR filters respectively. All the existing compositing operations that require FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which filter they depend on.
	In compute_image_info() both COVER_CLIP_NEAREST and COVER_CLIP_BILINEAR can be set depending on how much room there is around the clip rectangle.
	Finally, images with an identity transform and a bilinear filter get FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
	Performance measurements with render_bench against Xephyr:
	Before:
	* ROUND 1 *
	---------------------------------------------------------------
	Test: Test Xrender doing non-scaled Over blends
	Time: 5.720 sec.
	---------------------------------------------------------------
	Test: Test Xrender (offscreen) doing non-scaled Over blends
	Time: 5.149 sec.
	---------------------------------------------------------------
	Test: Test Imlib2 doing non-scaled Over blends
	Time: 6.237 sec.
	After:
	* ROUND 1 *
	---------------------------------------------------------------
	Test: Test Xrender doing non-scaled Over blends
	Time: 4.947 sec.
	---------------------------------------------------------------
	Test: Test Xrender (offscreen) doing non-scaled Over blends
	Time: 4.487 sec.
	---------------------------------------------------------------
	Test: Test Imlib2 doing non-scaled Over blends
	Time: 6.235 sec.
2011-09-21	test: Occasionally use a BILINEAR filter in blitters-test	Søren Sandmann Pedersen	1	-1/+4
	To test that reductions of BILINEAR->NEAREST for identity transformations happen correctly, occasionally use a bilinear filter in blitters test.
2011-09-21	test: better coverage for BILINEAR->NEAREST filter optimization	Siarhei Siamashka	1	-8/+32
	The upcoming optimization, which will replace the BILINEAR filter with NEAREST where appropriate, needs to analyze the transformation matrix without making any mistakes. The changes to affine-test include:
	1. Higher chance of using the same scale factor for x and y axes. This can help to stress some special cases (for example the case when both x and y scale factors are integer). The same applies to x/y translation.
	2. Introduced a small chance for "corrupting" the transformation matrix by flipping random bits. This supposedly can help to identify the cases when some of the fast paths or other code logic is wrongly activated due to insufficient checks.
2011-09-21	Eliminate compute_sample_extents() function	Søren Sandmann Pedersen	1	-58/+42
	In analyze_extents(), instead of calling compute_sample_extents() call compute_transformed_extents() and inline the remaining part of compute_sample_extents(). The upcoming bilinear->nearest optimization will do something different with these two pieces of code.
2011-09-21	Split computation of sample area into own function	Søren Sandmann Pedersen	1	-62/+76
	compute_sample_extents() has two parts: one that computes the transformed extents, and one that checks whether the computed extents fit within the 16.16 coordinate space. Split the first part into its own function compute_transformed_extents().
2011-09-21	Remove x and y coordinates from analyze_extents() and compute_sample_extents()	Søren Sandmann Pedersen	1	-26/+37
	These coordinates were only ever used for subtracting from the extents box to put it into the coordinate space of the image, so we might as well do this coordinate translation only once before entering the functions.
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for paletted formats	Søren Sandmann Pedersen	1	-230/+46
	Add support in convert_pixel_from_a8r8g8b8() and convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats, then use MAKE_ACCESSORS() to generate accessors for the indexed formats: c8, g8, g4, c4, g1
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for the a1 format.	Søren Sandmann Pedersen	1	-79/+46
	Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the accessors for the a1 format. (Not the g1 format as it is indexed).
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for 24bpp formats	Søren Sandmann Pedersen	1	-153/+46
	Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp pixels in fetch_and_convert_pixel() and convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate accessors for the 24 bpp formats: r8g8b8, b8g8r8
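A rough idea of what 24bpp fetch and store helpers have to do, written as functions rather than macros. The names `fetch_24`, `store_24` and `fetch_r8g8b8_as_a8r8g8b8` are hypothetical, and the sketch assumes the three bytes are stored least significant first; the real macros may differ, for example in how they treat byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Read the 24-bit pixel at index offset from a packed row. */
uint32_t
fetch_24 (const uint8_t *row, int offset)
{
    const uint8_t *p;

    assert (offset >= 0);
    p = row + 3 * offset;

    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) | ((uint32_t) p[2] << 16);
}

/* Write the low 24 bits of value at index offset. */
void
store_24 (uint8_t *row, int offset, uint32_t value)
{
    uint8_t *p;

    assert (offset >= 0);
    p = row + 3 * offset;

    p[0] = (uint8_t) (value & 0xff);
    p[1] = (uint8_t) ((value >> 8) & 0xff);
    p[2] = (uint8_t) ((value >> 16) & 0xff);
}

/* r8g8b8 has no alpha channel, so the converted pixel is opaque. */
uint32_t
fetch_r8g8b8_as_a8r8g8b8 (const uint8_t *row, int offset)
{
    return 0xff000000u | fetch_24 (row, offset);
}
```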
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats	Søren Sandmann Pedersen	1	-381/+70
	Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and c4 which are indexed: a4, r1g2b1, b1g2r1, a1r1g1b1, a1b1g1r1
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats	Søren Sandmann Pedersen	1	-382/+14
	Add support for 8 bpp formats to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the accessors for all the 8 bpp formats, except g8 and c8, which are indexed: a8, r3g3b2, b2g3r3, a2r2g2b2, a2b2g2r2, x4a4
2011-09-20	Use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats	Søren Sandmann Pedersen	1	-640/+18
	Add support for 16bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats: r5g6b5, b5g6r5, a1r5g5b5, x1r5g5b5, a1b5g5r5, x1b5g5r5, a4r4g4b4, x4r4g4b4, a4b4g4r4, x4b4g4r4
2011-09-20	Use MAKE_ACCESSORS() to generate all the 32 bit accessors	Søren Sandmann Pedersen	1	-466/+17
	Add support for 32bpp formats in fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for all the 32 bpp formats: a8r8g8b8, x8r8g8b8, a8b8g8r8, x8b8g8r8, x14r6g6b6, b8g8r8a8, b8g8r8x8, r8g8b8x8, r8g8b8a8
2011-09-20	Add initial version of the MAKE_ACCESSORS() macro	Søren Sandmann Pedersen	1	-0/+114
	This macro will eventually allow the fetchers and storers to be generated automatically. For now, it's just a skeleton that doesn't actually do anything.
2011-09-20	Add general pixel converter	Søren Sandmann Pedersen	1	-0/+100
	This function can convert between any <= 32 bpp formats. Nothing uses it yet.
2011-09-20	Add a generic unorm_to_unorm() conversion utility	Søren Sandmann Pedersen	2	-29/+48
	This function can convert between normalized numbers of different depths. When converting to higher bit depths, it will replicate the existing bits; when converting to lower bit depths, it will simply truncate. This function replaces the expand16() function in pixman-utils.c.
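The replicate-or-truncate behaviour described above can be sketched as follows. `unorm_convert` is a hypothetical name with a hypothetical signature, limited here to depths of 1 to 16 bits; it is not the actual unorm_to_unorm() source.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a normalized value from from_bits to to_bits of precision. */
uint32_t
unorm_convert (uint32_t val, int from_bits, int to_bits)
{
    uint32_t result;
    int filled;

    assert (from_bits >= 1 && from_bits <= 16);
    assert (to_bits >= 1 && to_bits <= 16);

    /* Ignore anything above the source depth. */
    val &= (1u << from_bits) - 1;

    /* Narrowing: keep the most significant bits. */
    if (to_bits <= from_bits)
        return val >> (from_bits - to_bits);

    /* Widening: put the value at the top, then copy the bit pattern
     * downward until every low bit is filled. */
    result = val << (to_bits - from_bits);
    filled = from_bits;

    while (filled < to_bits)
    {
        result |= result >> filled;
        filled *= 2;
    }

    return result;
}
```

Replication rather than zero padding keeps the endpoints exact: in this sketch a full-scale 5-bit value (0x1f) widens to a full-scale 8-bit value (0xff), and zero stays zero.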