Age | Commit message | Author | Files | Lines
2011-10-18 | ARM: NEON: Fix assembly typo error in src_n_8_8888 (HEAD, master) | Taekyun Kim | 1 | -1/+1
Binutils 2.21 does not complain about the missing comma between the ARM register and the alignment specifier in vld/vst instructions, but binutils 2.20 rejects it, which causes a build error.
2011-10-18 | ARM: NEON: Standard fast path src_n_8_8 | Taekyun Kim | 2 | -0/+69
Performance before/after on Cortex-A8 @ 1GHz:
- before: L1: 28.05 L2: 28.26 M: 26.97 (4.48%) HT: 19.79 VT: 19.14 R: 17.61 RT: 9.88 (101Kops/s)
- after: L1: 1430.28 L2: 1252.10 M: 421.93 (75.48%) HT: 170.16 VT: 138.03 R: 145.86 RT: 35.51 (255Kops/s)
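As a point of reference, the operation these NEON routines accelerate can be written as a plain C loop. This sketch assumes the usual reading of the name (SRC operator, solid source `n`, a8 mask, a8 destination), so each destination byte becomes the source alpha multiplied by the mask value. `mul_un8()` and `src_n_8_8_scanline()` are hypothetical names, not pixman functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit normalized values with rounding: about a * b / 255. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)(((t >> 8) + t) >> 8);
}

/* Hypothetical scalar reference: SRC, solid source, a8 mask, a8 dest. */
void src_n_8_8_scanline(uint32_t src, const uint8_t *mask,
                        uint8_t *dst, size_t width)
{
    uint8_t srca = (uint8_t)(src >> 24);  /* alpha of the a8r8g8b8 solid */

    for (size_t i = 0; i < width; i++)
        dst[i] = mul_un8(srca, mask[i]);  /* SRC: dest = src IN mask */
}
```

A vector version computes the same product for many mask bytes at once, which is where the large L1/L2 speedups come from.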
2011-10-18 | ARM: NEON: Standard fast path src_n_8_8888 | Taekyun Kim | 2 | -0/+80
Performance before/after on Cortex-A8 @ 1GHz:
- before: L1: 32.39 L2: 31.79 M: 30.84 (13.77%) HT: 21.58 VT: 19.75 R: 18.83 RT: 10.46 (106Kops/s)
- after: L1: 516.25 L2: 372.00 M: 193.49 (85.59%) HT: 136.93 VT: 109.10 R: 104.48 RT: 34.77 (253Kops/s)
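With an 8888 destination, every channel of the solid source is multiplied by the mask value. The C sketch below uses a well-known scalar trick: spread two 8-bit channels into 16-bit lanes of one 32-bit word so two channels are multiplied per operation without carries crossing lanes. The function names are hypothetical, and this is an illustration of the operation, not the NEON code in the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply all four 8-bit channels of x by a (rounded, about c * a / 255).
 * Channels are handled two at a time in 16-bit lanes, so products cannot
 * carry into the neighbouring channel. */
static uint32_t un8x4_mul_un8(uint32_t x, uint8_t a)
{
    uint32_t rb = (x & 0x00ff00ffu) * a + 0x00800080u;
    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;

    uint32_t ag = ((x >> 8) & 0x00ff00ffu) * a + 0x00800080u;
    ag = (ag + ((ag >> 8) & 0x00ff00ffu)) & 0xff00ff00u;

    return rb | ag;
}

/* Hypothetical scalar reference: SRC, solid source, a8 mask, 8888 dest. */
void src_n_8_8888_scanline(uint32_t src, const uint8_t *mask,
                           uint32_t *dst, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = un8x4_mul_un8(src, mask[i]);  /* dest = src IN mask */
}
```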
2011-10-18 | ARM: NEON: Instruction scheduling of bilinear over_8888_8_8888 | Taekyun Kim | 1 | -4/+158
Instructions are reordered to eliminate pipeline stalls and improve memory access. Performance before/after on Cortex-A8 @ 1GHz (2000 x 2000, scale factor close to 1.x):
- before: 40.53 Mpix/s
- after: 50.76 Mpix/s
2011-10-18 | ARM: NEON: Instruction scheduling of bilinear over_8888_8888 | Taekyun Kim | 1 | -3/+146
Instructions are reordered to eliminate pipeline stalls and improve memory access. Performance before/after on Cortex-A8 @ 1GHz (2000 x 2000, scale factor close to 1.x):
- before: 50.43 Mpix/s
- after: 61.09 Mpix/s
2011-10-18 | ARM: NEON: Replace old bilinear scanline generator with new template | Taekyun Kim | 1 | -192/+292
The bilinear scanline functions in pixman-arm-neon-asm-bilinear.S can be replaced with the new template simply by wrapping the existing macros.
2011-10-18 | ARM: NEON: Bilinear macro template for instruction scheduling | Taekyun Kim | 1 | -0/+195
This macro template takes 6 code blocks:
1. process_last_pixel
2. process_two_pixels
3. process_four_pixels
4. process_pixblock_head
5. process_pixblock_tail
6. process_pixblock_tail_head
process_last_pixel does not need to update the horizontal weight; the template does this. The two- and four-pixel blocks must update the horizontal weight themselves. The head/tail/tail_head blocks make up the unrolled core loop, and you can apply instruction scheduling to the tail_head block. You can also specify the size of the pixel block; supported sizes are 4 and 8. If you want to use a mask, pass the BILINEAR_FLAG_USE_MASK flag to the template; the MASK register is then available. When using registers d8-d15, pass BILINEAR_FLAG_USE_ALL_NEON_REGS so that they are properly saved on the stack and later restored.
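The head/tail/tail_head split is a form of software pipelining. The C sketch below only mirrors the control structure (a head prologue, a tail_head steady state, a tail epilogue) for a hypothetical "double each value" kernel; the block size, `head()`, `tail()` and `process_blocks()` are illustrative names. The real template is ARM assembly, and leftover pixels there are handled by the process_last_pixel/two/four blocks, which this sketch omits.

```c
#include <stddef.h>

#define BLOCK 4  /* pixel block size; the template supports 4 or 8 */

/* Hypothetical state carried from "head" to "tail" for one block. */
typedef struct { int v[BLOCK]; } block_state;

/* head: first half of the work for a block (loads, early arithmetic). */
static void head(block_state *s, const int *src)
{
    for (int i = 0; i < BLOCK; i++)
        s->v[i] = src[i];
}

/* tail: second half of the work for a block (final arithmetic, stores). */
static void tail(const block_state *s, int *dst)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = s->v[i] * 2;
}

/* Software-pipelined loop over whole blocks. Each steady-state iteration
 * is the "tail_head" step: start block b and finish block b - 1. In
 * assembly the two halves are interleaved so loads for block b overlap
 * the arithmetic latency of block b - 1. */
void process_blocks(const int *src, int *dst, size_t nblocks)
{
    block_state cur, next;

    if (nblocks == 0)
        return;

    head(&cur, src);                                  /* prologue */
    for (size_t b = 1; b < nblocks; b++) {
        head(&next, src + b * BLOCK);                 /* tail_head */
        tail(&cur, dst + (b - 1) * BLOCK);
        cur = next;
    }
    tail(&cur, dst + (nblocks - 1) * BLOCK);          /* epilogue */
}
```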
2011-10-18 | ARM: NEON: Some cleanup of bilinear scanline functions | Taekyun Kim | 1 | -61/+67
Use STRIDE, and perform the initial horizontal weight update before entering the interpolation loop. Add cache preloading for mask and dst.
2011-10-11 | Post-release version bump to 0.23.7 | Søren Sandmann Pedersen | 1 | -1/+1
2011-10-11 | Pre-release version bump to 0.23.6 | Søren Sandmann Pedersen | 1 | -3/+3
2011-10-10 | Simple repeat: Extend too short source scanlines into temporary buffer | Taekyun Kim | 1 | -3/+92
Very short scanlines cause repeat-handling overhead, and optimized pixman composite functions usually process a batch of pixels in each loop iteration, so it can be beneficial to pre-extend short source scanlines. The temporary buffers will usually reside in cache, so accessing them should be quite efficient.
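A minimal sketch of the pre-extension step, assuming NORMAL (tiling) repeat and a caller-provided temporary buffer. The helper name and the policy of copying only whole periods (so the repeat pattern is preserved) are illustrative assumptions, not the commit's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a short NORMAL-repeat source scanline into buf as many whole times
 * as needed to reach at least min_width pixels, without exceeding
 * capacity. Returns the extended width (a multiple of src_width, so the
 * repeat period is preserved), or 0 if not even one copy fits, in which
 * case the caller should keep using the original scanline. */
size_t extend_scanline(const uint32_t *src, size_t src_width,
                       uint32_t *buf, size_t capacity, size_t min_width)
{
    if (src_width == 0)
        return 0;

    size_t copies = (min_width + src_width - 1) / src_width;
    if (copies == 0)
        copies = 1;
    if (copies > capacity / src_width)
        copies = capacity / src_width;

    for (size_t c = 0; c < copies; c++)
        memcpy(buf + c * src_width, src, src_width * sizeof *src);

    return copies * src_width;
}
```

The composite function can then treat the buffer as a wider source image with the same repeat, so each call covers more pixels before wrapping.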
2011-10-10 | Simple repeat fast path | Taekyun Kim | 1 | -0/+89
We can implement simple repeat by stitching together existing fast path functions: first look up the COVER_CLIP function for the given input, then call it repeatedly to stitch the scanline together horizontally.
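The horizontal stitching can be sketched as follows, assuming NORMAL repeat and a hypothetical per-scanline `cover_fn` signature standing in for a real COVER_CLIP fast path. Each call is given a run that never crosses the right edge of the source, so the fast path's "all samples in bounds" assumption holds.

```c
#include <stdint.h>

/* Hypothetical "cover" scanline function: every source pixel it reads
 * is guaranteed to be inside the image. */
typedef void (*cover_fn)(const uint32_t *src, uint32_t *dst, int width);

/* Example cover function: plain copy, standing in for a real fast path. */
void copy_cover(const uint32_t *src, uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = src[i];
}

/* Produce width pixels of a NORMAL-repeat scanline starting at src_x by
 * calling fn on runs that stop at the right edge of the source. */
void repeat_normal_scanline(cover_fn fn, const uint32_t *src, int src_width,
                            int src_x, uint32_t *dst, int width)
{
    int x = src_x % src_width;     /* wrap into [0, src_width) */
    if (x < 0)
        x += src_width;

    while (width > 0) {
        int run = src_width - x;   /* pixels left before the edge */
        if (run > width)
            run = width;
        fn(src + x, dst, run);
        dst += run;
        width -= run;
        x = 0;                     /* later runs restart at column 0 */
    }
}
```

Vertical repeat would wrap the source row index per output scanline in the same way.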
2011-10-10 | Move _pixman_lookup_composite_function() to pixman-utils.c | Taekyun Kim | 3 | -116/+127
2011-10-10 | Add src, mask, and dest flags to the composite args struct. | Søren Sandmann Pedersen | 2 | -0/+7
These flags are useful in the various compositing routines, and the flags stored in the image structs are missing some bits of information that can only be computed when pixman_image_composite() is called.
2011-10-10 | Add new fast path flag FAST_PATH_BITS_IMAGE | Taekyun Kim | 2 | -0/+2
This fast path flag indicates that the image is a bits image.
2011-10-10 | init/fini functions for pixman_image_t | Taekyun Kim | 3 | -76/+121
A pixman_image_t can live on the stack or on the heap, so separating init/fini from create/unref is useful when we want to place a pixman_image_t on the stack or in other memory.
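The pattern can be sketched with a hypothetical `image_t` (the real pixman_image_t is far larger): init/fini set up and tear down only the contents, so the struct itself may live anywhere, while create/unref add heap allocation and reference counting on top. All names here are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical image type used only for this sketch. */
typedef struct {
    int       refcount;
    int       width;
    uint32_t *bits;
} image_t;

/* init/fini: manage contents only; works for stack or embedded storage. */
int image_init(image_t *img, int width)
{
    img->refcount = 1;
    img->width = width;
    img->bits = calloc((size_t)width, sizeof *img->bits);
    return img->bits != NULL;
}

void image_fini(image_t *img)
{
    free(img->bits);
    img->bits = NULL;
}

/* create/unref: heap allocation plus reference counting, built on init/fini. */
image_t *image_create(int width)
{
    image_t *img = malloc(sizeof *img);
    if (img && !image_init(img, width)) {
        free(img);
        img = NULL;
    }
    return img;
}

void image_unref(image_t *img)
{
    if (--img->refcount == 0) {
        image_fini(img);
        free(img);
    }
}
```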
2011-10-10 | sse2: Bilinear scaled over_8888_8_8888 | Taekyun Kim | 1 | -0/+168
2011-10-10 | sse2: Bilinear scaled over_8888_8888 | Taekyun Kim | 1 | -1/+106
2011-10-10 | sse2: Macros for assembling bilinear interpolation code fractions | Taekyun Kim | 1 | -80/+77
The primitive bilinear interpolation code can be reused to implement other bilinear functions:
- BILINEAR_DECLARE_VARIABLES: declare the variables needed to interpolate src pixels.
- BILINEAR_INTERPOLATE_ONE_PIXEL: interpolate one pixel and advance to the next pixel.
- BILINEAR_SKIP_ONE_PIXEL: skip interpolation and just advance to the next pixel. This is useful for skipping zero mask values.
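A plain-C sketch of what the interpolate and skip steps do for one scanline, assuming 8-bit fractional weights and ignoring pixman's pixel-centre conventions for sample positions. The function names are hypothetical, and the SSE2 macros process channels in parallel rather than in a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Bilinearly interpolate four a8r8g8b8 pixels, one 8-bit channel at a
 * time. distx and disty are the weights of the right/bottom neighbours
 * in 1/256 units (0..255). */
static uint32_t interpolate_one(uint32_t tl, uint32_t tr, uint32_t bl,
                                uint32_t br, int distx, int disty)
{
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t top = ((tl >> shift) & 0xff) * (uint32_t)(256 - distx) +
                       ((tr >> shift) & 0xff) * (uint32_t)distx;
        uint32_t bot = ((bl >> shift) & 0xff) * (uint32_t)(256 - distx) +
                       ((br >> shift) & 0xff) * (uint32_t)distx;
        uint32_t v = top * (uint32_t)(256 - disty) + bot * (uint32_t)disty;
        result |= ((v + 0x8000) >> 16) << shift;  /* round, drop 16 frac bits */
    }
    return result;
}

/* One output scanline from two source rows. x (>= 0) and unit_x are 16.16
 * fixed point; the caller guarantees (x >> 16) + 1 stays inside the rows.
 * Where mask is non-NULL and mask[i] == 0 the pixel is skipped and only x
 * advances, which is the idea behind BILINEAR_SKIP_ONE_PIXEL. */
void bilinear_scanline(const uint32_t *top, const uint32_t *bottom,
                       int32_t x, int32_t unit_x, int disty,
                       const uint8_t *mask, uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++, x += unit_x) {
        if (mask && mask[i] == 0)
            continue;                     /* skip: no fetch, no store */
        int x0 = x >> 16;                 /* left source column */
        int distx = (x >> 8) & 0xff;      /* top 8 fractional bits */
        dst[i] = interpolate_one(top[x0], top[x0 + 1],
                                 bottom[x0], bottom[x0 + 1], distx, disty);
    }
}
```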
2011-10-06 | Correct the minimum gcc version needed for iwmmxt | Matt Turner | 1 | -1/+1
Spotted by Søren Sandmann. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-10-06 | Make sure iwMMXt is only detected on ARM | Matt Turner | 1 | -0/+3
iwMMXt is incorrectly detected on x86 and amd64. This happens because the test uses standard _mm_* intrinsic functions, which it compiles with -march=iwmmxt; when the user has set CFLAGS=-march=k8, for instance, no error is generated from -march=iwmmxt, even though it's not a valid flag on x86/amd64. Passing CFLAGS=-march=native does not override the -march=iwmmxt flag, though, which is why this wasn't noticed before. So, just #error out in the test if the __arm__ preprocessor macro isn't defined. Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179 Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-28 | Don't include stdint.h in scaling-helpers-test. | Søren Sandmann Pedersen | 1 | -1/+0
Fixes bug 41257.
2011-09-28 | build: replace @VAR@ with $(VAR) in makefiles | Benjamin Otte | 2 | -6/+6
2011-09-28 | tests: Add PNG_CFLAGS/LIBS to tests | Benjamin Otte | 1 | -2/+2
The PNG flags were previously pulled in by accident via gdk-pixbuf. This has recently been fixed, so we need to make sure to include them ourselves.
2011-09-27 | mmx: optimize unaligned 64-bit ARM/iwmmxt loads | Matt Turner | 1 | -0/+7
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 | mmx: compile on ARM for iwmmxt optimizations | Matt Turner | 4 | -4/+82
Check in configure for at least gcc-4.6, since gcc-4.7 (and hopefully 4.6) will be the earliest version capable of compiling the _mm_* intrinsics on ARM/iwmmxt. Even with suitable compiler versions, I use _mm_srli_si64, which is known to cause unpatched compilers to fail. Select iwmmxt at runtime only after NEON, since we expect the NEON optimizations to be more capable and faster than iwmmxt. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 | mmx: prepare pixman-mmx.c to be compiled for ARM/iwmmxt | Matt Turner | 1 | -2/+11
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 | mmx: fix unaligned accesses | Matt Turner | 1 | -56/+129
Simply return *p in the unaligned access functions, since alignment constraints are very relaxed on x86, and this lets us generate code identical to before. Tested with the test suite, lowlevel-blit-test, and cairo-perf-trace on ARM and Alpha with no unaligned accesses found. Signed-off-by: Matt Turner <mattst88@gmail.com>
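On targets with strict alignment, a common portable way to express an unaligned load or store is through memcpy, which compilers typically lower to a single move where the hardware allows it and to safe byte or word sequences elsewhere. This sketch shows that general technique with hypothetical helper names; it is not the code from this commit, which the message says simply returns *p on x86.

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned 64-bit load: memcpy has no alignment requirement. */
static inline uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Matching unaligned 64-bit store. */
static inline void store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```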
2011-09-27 | mmx: wrap x86/MMX inline assembly in ifdef USE_X86_MMX | Matt Turner | 1 | -4/+4
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 | mmx: rename USE_MMX to USE_X86_MMX | Matt Turner | 6 | -12/+12
This will make upcoming ARM usage of pixman-mmx.c unambiguous. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 | mmx: convert while (w) to if (w) when possible | Matt Turner | 1 | -24/+5
gcc isn't able to see that w is no greater than 1, so it generates unnecessary loop instructions with while (w). Signed-off-by: Matt Turner <mattst88@gmail.com>
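The pattern looks like this in plain C: after a loop that consumes two pixels per iteration, at most one pixel remains, so the trailing `while (w)` can become `if (w)`. The saturating-add kernel and function names below are hypothetical; the point is only the loop shape.

```c
#include <stdint.h>

/* Saturating add of two 8-bit values. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* dst[i] = saturate(dst[i] + src[i]) for w >= 0 pixels. */
void add_8_8(uint8_t *dst, const uint8_t *src, int w)
{
    /* Main loop: two pixels per iteration. */
    while (w >= 2) {
        dst[0] = add_sat_u8(dst[0], src[0]);
        dst[1] = add_sat_u8(dst[1], src[1]);
        dst += 2;
        src += 2;
        w -= 2;
    }

    /* Here w is 0 or 1. 'while (w)' would also be correct, but the
     * compiler may still emit loop control for it; 'if (w)' states
     * that the body runs at most once. */
    if (w)
        dst[0] = add_sat_u8(dst[0], src[0]);
}
```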
2011-09-26 | mmx: fix formats in commented code | Matt Turner | 1 | -2/+2
Support for b8r8g8 was apparently dropped at some point after this code was commented out. Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 | lowlevel-blt: add over_x888_8_8888 | Matt Turner | 1 | -0/+1
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-21 | BILINEAR->NEAREST filter optimization for simple rotation and translation | Siarhei Siamashka | 1 | -1/+38
Simple rotation and translation are additional cases in which a BILINEAR filter can safely be reduced to NEAREST.
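The underlying condition can be sketched in C. Pixel centres sit at half-integer coordinates, so if the linear part of an affine matrix has entries of +-1 in a diagonal or anti-diagonal pattern and the translation is a whole number of pixels, every destination pixel centre lands exactly on a source pixel centre and BILINEAR gives the same result as NEAREST. This mirrors the reasoning, not pixman's code: the function name is hypothetical, it also accepts mirror transforms (which satisfy the same condition), and whether pixman's check includes mirrors is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIXED_ONE 0x10000  /* 1.0 in 16.16 fixed point */

static bool is_unit(int32_t v) { return v == FIXED_ONE || v == -FIXED_ONE; }

/* m is an affine transform in 16.16 fixed point, row-major, applied to
 * column vectors (x, y, 1). Returns true when pixel centres map exactly
 * onto pixel centres, so BILINEAR sampling equals NEAREST sampling. */
bool bilinear_reduces_to_nearest(const int32_t m[3][3])
{
    /* Must be affine. */
    if (m[2][0] != 0 || m[2][1] != 0 || m[2][2] != FIXED_ONE)
        return false;

    /* Translation must be a whole number of pixels. */
    if (((uint32_t)m[0][2] & 0xffffu) || ((uint32_t)m[1][2] & 0xffffu))
        return false;

    /* Identity, 180 degree rotation, or axis flips: diagonal of +-1. */
    if (m[0][1] == 0 && m[1][0] == 0 && is_unit(m[0][0]) && is_unit(m[1][1]))
        return true;

    /* 90/270 degree rotations (and transposes): anti-diagonal of +-1. */
    return m[0][0] == 0 && m[1][1] == 0 && is_unit(m[0][1]) && is_unit(m[1][0]);
}
```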
2011-09-21 | Strength-reduce BILINEAR filter to NEAREST filter for identity transforms | Søren Sandmann Pedersen | 5 | -38/+62
An image with a bilinear filter and an identity transform is equivalent to one with a nearest filter, so there is no reason the standard fast paths shouldn't be usable. But because a BILINEAR filter samples a 2x2 pixel block in the source image, FAST_PATH_SAMPLES_COVER_CLIP can't be set when the source area is the entire image, because some compositing operations might then read pixels outside the image. This patch fixes the problem by splitting the FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags, FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR, which indicate that the clip covers the samples taking into account NEAREST and BILINEAR filters, respectively. All the existing compositing operations that require FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR, depending on which filter they depend on. In compute_image_info(), both COVER_CLIP_NEAREST and COVER_CLIP_BILINEAR can be set, depending on how much room there is around the clip rectangle. Finally, images with an identity transform and a bilinear filter get FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
Performance measurements with render_bench against Xephyr (ROUND 1):
- Test Xrender doing non-scaled Over blends: before 5.720 sec., after 4.947 sec.
- Test Xrender (offscreen) doing non-scaled Over blends: before 5.149 sec., after 4.487 sec.
- Test Imlib2 doing non-scaled Over blends: before 6.237 sec., after 6.235 sec.
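Why the single COVER_CLIP flag had to split can be seen along one axis. This sketch assumes NEAREST reads the pixel containing each sample point (with a one-unit fixed-point epsilon so that samples on a boundary pick the left pixel) and BILINEAR reads pixels floor(x - 0.5) and floor(x - 0.5) + 1; these conventions and the function names are assumptions for illustration, not pixman's compute_image_info().

```c
#include <stdbool.h>
#include <stdint.h>

#define FIXED_ONE  0x10000
#define FIXED_HALF 0x08000

/* floor() of a 16.16 fixed-point value, correct for negative inputs. */
static int32_t fixed_floor(int32_t v)
{
    int64_t w = v;
    return (int32_t)((w - (w & 0xffff)) / FIXED_ONE);
}

/* x1 and x2 are the first and last sample positions along one axis
 * (16.16, source image coordinates); width is the image size. */

/* NEAREST reads the one pixel containing each sample point. */
bool nearest_samples_covered(int32_t x1, int32_t x2, int width)
{
    return fixed_floor(x1 - 1) >= 0 && fixed_floor(x2 - 1) < width;
}

/* BILINEAR reads pixels floor(x - 0.5) and floor(x - 0.5) + 1, so it
 * needs extra room around the sampled area. */
bool bilinear_samples_covered(int32_t x1, int32_t x2, int width)
{
    return fixed_floor(x1 - FIXED_HALF) >= 0 &&
           fixed_floor(x2 - FIXED_HALF) + 1 < width;
}
```

For an identity transform over a whole 4-pixel-wide image (samples at 0.5 through 3.5), the NEAREST check passes but the BILINEAR check fails, because the last sample also touches pixel 4 (with zero weight). That is exactly the case the commit message describes.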
2011-09-21 | test: Occasionally use a BILINEAR filter in blitters-test | Søren Sandmann Pedersen | 1 | -1/+4
To test that reductions of BILINEAR->NEAREST for identity transformations happen correctly, occasionally use a bilinear filter in blitters test.
2011-09-21 | test: better coverage for BILINEAR->NEAREST filter optimization | Siarhei Siamashka | 1 | -8/+32
The upcoming optimization, which will be able to replace the BILINEAR filter with NEAREST where appropriate, needs to analyze the transformation matrix without making any mistakes. The changes to affine-test include:
1. A higher chance of using the same scale factor for the x and y axes. This can help stress some special cases (for example, when both the x and y scale factors are integers). The same applies to x/y translation.
2. A small chance of "corrupting" the transformation matrix by flipping random bits. This is meant to help identify cases where some of the fast paths or other code logic are wrongly activated due to insufficient checks.
2011-09-21 | Eliminate compute_sample_extents() function | Søren Sandmann Pedersen | 1 | -58/+42
In analyze_extents(), instead of calling compute_sample_extents(), call compute_transformed_extents() and inline the remaining part of compute_sample_extents(). The upcoming bilinear->nearest optimization will do something different with these two pieces of code.
2011-09-21 | Split computation of sample area into own function | Søren Sandmann Pedersen | 1 | -62/+76
compute_sample_extents() has two parts: one that computes the transformed extents, and one that checks whether the computed extents fit within the 16.16 coordinate space. Split the first part into its own function, compute_transformed_extents().
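The two parts can be sketched separately: map the four corners of a box through an affine matrix and take their bounding box, then check that the result fits the signed 16.16 range [-32768, 32768). This uses double for readability rather than fixed point, and `box_d`, `transformed_extents()` and `fits_16_16()` are hypothetical names; any extra safety margin pixman applies is not modelled.

```c
#include <stdbool.h>

typedef struct { double x1, y1, x2, y2; } box_d;  /* hypothetical box type */

/* Part one: map the four corners of b through the affine matrix m
 * (row-major, column vectors) and return their bounding box. */
box_d transformed_extents(const double m[3][3], box_d b)
{
    const double xs[2] = { b.x1, b.x2 };
    const double ys[2] = { b.y1, b.y2 };
    box_d out = { 0, 0, 0, 0 };

    for (int i = 0; i < 4; i++) {
        double x = xs[i & 1], y = ys[i >> 1];
        double tx = m[0][0] * x + m[0][1] * y + m[0][2];
        double ty = m[1][0] * x + m[1][1] * y + m[1][2];
        if (i == 0 || tx < out.x1) out.x1 = tx;
        if (i == 0 || tx > out.x2) out.x2 = tx;
        if (i == 0 || ty < out.y1) out.y1 = ty;
        if (i == 0 || ty > out.y2) out.y2 = ty;
    }
    return out;
}

/* Part two: do the extents fit in signed 16.16, i.e. [-32768, 32768)? */
bool fits_16_16(box_d b)
{
    return b.x1 >= -32768.0 && b.y1 >= -32768.0 &&
           b.x2 < 32768.0 && b.y2 < 32768.0;
}
```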
2011-09-21 | Remove x and y coordinates from analyze_extents() and compute_sample_extents() | Søren Sandmann Pedersen | 1 | -26/+37
These coordinates were only ever used for subtracting from the extents box to put it into the coordinate space of the image, so we might as well do this coordinate translation only once before entering the functions.
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for paletted formats | Søren Sandmann Pedersen | 1 | -230/+46
Add support in convert_pixel_from_a8r8g8b8() and convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats, then use MAKE_ACCESSORS() to generate accessors for the indexed formats: c8, g8, g4, c4, g1
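Conversion to and from an indexed format is a pair of table lookups. In this sketch the palette layout is an assumption: `rgba` maps an index to a8r8g8b8, and `ent` is an inverse table indexed by the top 5 bits of each of R, G and B. `palette_t` and the function names are hypothetical, and grayscale indexed formats may be handled differently.

```c
#include <stdint.h>

/* Hypothetical palette: forward table plus a 15-bit (5:5:5) inverse table. */
typedef struct {
    uint32_t rgba[256];
    uint8_t  ent[32768];
} palette_t;

uint32_t indexed_to_a8r8g8b8(const palette_t *pal, uint8_t index)
{
    return pal->rgba[index];
}

/* Keep the top 5 bits of each of R, G and B and look up the index. */
uint8_t a8r8g8b8_to_indexed(const palette_t *pal, uint32_t p)
{
    uint32_t rgb15 = ((p >> 9) & 0x7c00) | ((p >> 6) & 0x03e0) |
                     ((p >> 3) & 0x001f);
    return pal->ent[rgb15];
}
```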
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for the a1 format. | Søren Sandmann Pedersen | 1 | -79/+46
Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the accessors for the a1 format. (Not the g1 format, since it is indexed.)
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for 24bpp formats | Søren Sandmann Pedersen | 1 | -153/+46
Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp pixels in fetch_and_convert_pixel() and convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate accessors for the 24 bpp formats: r8g8b8 b8g8r8
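24bpp pixels have no native integer type, so they are read and written as three bytes. This sketch assumes host byte order, with `WORDS_BIGENDIAN` (the autoconf-style macro) defined on big-endian builds; `fetch_24()` and `store_24()` are hypothetical stand-ins for the FETCH_24/STORE_24 macros, not their actual definitions.

```c
#include <stdint.h>

/* Read the 24-bit pixel at column x of a packed row. */
uint32_t fetch_24(const uint8_t *row, int x)
{
    const uint8_t *p = row + 3 * x;
#ifdef WORDS_BIGENDIAN
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
#else
    return ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
#endif
}

/* Write the low 24 bits of v to column x, leaving neighbours untouched. */
void store_24(uint8_t *row, int x, uint32_t v)
{
    uint8_t *p = row + 3 * x;
#ifdef WORDS_BIGENDIAN
    p[0] = (uint8_t)(v >> 16); p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)v;
#else
    p[2] = (uint8_t)(v >> 16); p[1] = (uint8_t)(v >> 8); p[0] = (uint8_t)v;
#endif
}
```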
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats | Søren Sandmann Pedersen | 1 | -381/+70
Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and c4 which are indexed: a4 r1g2b1 b1g2r1 a1r1g1b1 a1b1g1r1
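4bpp pixels pack two to a byte, so a store must preserve the other nibble. This sketch assumes a little-endian nibble layout (even x in the low nibble, odd x in the high nibble); a big-endian build would swap them. The names are hypothetical stand-ins for FETCH_4/STORE_4.

```c
#include <stdint.h>

/* Read the 4-bit pixel at column x. */
uint32_t fetch_4(const uint8_t *row, int x)
{
    uint8_t byte = row[x >> 1];
    return (x & 1) ? (byte >> 4) : (byte & 0x0f);
}

/* Write the low 4 bits of v to column x, keeping the other nibble. */
void store_4(uint8_t *row, int x, uint32_t v)
{
    uint8_t *byte = &row[x >> 1];
    v &= 0x0f;
    if (x & 1)
        *byte = (uint8_t)((*byte & 0x0f) | (v << 4));   /* keep low nibble */
    else
        *byte = (uint8_t)((*byte & 0xf0) | v);          /* keep high nibble */
}
```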
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats | Søren Sandmann Pedersen | 1 | -382/+14
Add support for 8 bpp formats to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the accessors for all the 8 bpp formats, except g8 and c8, which are indexed: a8 r3g3b2 b2g3r3 a2r2g2b2 a2b2g2r2 x4a4
2011-09-20 | Use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats | Søren Sandmann Pedersen | 1 | -640/+18
Add support for 16bpp pixels to fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats: r5g6b5 b5g6r5 a1r5g5b5 x1r5g5b5 a1b5g5r5 x1b5g5r5 a4r4g4b4 x4r4g4b4 a4b4g4r4 x4b4g4r4
2011-09-20 | Use MAKE_ACCESSORS() to generate all the 32 bit accessors | Søren Sandmann Pedersen | 1 | -466/+17
Add support for 32bpp formats in fetch_and_convert_pixel() and convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate accessors for all the 32 bpp formats: a8r8g8b8 x8r8g8b8 a8b8g8r8 x8b8g8r8 x14r6g6b6 b8g8r8a8 b8g8r8x8 r8g8b8x8 r8g8b8a8
2011-09-20 | Add initial version of the MAKE_ACCESSORS() macro | Søren Sandmann Pedersen | 1 | -0/+114
This macro will eventually allow the fetchers and storers to be generated automatically. For now, it's just a skeleton that doesn't actually do anything.
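The idea of generating accessors can be sketched with a much simpler, hypothetical macro: given a format name and two per-pixel converters, it emits whole-scanline fetch and store functions. `MAKE_ACCESSORS_SKETCH` and the r5g6b5 converters below are illustrations only, not the real MAKE_ACCESSORS() or its helpers.

```c
#include <stdint.h>

/* Per-pixel converters for r5g6b5; widening replicates the top bits so
 * that full intensity maps to 0xff. */
static uint32_t r5g6b5_to_a8r8g8b8(const void *bits, int x)
{
    uint32_t p = ((const uint16_t *)bits)[x];
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

static void a8r8g8b8_to_r5g6b5(void *bits, int x, uint32_t v)
{
    uint32_t r = (v >> 19) & 0x1f, g = (v >> 10) & 0x3f, b = (v >> 3) & 0x1f;
    ((uint16_t *)bits)[x] = (uint16_t)((r << 11) | (g << 5) | b);
}

/* Hypothetical generator: emit fetch/store scanline functions for fmt. */
#define MAKE_ACCESSORS_SKETCH(fmt, to_8888, from_8888)                 \
    void fetch_scanline_##fmt(const void *bits, int x, int width,       \
                              uint32_t *out)                            \
    {                                                                   \
        for (int i = 0; i < width; i++)                                 \
            out[i] = to_8888(bits, x + i);                              \
    }                                                                   \
    void store_scanline_##fmt(void *bits, int x, int width,             \
                              const uint32_t *in)                       \
    {                                                                   \
        for (int i = 0; i < width; i++)                                 \
            from_8888(bits, x + i, in[i]);                              \
    }

MAKE_ACCESSORS_SKETCH(r5g6b5, r5g6b5_to_a8r8g8b8, a8r8g8b8_to_r5g6b5)
```

Adding another format then only requires its two per-pixel converters and one macro invocation, which is the kind of reduction the later commits above show in their line counts.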
2011-09-20 | Add general pixel converter | Søren Sandmann Pedersen | 1 | -0/+100
This function can convert between any <= 32 bpp formats. Nothing uses it yet.
2011-09-20 | Add a generic unorm_to_unorm() conversion utility | Søren Sandmann Pedersen | 2 | -29/+48
This function can convert between normalized numbers of different depths. When converting to higher bit depths, it replicates the existing bits; when converting to lower bit depths, it simply truncates. This function replaces the expand16() function in pixman-utils.c.
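A minimal sketch of the behaviour described (replicate bits when widening, truncate when narrowing). It uses the same name for readability but is not the code from the commit; bit widths are assumed to be 1..32.

```c
#include <stdint.h>

/* Convert an unsigned normalized value of from_bits bits to to_bits bits.
 * Widening replicates the source bit pattern from the top down, so the
 * maximum value maps to the maximum value; narrowing keeps the most
 * significant bits. */
uint32_t unorm_to_unorm(uint32_t val, int from_bits, int to_bits)
{
    if (from_bits <= 0)
        return 0;

    /* Discard anything above the source width. */
    if (from_bits < 32)
        val &= (1u << from_bits) - 1;

    if (from_bits >= to_bits)
        return val >> (from_bits - to_bits);   /* truncate */

    uint32_t result = 0;
    /* Place copies of val from the most significant end downwards; the
     * last copy may be partial, keeping only its top bits. */
    for (int filled = 0; filled < to_bits; filled += from_bits) {
        int shift = to_bits - filled - from_bits;
        result |= shift >= 0 ? val << shift : val >> -shift;
    }
    return result;
}
```

For example, the 5-bit value 0x10 (10000 in binary) widens to 8 bits as 10000100, i.e. 0x84, and the 2-bit value 0x2 widens to 0xaa.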