Age | Commit message (Collapse) | Author | Files | Lines |
|
Binutils 2.21 does not complain about missing comma between ARM
register and alignement specifier in vld/vst instructions which
causes build error on binutils 2.20.
|
|
Performance numbers of before/after on cortex-a8 @ 1GHz
- before
L1: 28.05 L2: 28.26 M: 26.97 ( 4.48%) HT: 19.79 VT: 19.14 R: 17.61 RT: 9.88 ( 101Kops/s)
- after
L1:1430.28 L2:1252.10 M:421.93 ( 75.48%) HT:170.16 VT:138.03 R:145.86 RT: 35.51 ( 255Kops/s)
|
|
Performance numbers of before/after on cortex-a8 @ 1GHz
- before
L1: 32.39 L2: 31.79 M: 30.84 ( 13.77%) HT: 21.58 VT: 19.75 R: 18.83 RT: 10.46 ( 106Kops/s)
- after
L1: 516.25 L2: 372.00 M:193.49 ( 85.59%) HT:136.93 VT:109.10 R:104.48 RT: 34.77 ( 253Kops/s)
|
|
Instructions are reordered to eliminate pipeline stalls and get
better memory access.
Performance of before/after on cortex-a8 @ 1GHz
<< 2000 x 2000 with scale factor close to 1.x >>
before : 40.53 Mpix/s
after : 50.76 Mpix/s
|
|
Instructions are reordered to eliminate pipeline stalls and get
better memory access.
Performance of before/after on cortex-a8 @ 1GHz
<< 2000 x 2000 with scale factor close to 1.x >>
before : 50.43 Mpix/s
after : 61.09 Mpix/s
|
|
Bilinear scanline functions in pixman-arm-neon-asm-bilinear.S can
be replaced with new template just by wrapping existing macros.
|
|
This macro template takes 6 code blocks.
1. process_last_pixel
2. process_two_pixels
3. process_four_pixels
4. process_pixblock_head
5. process_pixblock_tail
6. process_pixblock_tail_head
process_last_pixel does not need to update horizontal weight. This
is done by the template. two and four code block should update
horizontal weight inside of them. head/tail/tail_head blocks
consist unrolled core loop. You can apply instruction scheduling
to the tail_head blocks.
You can also specify size of the pixel block. Supported size is 4
and 8. If you want to use mask, give BILINEAR_FLAG_USE_MASK flags
to the template, then you can use register MASK. When using d8~d15
registers, give BILINEAR_FLAG_USE_ALL_NEON_REGS to make sure
registers are properly saved on the stack and later restored.
|
|
Use STRIDE and initial horizontal weight update is done before
entering interpolation loop. Cache preload for mask and dst.
|
|
|
|
|
|
Too short scanlines can cause repeat handling overhead and optimized
pixman composite functions usually process a bunch of pixels in a
single loop iteration it might be beneficial to pre-extend source
scanlines. The temporary buffers will usually reside in cache, so
accessing them should be quite efficient.
|
|
We can implement simple repeat by stitching existing fast path
functions. First lookup COVER_CLIP function for given input and
then stitch horizontally using the function.
|
|
|
|
These flags are useful in the various compositing routines, and the
flags stored in the image structs are missing some bits of information
that can only be computed when pixman_image_composite() is called.
|
|
This fast path flag indicate that type of the image is bits image.
|
|
pixman_image_t itself can be on stack or heap. So segregating
init/fini from create/unref can be useful when we want to use
pixman_image_t on stack or other memory.
|
|
|
|
|
|
Primitive bilinear interpolation code is reusable to implement other
bilinear functions.
BILINEAR_DECLARE_VARIABLES
- Declare variables needed to interpolate src pixels.
BILINEAR_INTERPOLATE_ONE_PIXEL
- Interpolate one pixel and advance to next pixel
BILINEAR_SKIP_ONE_PIXEL
- Skip interpolation and just advance to next pixel
This is useful for skipping zero mask
|
|
Spotted by Søren Sandmann.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
iwMMXt is incorrectly detected on x86 and amd64. This happens because
the test uses standard _mm_* intrinsic functions which it compiles with
-march=iwmmxt, but when the user has set CFLAGS=-march=k8 for instance,
no error is generated from -march=iwmmxt, even though it's not a valid
flag on x86/amd64. Passing CFLAGS=-march=native does not override the
-march=iwmmxt flag though, which is why it wasn't noticed before.
So, just #error out in the test if the __arm__ preprocessor directive
isn't defined.
Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Fixes bug 41257.
|
|
|
|
PNG flags were accidentally included by gdk-pixbuf. This has been fixed
recently, so we need to make sure to include it ourselves.
|
|
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Check in configure for at least gcc-4.6, since gcc-4.7 (and hopefully
4.6) will be the eariest version capable of compiling the _mm_*
intrinsics on ARM/iwmmxt. Even for suitable compile versions I use
_mm_srli_si64 which is known to cause unpatched compilers to fail.
Select iwmmxt at runtime only after NEON, since we expect the NEON
optimizations to be more capable and faster than iwmmxt.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Simply return *p in the unaligned access functions, since alignment
constraints are very relaxed on x86 and this allows us to generate
identical code as before.
Tested with the test suite, lowlevel-blit-test, and cairo-perf-trace on
ARM and Alpha with no unaligned accesses found.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
This will make upcoming ARM usage of pixman-mmx.c unambiguous.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
gcc isn't able to see that w is no greater than 1, so it generates
unnecessary loop instructions with while (w).
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
b8r8g8 is apparently no longer supported sometime since this code was
commented.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Simple rotation and translation are the additional cases when BILINEAR
filter can be safely reduced to NEAREST.
|
|
An image with a bilinear filter and an identity transform is
equivalent to one with a nearest filter, so there is no reason the
standard fast paths shouldn't be usable.
But because a BILINEAR filter samples a 2x2 pixel block in the source
image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the
source area is the entire image, because some compositing operations
might then read pixels outside the image.
This patch fixes the problem by splitting the
FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip
covers the samples taking into account NEAREST/BILINEAR filters
respectively.
All the existing compositing operations that require
FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick
either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which
filter they depend on.
In compute_image_info() both COVER_CILP_NEAREST and
COVER_CLIP_BILINEAR can be set depending on how much room there is
around the clip rectangle.
Finally, images with an identity transform and a bilinear filter get
FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
Performance measurementas with render_bench against Xephyr:
Before
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.
After:
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.
|
|
To test that reductions of BILINEAR->NEAREST for identity
transformations happen correctly, occasionally use a bilinear filter
in blitters test.
|
|
The upcoming optimization which is going to be able to replace BILINEAR filter
with NEAREST where appropriate needs to analyze the transformation matrix
and not to make any mistakes.
The changes to affine-test include:
1. Higher chance of using the same scale factor for x and y axes. This can help
to stress some special cases (for example the case when both x and y scale
factors are integer). The same applies to x/y translation.
2. Introduced a small chance for "corrupting" transformation matrix by flipping
random bits. This supposedly can help to identify the cases when some of the
fast paths or other code logic is wrongly activated due to insufficient checks.
|
|
In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear->nearest optimization
will do something different with these two pieces of code.
|
|
compute_sample_extents() have two parts: one that computes the
transformed extents, and one that checks whether the computed extents
fit within the 16.16 coordinate space.
Split the first part into its own function
compute_transformed_extents().
|
|
These coordinates were only ever used for subtracting from the extents
box to put it into the coordinate space of the image, so we might as
well do this coordinate translation only once before entering the
functions.
|
|
Add support in convert_pixel_from_a8r8g8b8() and
convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats,
then use MAKE_ACCESSORS() to generate accessors for the indexed
formats: c8, g8, g4, c4, g1
|
|
Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp
pixels to fetch_and_convert_pixel() and convert_and_store_pixel(),
then use MAKE_ACCESSORS() to generate the accessors for the a1
format. (Not the g1 format as it is indexed).
|
|
Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp
pixels in fetch_and_convert_pixel() and
convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate
accessors for the 24 bpp formats:
r8g8b8
b8g8r8
|
|
Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to
fetch_and_convert_pixel() and convert_and_store_pixel(), then use
MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and
c4 which are indexed:
a4
r1g2b1
b1g2r1
a1r1g1b1
a1b1g1r1
|
|
Add support for 8 bpp formats to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the
accessors for all the 8 bpp formats, except g8 and c8, which are
indexed:
a8
r3g3b2
b2g3r3
a2r2g2b2
a2b2g2r2
x4a4
|
|
Add support for 16bpp pixels to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 16bpp formats:
r5g6b5
b5g6r5
a1r5g5b5
x1r5g5b5
a1b5g5r5
x1b5g5r5
a4r4g4b4
x4r4g4b4
a4b4g4r4
x4b4g4r4
|
|
Add support for 32bpp formats in fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 32 bpp formats:
a8r8g8b8
x8r8g8b8
a8b8g8r8
x8b8g8r8
x14r6g6b6
b8g8r8a8
b8g8r8x8
r8g8b8x8
r8g8b8a8
|
|
This macro will eventually allow the fetchers and storers to be
generated automatically. For now, it's just a skeleton that doesn't
actually do anything.
|
|
This function can convert between any <= 32 bpp formats. Nothing uses
it yet.
|
|
This function can convert between normalized numbers of different
depths. When converting to higher bit depths, it will replicate the
existing bits, when converting to lower bit depths, it will simply
truncate.
This function replaces the expand16() function in pixman-utils.c
|