Age | Commit message | Author | Files | Lines |
|
Simple rotation and translation are additional cases in which the BILINEAR
filter can be safely reduced to NEAREST.
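A minimal sketch of such a check, assuming that "simple rotation" means a rotation by a multiple of 90 degrees and that the translation must be a whole number of pixels. The type matrix_16_16_t and the function bilinear_reduces_to_nearest() are hypothetical names for illustration, not pixman API:

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;                /* hypothetical 16.16 fixed-point type */
#define FIXED_ONE ((fixed_16_16_t) 0x10000)   /* 1.0 in 16.16 */

/* Hypothetical 3x3 matrix, row-major, 16.16 fixed point. */
typedef struct { fixed_16_16_t m[3][3]; } matrix_16_16_t;

/* True when the matrix is affine, rotates by 0/90/180/270 degrees and
 * translates by whole pixels, so samples stay on source pixel centers. */
static bool
bilinear_reduces_to_nearest (const matrix_16_16_t *t)
{
    fixed_16_16_t a = t->m[0][0], b = t->m[0][1];
    fixed_16_16_t c = t->m[1][0], d = t->m[1][1];
    /* Fractional bits of both translation components. */
    uint32_t frac = ((uint32_t) t->m[0][2] | (uint32_t) t->m[1][2]) & 0xffffu;
    bool affine, rot_0_or_180, rot_90_or_270;

    affine = t->m[2][0] == 0 && t->m[2][1] == 0 && t->m[2][2] == FIXED_ONE;
    rot_0_or_180 = b == 0 && c == 0 && a == d &&
                   (a == FIXED_ONE || a == -FIXED_ONE);
    rot_90_or_270 = a == 0 && d == 0 && b == -c &&
                    (b == FIXED_ONE || b == -FIXED_ONE);

    return affine && frac == 0 && (rot_0_or_180 || rot_90_or_270);
}
```

A real implementation would likely also need to guard against coordinate overflow; that is left out here.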
|
|
An image with a bilinear filter and an identity transform is
equivalent to one with a nearest filter, so there is no reason the
standard fast paths shouldn't be usable.
But because a BILINEAR filter samples a 2x2 pixel block in the source
image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the
source area is the entire image, because some compositing operations
might then read pixels outside the image.
This patch fixes the problem by splitting the
FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip
covers the samples taking into account NEAREST/BILINEAR filters
respectively.
All the existing compositing operations that require
FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick
either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which
filter they depend on.
In compute_image_info() both COVER_CLIP_NEAREST and
COVER_CLIP_BILINEAR can be set depending on how much room there is
around the clip rectangle.
Finally, images with an identity transform and a bilinear filter get
FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
Performance measurements with render_bench against Xephyr:
Before:
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.
After:
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.
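The flag split described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: sample_box_t, cover_clip_flags() and the flag values are hypothetical, the box holds the first and last sample positions in source space, and the image dimensions are assumed to be below 32768 so they fit in 16.16.

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;                /* hypothetical 16.16 fixed-point type */
#define FIXED_ONE  ((fixed_16_16_t) 0x10000)
#define FIXED_HALF (FIXED_ONE / 2)
#define FIXED_EPS  ((fixed_16_16_t) 1)        /* smallest 16.16 increment */

enum
{
    COVER_CLIP_NEAREST  = 1 << 0,             /* assumed bit values */
    COVER_CLIP_BILINEAR = 1 << 1
};

/* First and last sample positions in source space. */
typedef struct { fixed_16_16_t x1, y1, x2, y2; } sample_box_t;

static unsigned
cover_clip_flags (const sample_box_t *s, int width, int height)
{
    fixed_16_16_t w = (fixed_16_16_t) width * FIXED_ONE;
    fixed_16_16_t h = (fixed_16_16_t) height * FIXED_ONE;
    unsigned flags = 0;

    /* NEAREST reads one pixel per sample. */
    if (s->x1 - FIXED_EPS >= 0 && s->y1 - FIXED_EPS >= 0 &&
        s->x2 - FIXED_EPS < w && s->y2 - FIXED_EPS < h)
        flags |= COVER_CLIP_NEAREST;

    /* BILINEAR reads a 2x2 block, so it needs half a pixel of margin. */
    if (s->x1 - FIXED_HALF >= 0 && s->y1 - FIXED_HALF >= 0 &&
        s->x2 + FIXED_HALF < w && s->y2 + FIXED_HALF < h)
        flags |= COVER_CLIP_BILINEAR;

    return flags;
}
```

With a 4x4 image sampled at every pixel center (0.5 to 3.5), only the NEAREST flag is set, which matches the case the message describes where the source area is the entire image.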
|
|
To test that reductions of BILINEAR->NEAREST for identity
transformations happen correctly, occasionally use a bilinear filter
in blitters test.
|
|
The upcoming optimization, which will replace the BILINEAR filter
with NEAREST where appropriate, needs to analyze the transformation matrix
without making any mistakes.
The changes to affine-test include:
1. Higher chance of using the same scale factor for x and y axes. This can help
to stress some special cases (for example the case when both x and y scale
factors are integer). The same applies to x/y translation.
2. Introduced a small chance for "corrupting" transformation matrix by flipping
random bits. This supposedly can help to identify the cases when some of the
fast paths or other code logic is wrongly activated due to insufficient checks.
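The bit-flipping idea in item 2 could look roughly like this. It is a sketch only: test_matrix_t and flip_matrix_bit() are hypothetical names, and the caller is assumed to supply the random bit index.

```c
#include <stdint.h>

/* Hypothetical 3x3 matrix of 16.16 fixed-point values. */
typedef struct { int32_t m[3][3]; } test_matrix_t;

/* Flip a single bit of the matrix. bit_index selects one of the
 * 9 * 32 bits; larger values wrap around. */
static void
flip_matrix_bit (test_matrix_t *t, unsigned bit_index)
{
    unsigned cell = (bit_index / 32) % 9;     /* which of the 9 entries */
    unsigned bit = bit_index % 32;            /* which bit inside the entry */
    uint32_t v = (uint32_t) t->m[cell / 3][cell % 3];

    v ^= (uint32_t) 1 << bit;
    t->m[cell / 3][cell % 3] = (int32_t) v;
}
```

Calling it a few times with random indices on an otherwise well-formed matrix produces slightly "wrong" matrices that a fast path should refuse.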
|
|
In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear->nearest optimization
will do something different with these two pieces of code.
|
|
compute_sample_extents() has two parts: one that computes the
transformed extents, and one that checks whether the computed extents
fit within the 16.16 coordinate space.
Split the first part into its own function
compute_transformed_extents().
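The second part, the range check, can be illustrated with a small sketch. It assumes the transformed extents are kept in a wider 48.16 representation before being narrowed; wide_box_t and box_fits_in_16_16() are hypothetical names, not the functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extents box in 48.16 fixed point (64-bit storage). */
typedef struct { int64_t x1, y1, x2, y2; } wide_box_t;

/* A 16.16 value is stored in 32 bits, so it must fit in int32_t. */
static bool
fits_in_16_16 (int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* True when every corner coordinate is representable in 16.16. */
static bool
box_fits_in_16_16 (const wide_box_t *b)
{
    return fits_in_16_16 (b->x1) && fits_in_16_16 (b->y1) &&
           fits_in_16_16 (b->x2) && fits_in_16_16 (b->y2);
}
```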
|
|
These coordinates were only ever used for subtracting from the extents
box to put it into the coordinate space of the image, so we might as
well do this coordinate translation only once before entering the
functions.
|
|
|
|
|
|
Profiling ign.com, 20% of the entire render time was absorbed in this
single operation:
<< /content //COLOR_ALPHA /width 480 /height 800 >> surface context
<< /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern
//EXTEND_REPEAT set-extend
set-source
n 0 0 480 677 rectangle
fill+
pop
which is a simple composition of a single pixel wide image. Sadly this
is a workaround for lack of independent repeat-x/y handling in cairo and
pixman. Worse still is that the worst-case behaviour of the general repeat
path is for width 1 images...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
New head, tail, tail/head blocks are added and instructions
are reordered to eliminate pipeline stalls
Performance numbers of before/after
- cortex a8 -
before : L1: 375.39 L2: 391.93 M:114.39 ( 40.99%) HT: 99.37 VT: 98.20 R: 90.24 RT: 32.87 ( 240Kops/s)
after : L1: 481.90 L2: 483.46 M:114.29 ( 40.69%) HT:106.91 VT: 93.38 R: 90.74 RT: 29.51 ( 236Kops/s)
- cortex a9 -
before : L1: 324.50 L2: 332.79 M:155.55 ( 47.51%) HT:111.93 VT: 93.58 R: 71.92 RT: 28.21 ( 233Kops/s)
after : L1: 355.87 L2: 364.49 M:156.90 ( 47.59%) HT:111.52 VT: 91.76 R: 72.16 RT: 28.22 ( 234Kops/s)
|
|
tail/head block is expanded and reordered to eliminate stalls
Performance numbers of before/after
- cortex a8 -
before : L1: 201.35 L2: 190.48 M:101.94 ( 54.85%) HT: 78.41 VT: 63.83 R: 58.25 RT: 21.74 ( 191Kops/s)
after : L1: 257.65 L2: 255.49 M:102.04 ( 55.33%) HT: 79.19 VT: 65.46 R: 59.23 RT: 21.12 ( 189Kops/s)
- cortex a9 -
before : L1: 157.35 L2: 159.81 M:133.00 ( 60.94%) HT: 82.44 VT: 63.64 R: 51.66 RT: 19.15 ( 179Kops/s)
after : L1: 216.83 L2: 219.40 M:135.83 ( 61.80%) HT: 85.60 VT: 64.80 R: 52.23 RT: 19.16 ( 179Kops/s)
|
|
llvm-gcc (shipped in Apple Xcode 4.1.1 as the default compiler or in
the 2.9 release of LLVM) performs an invalid optimization which
unifies the empty_region and the bad_region structures because they
have the same content.
A bug report has been filed against the Apple Developer Tools for this
issue. This commit works around this bug by making one of the two
structures volatile, so that it cannot be merged.
Fixes region-contains-test.
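A sketch of the kind of workaround described, assuming two sentinel objects with identical contents that are told apart only by address. The names sentinel_t, empty_sentinel and bad_sentinel are hypothetical stand-ins for the structures in the commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the region data structure. */
typedef struct { long size; long num_rects; } sentinel_t;

/* Same contents, different meaning: code compares their addresses. */
static const sentinel_t empty_sentinel = { 0, 0 };

/* volatile keeps the compiler from merging this object with the one above. */
static const volatile sentinel_t bad_sentinel = { 0, 0 };

static bool
sentinels_are_distinct (void)
{
    return (const volatile void *) &empty_sentinel !=
           (const volatile void *) &bad_sentinel;
}
```

On a conforming compiler the two objects already have distinct addresses; the qualifier only matters for a compiler that merges them incorrectly.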
|
|
Add the makefile rules needed to compile lowlevel-blt-bench on win32
and fix the compilation errors.
|
|
|
|
The repeat() functionality was duplicated between pixman-bits-image.c
and pixman-inlines.h
|
|
It is not really specific to pixman-fast-path.c.
|
|
There is no reason for pixman_image_create_bits() to check that the
image size fits in int32_t. The correct check is against size_t since
that is what the argument to calloc() is.
This patch fixes this by adding a new _pixman_multiply_overflows_size()
and using it in create_bits(). Also prepend an underscore to the names
of other similar functions since they are internal to pixman.
V2: Use int, not ssize_t for the arguments in create_bits() since
width/height are still limited to 32 bits, as pointed out by Chris
Wilson.
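An overflow check against size_t can be written along these lines. This is a sketch: multiply_overflows_size() is a hypothetical helper and is not claimed to match the body of _pixman_multiply_overflows_size().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a * b cannot be represented in size_t.
 * Dividing instead of multiplying avoids overflowing in the check itself. */
static bool
multiply_overflows_size (size_t a, size_t b)
{
    return b != 0 && a > SIZE_MAX / b;
}
```

A caller would test the height times stride product with this before passing it to calloc().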
|
|
Some systems don't have the file, and the types are already defined in
pixman.h.
https://bugs.freedesktop.org//show_bug.cgi?id=37422
|
|
The same binary search from the previous commit can be used in this
function too.
V2: Remove check from loop that is not needed anymore, pointed out by
Andrea Canciani.
|
|
When someone selects some text in Firefox under a non-composited X
server and initiates a drag, a shaped window is created with a complex
shape corresponding to the outline of the text. Then, on every mouse
movement pixman_region_contains_rectangle() is called many times on
that complicated region. And pixman_region_contains_rectangle() is
doing a linear scan through the rectangles in the region, although the
scan does exit when it finds the first box that can't possibly
intersect the passed-in rectangle.
This patch changes the loop so that it uses a binary search to skip
boxes that don't overlap the current y position. The performance
improvement for the text dragging case is easily noticeable.
V2: Use the binary search for the "getting up to speed or skipping
remainder of band" as well.
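The binary search can be sketched as below, assuming the boxes are stored sorted by band so that y2 never decreases along the array. box_t and find_box_for_y() here are illustrative and not necessarily identical to the code in the patch.

```c
#include <stddef.h>

/* Hypothetical box type; (x1, y1) inclusive, (x2, y2) exclusive. */
typedef struct { int x1, y1, x2, y2; } box_t;

/* Return the first box in [begin, end) whose y2 is greater than y,
 * or end if there is none. Assumes y2 is non-decreasing. */
static const box_t *
find_box_for_y (const box_t *begin, const box_t *end, int y)
{
    while (begin != end)
    {
        const box_t *mid = begin + (end - begin) / 2;

        if (mid->y2 > y)
            end = mid;          /* answer is mid or something before it */
        else
            begin = mid + 1;    /* mid lies entirely above y, skip it */
    }

    return begin;
}
```

Every box before the returned one ends at or above y, so a containment test can start scanning from there instead of from the first box.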
|
|
This test generates random regions and checks whether random boxes and
points are contained within them. The results are combined and a CRC32
value is computed and compared to a known-correct one.
|
|
The lcg_rand() function only returns 15 random bits, so lcg_rand_u32()
would always have 0 in bit 31 and bit 15. Fix that by calling
lcg_rand() three times, to generate 15, 15, and 2 random bits
respectively.
V2: Use the 10/11 most significant bits from the 3 lcg results and mix
them with the low ones from the adjacent one, as suggested by Andrea
Canciani.
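The first variant (15, 15, and 2 bits) can be sketched as follows; the V2 mixing scheme is not shown. The generator rand15() is an assumed classic LCG that returns 15 bits per call, and all names here are hypothetical.

```c
#include <stdint.h>

static uint32_t lcg_state = 1;   /* hypothetical seed */

/* Assumed 15-bit generator in the style of a classic C rand(). */
static uint32_t
rand15 (void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (lcg_state >> 16) & 0x7fffu;
}

/* Pack 15 + 15 + 2 bits so that all 32 result bits can be nonzero. */
static uint32_t
combine_15_15_2 (uint32_t lo, uint32_t mid, uint32_t hi)
{
    return (lo & 0x7fffu) |             /* bits 0..14  */
           ((mid & 0x7fffu) << 15) |    /* bits 15..29 */
           ((hi & 0x3u) << 30);         /* bits 30..31 */
}

static uint32_t
rand_u32 (void)
{
    uint32_t lo = rand15 ();
    uint32_t mid = rand15 ();
    uint32_t hi = rand15 ();

    return combine_15_15_2 (lo, mid, hi);
}
```

By contrast, combining only two 15-bit values as (hi << 16) | lo leaves bits 15 and 31 permanently zero, which is the defect the commit describes.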
|
|
This fast path is frequently used by cairo to do polygon rendering.
The existing NEON code generation framework is used.
|
|
Correct a typo reported by James Cloos and some reported by automatic
spellchecking.
Remove trailing whitespace.
|
|
More details in binutils bugtracker:
http://sourceware.org/bugzilla/show_bug.cgi?id=12931
The problem was encountered in the wild by Mozilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=672787
|
|
The necessity is justified by a message in the pixman mailing list:
http://lists.freedesktop.org/archives/pixman/2011-July/001330.html
NONE repeat is not supported, but could be added by tweaking
the interpretation and making use of 'fully_transparent_src'
scanline function argument.
|
|
Add a comment to explain why the tests guarantee that the code always
computes the greatest valid root.
Rename "det" as "discr" to make it match the mathematical name
"discriminant".
Based on a patch by Jeff Muizelaar <jmuizelaar@mozilla.com>.
|
|
|
|
|
|
|
|
To avoid function call and other calculation overhead, extend the source
scanline into a temporary buffer when the source width is too small.
The temporary buffer is accessed repeatedly, so the extension cost is
very small thanks to caching.
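A sketch of the extension step, assuming 32-bit pixels and a caller-provided buffer. extend_scanline() and its parameters are hypothetical; the actual patch may size and fill the buffer differently.

```c
#include <stdint.h>
#include <string.h>

/* Tile a narrow source scanline into buf using whole copies until at
 * least min_width pixels are available. Returns the extended width
 * (a multiple of src_width), or 0 if the buffer is too small. */
static int
extend_scanline (const uint32_t *src, int src_width,
                 uint32_t *buf, int buf_capacity, int min_width)
{
    int filled = 0;

    if (src_width <= 0 || min_width <= 0)
        return 0;

    while (filled < min_width)
    {
        if (filled + src_width > buf_capacity)
            return 0;

        memcpy (buf + filled, src, (size_t) src_width * sizeof (uint32_t));
        filled += src_width;
    }

    return filled;
}
```

Because the extended width stays a multiple of the source width, repeating the extended scanline gives the same pixels as repeating the original one.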
|
|
|
|
Now that the bilinear template supports REPEAT_NORMAL, functions for it
are added to the PIXMAN_ARM_BIND_SCALED_BILINEAR_ macros. Fast path
entries are not enabled yet.
|
|
Now that the bilinear template supports REPEAT_NORMAL, declare composite
functions using it. The functions are only declared, not used yet.
|
|
The basic idea is to break down a normal repeat into a set of
non-repeat scanline compositions and stitch them together.
Bilinear may interpolate between the last and first pixels of the source
scanline. In this case, we can use a temporary wrap-around buffer.
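The splitting idea can be illustrated in a simplified, unscaled form, where each "non-repeat composition" is just a copy. This sketch leaves out scaling and the bilinear wrap-around buffer, and repeat_normal_copy() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Fill dst_width pixels from a repeating source scanline, starting at
 * source position src_x. Each loop iteration handles one segment that
 * does not cross the source edge. Returns the number of segments. */
static int
repeat_normal_copy (const uint32_t *src, int src_width, int src_x,
                    uint32_t *dst, int dst_width)
{
    int dst_x = 0;
    int segments = 0;

    if (src_width <= 0)
        return 0;

    /* Wrap the start position into [0, src_width). */
    src_x %= src_width;
    if (src_x < 0)
        src_x += src_width;

    while (dst_x < dst_width)
    {
        int len = src_width - src_x;          /* pixels left before the edge */

        if (len > dst_width - dst_x)
            len = dst_width - dst_x;

        memcpy (dst + dst_x, src + src_x, (size_t) len * sizeof (uint32_t));

        dst_x += len;
        src_x = 0;                            /* later segments start at 0 */
        segments++;
    }

    return segments;
}
```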
|
|
By replacing boolean arguments with flags, the code becomes more
readable, and the flags can be extended to do more things later.
Currently the following flags are defined:
FLAG_NONE
- No flags are turned on.
FLAG_HAVE_SOLID_MASK
- Template will generate solid mask composite functions.
FLAG_HAVE_NON_SOLID_MASK
- Template will generate bits mask composite functions.
FLAG_HAVE_SOLID_MASK and FLAG_HAVE_NON_SOLID_MASK should be mutually
exclusive.
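As an illustration, the flags and the mutual-exclusion rule could be expressed like this. The flag names come from the list above; the bit values and the helper mask_flags_are_valid() are assumptions.

```c
#include <stdbool.h>

enum
{
    FLAG_NONE                = 0,
    FLAG_HAVE_SOLID_MASK     = 1 << 0,   /* assumed bit value */
    FLAG_HAVE_NON_SOLID_MASK = 1 << 1    /* assumed bit value */
};

/* Hypothetical helper: the two mask flags must not be set together. */
static bool
mask_flags_are_valid (unsigned flags)
{
    unsigned both = FLAG_HAVE_SOLID_MASK | FLAG_HAVE_NON_SOLID_MASK;

    return (flags & both) != both;
}
```

Compared with a pair of boolean arguments, a call site reads as FLAG_HAVE_SOLID_MASK rather than an unlabeled TRUE, FALSE.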
|
|
|
|
The first bug is that a vmull.u8 instruction would store its result in
the q1 register, clobbering the d2 register used later on. The second
is that a vraddhn instruction would overwrite d25, corrupting the q12
register used later.
Fixing the second bug caused a pipeline bubble where the d18 register
would be unavailable for a clock cycle. This is fixed by swapping the
instruction with its successor.
|
|
Move the eight most common formats to the top of the list of image
formats and make create_random_image() much more likely to select one
of those eight formats.
This should help catch more bugs in SIMD optimized operations.
|
|
Autoconf 2.68 reports:
warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
Every code fragment must be wrapped in [AC_LANG_SOURCE([...])]
|
|
This allows more information, such as flags or the composite region,
to be passed to the composite functions.
|
|
All the fast paths generally use these names as well.
|
|
The variables in question were dst_x, dst_y, dst_image. The majority
of _x and _y uses were already dest_x and dest_y, while the majority
of _image uses were dst_image.
|
|
|
|
|
|
|
|
|
|
Some equations have too much multiplication with alpha.
|
|
The iterator for sources where neither RGB nor ALPHA is needed really
belongs in the noop implementation.
|