Age | Commit message | Author | Files | Lines
2011-09-10 | BILINEAR->NEAREST filter optimization for simple rotation and translation [bilinear-reduction] | Siarhei Siamashka | 1 | -1/+38
Simple rotation and translation are the additional cases when BILINEAR filter can be safely reduced to NEAREST.
2011-09-10 | Strength-reduce BILINEAR filter to NEAREST filter for identity transforms | Søren Sandmann Pedersen | 5 | -38/+62
An image with a bilinear filter and an identity transform is equivalent to one with a nearest filter, so there is no reason the standard fast paths shouldn't be usable. But because a BILINEAR filter samples a 2x2 pixel block in the source image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the source area is the entire image, because some compositing operations might then read pixels outside the image.

This patch fixes the problem by splitting the FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags, FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR, that indicate that the clip covers the samples taking into account NEAREST/BILINEAR filters respectively.

All the existing compositing operations that require FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which filter they depend on.

In compute_image_info() both COVER_CLIP_NEAREST and COVER_CLIP_BILINEAR can be set depending on how much room there is around the clip rectangle.

Finally, images with an identity transform and a bilinear filter get FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.

Performance measurements with render_bench against Xephyr:

Before:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.

After:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.
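The equivalence of the two filters under an identity transform can be illustrated with a minimal Python sketch. This is not pixman's C code: `nearest` and `bilinear` are hypothetical helpers, pixels are plain numbers, and sample positions are assumed to be pixel-aligned integers. With zero fractional weights the 2x2 blend collapses to the top-left pixel, even though the bilinear path still reads the neighbouring pixels, which is the reason a separate cover flag is needed.

```python
import math


def nearest(img, x, y):
    """Return the pixel at integer position (x, y)."""
    return img[y][x]


def bilinear(img, x, y):
    """Blend the 2x2 block whose top-left corner is (floor(x), floor(y)).

    Neighbour reads are clamped to the image only to keep this sketch
    self-contained; a real fast path must know the reads are safe.
    """
    height, width = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def filters_agree_on_identity(img):
    """True if both filters give the same value at every pixel-aligned sample."""
    return all(
        bilinear(img, x, y) == nearest(img, x, y)
        for y in range(len(img))
        for x in range(len(img[0]))
    )


IMAGE = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
```

Calling `filters_agree_on_identity(IMAGE)` returns `True`, while a half-pixel offset such as `bilinear(IMAGE, 0.5, 0)` blends two neighbours and no longer matches any single source pixel.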
2011-09-10 | test: Occasionally use a BILINEAR filter in blitters-test | Søren Sandmann Pedersen | 1 | -1/+4
To test that reductions of BILINEAR->NEAREST for identity transformations happen correctly, occasionally use a bilinear filter in blitters test.
2011-09-10 | test: better coverage for BILINEAR->NEAREST filter optimization | Siarhei Siamashka | 1 | -6/+30
The upcoming optimization, which will replace the BILINEAR filter with NEAREST where appropriate, needs to analyze the transformation matrix without making any mistakes. The changes to affine-test include:
1. A higher chance of using the same scale factor for the x and y axes. This can help to stress some special cases (for example, the case when both x and y scale factors are integers). The same applies to x/y translation.
2. A small chance of "corrupting" the transformation matrix by flipping random bits. This should help to identify cases where some of the fast paths or other code logic are wrongly activated due to insufficient checks.
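The bit-flipping idea from item 2 can be sketched as follows. This is an illustrative Python sketch, not the affine-test code: `corrupt_matrix` and `differing_bits` are hypothetical names, and the matrix is assumed to be a 3x3 list of 32-bit words holding 16.16 fixed-point values.

```python
import random

FIXED_1 = 1 << 16  # 1.0 in 16.16 fixed point
IDENTITY = [[FIXED_1, 0, 0], [0, FIXED_1, 0], [0, 0, FIXED_1]]


def corrupt_matrix(matrix, rng, max_flips=3):
    """Return a copy of a 3x3 matrix with 1..max_flips random bits flipped."""
    out = [row[:] for row in matrix]
    for _ in range(rng.randint(1, max_flips)):
        row, col = rng.randrange(3), rng.randrange(3)
        out[row][col] ^= 1 << rng.randrange(32)  # flip one of the 32 bits
    return out


def differing_bits(a, b):
    """Count the bits that differ between two 3x3 matrices."""
    return sum(
        bin(a[r][c] ^ b[r][c]).count("1") for r in range(3) for c in range(3)
    )
```

A seeded generator such as `random.Random(42)` keeps a failing case reproducible, which matters when the corrupted matrix is what exposes a wrongly selected fast path.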
2011-09-10 | Eliminate compute_sample_extents() function | Søren Sandmann Pedersen | 1 | -58/+42
In analyze_extents(), instead of calling compute_sample_extents() call compute_transformed_extents() and inline the remaining part of compute_sample_extents(). The upcoming bilinear->nearest optimization will do something different with these two pieces of code.
2011-09-10 | Split computation of sample area into own function | Søren Sandmann Pedersen | 1 | -62/+76
compute_sample_extents() has two parts: one that computes the transformed extents, and one that checks whether the computed extents fit within the 16.16 coordinate space. Split the first part into its own function compute_transformed_extents().
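The two parts can be pictured with a small Python sketch. It is a simplification under stated assumptions, not pixman's implementation: the function names `transformed_extents` and `fits_in_16_16` are hypothetical, the transform is a floating-point 2x3 affine matrix, and "fits" is taken to mean that each coordinate is representable as a signed 32-bit value with 16 fractional bits.

```python
import math

FIXED_ONE = 1 << 16
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def transformed_extents(matrix, box):
    """Bounding box of an axis-aligned box after an affine transform.

    matrix is ((a, b, tx), (c, d, ty)); box is (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box
    (a, b, tx), (c, d, ty) = matrix
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x1, y2), (x2, y2)):
        xs.append(a * x + b * y + tx)
        ys.append(c * x + d * y + ty)
    return min(xs), min(ys), max(xs), max(ys)


def fits_in_16_16(extents):
    """True if every coordinate converts to 16.16 without overflowing int32."""
    return all(
        INT32_MIN <= math.floor(v * FIXED_ONE) <= INT32_MAX for v in extents
    )
```

Keeping the two steps separate lets a caller reuse the transformed extents for other decisions before, or instead of, running the range check.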
2011-09-10 | Remove x and y coordinates from analyze_extents() and compute_sample_extents() | Søren Sandmann Pedersen | 1 | -26/+37
These coordinates were only ever used for subtracting from the extents box to put it into the coordinate space of the image, so we might as well do this coordinate translation only once before entering the functions.
2011-09-09 | Post-release version bump to 0.23.5 | Søren Sandmann Pedersen | 1 | -1/+1
2011-09-09 | Pre-release version bump to 0.23.4 | Søren Sandmann Pedersen | 1 | -1/+1
2011-09-09 | bits: optimise fetching width==1 repeats | Chris Wilson | 1 | -14/+44
Profiling ign.com, 20% of the entire render time was absorbed in this single operation: << /content //COLOR_ALPHA /width 480 /height 800 >> surface context << /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern //EXTEND_REPEAT set-extend set-source n 0 0 480 677 rectangle fill+ pop which is a simple composition of a single pixel wide image. Sadly this is a workaround for lack of independent repeat-x/y handling in cairo and pixman. Worse still is that the worst-case behaviour of the general repeat path is for width 1 images... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
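Why a width-1 repeating image deserves a special case can be shown with a short Python sketch. This is an illustration only, with hypothetical helper names, and it models a scanline as a list of pixel values rather than pixman's actual fetchers.

```python
def fetch_repeat_general(row, x, width):
    """General path: wrap every destination coordinate individually."""
    return [row[(x + i) % len(row)] for i in range(width)]


def fetch_repeat(row, x, width):
    """Fast path: a 1-pixel-wide row repeats the same pixel everywhere."""
    if len(row) == 1:
        return [row[0]] * width  # no per-pixel coordinate wrapping needed
    return fetch_repeat_general(row, x, width)
```

Both paths return the same pixels for a width-1 source; the fast path simply skips the per-pixel modulo that dominates the general repeat code in that case.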
2011-09-07 | ARM: NEON better instruction scheduling of over_n_8888 | Taekyun Kim | 1 | -5/+48
New head, tail, and tail/head blocks are added, and instructions are reordered to eliminate pipeline stalls.

Performance numbers before/after:

- cortex a8 -
before : L1: 375.39 L2: 391.93 M:114.39 ( 40.99%) HT: 99.37 VT: 98.20 R: 90.24 RT: 32.87 ( 240Kops/s)
after  : L1: 481.90 L2: 483.46 M:114.29 ( 40.69%) HT:106.91 VT: 93.38 R: 90.74 RT: 29.51 ( 236Kops/s)

- cortex a9 -
before : L1: 324.50 L2: 332.79 M:155.55 ( 47.51%) HT:111.93 VT: 93.58 R: 71.92 RT: 28.21 ( 233Kops/s)
after  : L1: 355.87 L2: 364.49 M:156.90 ( 47.59%) HT:111.52 VT: 91.76 R: 72.16 RT: 28.22 ( 234Kops/s)
2011-09-07 | ARM: NEON better instruction scheduling of over_n_8_8888 | Taekyun Kim | 1 | -26/+60
The tail/head block is expanded and reordered to eliminate stalls.

Performance numbers before/after:

- cortex a8 -
before : L1: 201.35 L2: 190.48 M:101.94 ( 54.85%) HT: 78.41 VT: 63.83 R: 58.25 RT: 21.74 ( 191Kops/s)
after  : L1: 257.65 L2: 255.49 M:102.04 ( 55.33%) HT: 79.19 VT: 65.46 R: 59.23 RT: 21.12 ( 189Kops/s)

- cortex a9 -
before : L1: 157.35 L2: 159.81 M:133.00 ( 60.94%) HT: 82.44 VT: 63.64 R: 51.66 RT: 19.15 ( 179Kops/s)
after  : L1: 216.83 L2: 219.40 M:135.83 ( 61.80%) HT: 85.60 VT: 64.80 R: 52.23 RT: 19.16 ( 179Kops/s)
2011-08-29 | Workaround bug in llvm-gcc | Andrea Canciani | 1 | -0/+4
llvm-gcc (shipped in Apple Xcode 4.1.1 as the default compiler, or in the 2.9 release of LLVM) performs an invalid optimization which unifies the empty_region and the bad_region structures because they have the same content. A bug report has been filed against Apple Developer Tools for this issue. This commit works around the bug by making one of the two structures volatile, so that it cannot be merged. Fixes region-contains-test.
2011-08-29 | win32: Build benchmarks | Andrea Canciani | 2 | -5/+6
Add the makefile rules needed to compile lowlevel-blt-bench on win32 and fix the compilation errors.
2011-08-19 | Move bilinear interpolation to pixman-inlines.h | Søren Sandmann Pedersen | 2 | -91/+91
2011-08-19 | Use repeat() function from pixman-inlines.h in pixman-bits-image.c | Søren Sandmann Pedersen | 1 | -42/+15
The repeat() functionality was duplicated between pixman-bits-image.c and pixman-inlines.h
2011-08-19 | Rename pixman-fast-path.h to pixman-inlines.h | Søren Sandmann Pedersen | 8 | -7/+7
It is not really specific to pixman-fast-path.c.
2011-08-15 | In pixman_image_create_bits() allow images larger than 2GB | Søren Sandmann Pedersen | 3 | -11/+18
There is no reason for pixman_image_create_bits() to check that the image size fits in int32_t. The correct check is against size_t since that is what the argument to calloc() is. This patch fixes this by adding a new _pixman_multiply_overflows_size() and using it in create_bits(). Also prepend an underscore to the names of other similar functions since they are internal to pixman. V2: Use int, not ssize_t for the arguments in create_bits() since width/height are still limited to 32 bits, as pointed out by Chris Wilson.
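The difference between checking against int32_t and checking against size_t can be sketched in Python. This is an illustration, not the body of _pixman_multiply_overflows_size(): the helper name `multiply_overflows` and the 64-bit `SIZE_MAX` are assumptions, and the division-based test is one common way to detect overflow without computing the product.

```python
SIZE_MAX = (1 << 64) - 1   # assumption: 64-bit size_t
INT32_MAX = (1 << 31) - 1


def multiply_overflows(a, b, limit=SIZE_MAX):
    """True if a * b would exceed limit, for non-negative a and b.

    Uses division so that the test itself never needs the full product,
    which is what a C implementation has to avoid.
    """
    return b != 0 and a > limit // b
```

For example, a buffer of 40000 rows with a 160000-byte stride (6.4 GB) is rejected by `multiply_overflows(40000, 160000, INT32_MAX)` but accepted when the limit is the 64-bit `SIZE_MAX`.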
2011-08-11 | Don't include stdint.h in lowlevel-blt-bench.c | Søren Sandmann Pedersen | 1 | -1/+0
Some systems don't have the file, and the types are already defined in pixman.h. https://bugs.freedesktop.org//show_bug.cgi?id=37422
2011-08-11 | Use find_box_for_y() in pixman_region_contains_point() too | Søren Sandmann Pedersen | 1 | -6/+6
The same binary search from the previous commit can be used in this function too. V2: Remove check from loop that is not needed anymore, pointed out by Andrea Canciani.
2011-08-11 | Speed up pixman_region{,32}_contains_rectangle() | Søren Sandmann Pedersen | 1 | -6/+42
When someone selects some text in Firefox under a non-composited X server and initiates a drag, a shaped window is created with a complex shape corresponding to the outline of the text. Then, on every mouse movement pixman_region_contains_rectangle() is called many times on that complicated region. And pixman_region_contains_rectangle() is doing a linear scan through the rectangles in the region, although the scan does exit when it finds the first box that can't possibly intersect the passed-in rectangle. This patch changes the loop so that it uses a binary search to skip boxes that don't overlap the current y position. The performance improvement for the text dragging case is easily noticeable. V2: Use the binary search for the "getting up to speed or skipping remainder of band" as well.
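The binary search can be sketched in Python as follows. This is an illustration rather than pixman's C code: boxes are modelled as (x1, y1, x2, y2) tuples, and the sketch assumes they are stored in y-x banded order so that y2 never decreases along the list, which is the property that makes bisection valid.

```python
def find_box_for_y(boxes, y):
    """Index of the first box whose bottom edge y2 is greater than y.

    Returns len(boxes) if every box ends at or above y.
    """
    lo, hi = 0, len(boxes)
    while lo < hi:
        mid = (lo + hi) // 2
        if boxes[mid][3] > y:
            hi = mid          # boxes[mid] may be the answer; look left
        else:
            lo = mid + 1      # boxes[mid] ends at or above y; skip it
    return lo


def find_box_for_y_linear(boxes, y):
    """Reference linear scan, used only to cross-check the bisection."""
    for i, box in enumerate(boxes):
        if box[3] > y:
            return i
    return len(boxes)
```

For a region with thousands of small boxes, as in the text-outline case, the bisection skips the boxes above the rectangle in logarithmic time instead of visiting each one.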
2011-08-11 | New test of pixman_region_contains_{rectangle,point} | Søren Sandmann Pedersen | 4 | -2/+184
This test generates random regions and checks whether random boxes and points are contained within them. The results are combined and a CRC32 value is computed and compared to a known-correct one.
2011-08-11 | Fix lcg_rand_u32() to return 32 random bits. | Søren Sandmann Pedersen | 1 | -4/+8
The lcg_rand() function only returns 15 random bits, so lcg_rand_u32() would always have 0 in bit 31 and bit 15. Fix that by calling lcg_rand() three times, to generate 15, 15, and 2 random bits respectively. V2: Use the 10/11 most significant bits from the 3 lcg results and mix them with the low ones from the adjacent one, as suggested by Andrea Canciani.
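The bit-width problem and the first fix described above (15 + 15 + 2 bits) can be sketched in Python. This is an illustration, not the test-suite code: the `Lcg` class is hypothetical, and the multiplier and increment are the classic C-library LCG constants, assumed here for concreteness. The V2 mixing of high and low bits is not modelled.

```python
class Lcg:
    """Linear congruential generator yielding 15 random bits per call."""

    def __init__(self, seed=1):
        self.state = seed & 0xFFFFFFFF

    def rand15(self):
        # Assumed constants; the 15 returned bits come from state bits 16..30.
        self.state = (self.state * 1103515245 + 12345) & 0xFFFFFFFF
        return (self.state >> 16) & 0x7FFF

    def rand_u32_two_calls(self):
        """Old behaviour: two 15-bit values leave bits 31 and 15 at zero."""
        return (self.rand15() << 16) | self.rand15()

    def rand_u32(self):
        """Fixed behaviour: 15 + 15 + 2 bits fill all 32 positions."""
        hi = self.rand15()          # bits 17..31
        mid = self.rand15()         # bits 2..16
        lo = self.rand15() & 0x3    # bits 0..1
        return (hi << 17) | (mid << 2) | lo
```

Masking any result of `rand_u32_two_calls()` with `0x80008000` always gives zero, which is exactly the defect the commit describes.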
2011-08-04 | ARM NEON: Standard fast path out_reverse_8_8888 | Taekyun Kim | 2 | -0/+54
This fast path is frequently used by cairo to do polygon rendering. Existing NEON code generation framework is used.
2011-07-29 | radial: Fix typos and trailing whitespace | Andrea Canciani | 1 | -8/+7
Correct a typo reported by James Cloos and some reported by automatic spellchecking. Remove trailing whitespace.
2011-07-27 | ARM: workaround binutils bug #12931 (code sections alignment) | Siarhei Siamashka | 3 | -0/+3
More details in binutils bugtracker: http://sourceware.org/bugzilla/show_bug.cgi?id=12931 The problem was encountered in the wild by Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=672787
2011-07-22 | C fast path for scaled src_x888_8888 with nearest filter | Siarhei Siamashka | 3 | -0/+12
The necessity is justified by a message in the pixman mailing list: http://lists.freedesktop.org/archives/pixman/2011-July/001330.html NONE repeat is not supported, but could be added by tweaking the interpretation and making use of 'fully_transparent_src' scanline function argument.
2011-07-15 | radial: Improve documentation and naming | Andrea Canciani | 1 | -10/+21
Add a comment to explain why the tests guarantee that the code always computes the greatest valid root. Rename "det" as "discr" to make it match the mathematical name "discriminant". Based on a patch by Jeff Muizelaar <jmuizelaar@mozilla.com>.
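The role of the discriminant in choosing the greatest valid root can be illustrated with a generic Python sketch. The formulation is an assumption made for illustration and is not taken from the radial gradient source: the quadratic is written as a*t^2 - 2*b*t + c = 0, and a root t counts as valid when the interpolated radius r1 + t*dr is non-negative. The function name is hypothetical.

```python
import math


def greatest_valid_root(a, b, c, r1, dr):
    """Greatest t with a*t*t - 2*b*t + c == 0 and r1 + t*dr >= 0, else None."""
    if a == 0:
        # Degenerate (linear) case: -2*b*t + c == 0.
        if b == 0:
            return None
        t = c / (2 * b)
        return t if r1 + t * dr >= 0 else None
    discr = b * b - a * c  # discriminant of the reduced quadratic
    if discr < 0:
        return None        # no real roots
    root = math.sqrt(discr)
    candidates = sorted(((b + root) / a, (b - root) / a), reverse=True)
    for t in candidates:   # try the larger root first
        if r1 + t * dr >= 0:
            return t
    return None
```

Trying the candidates in descending order is what guarantees that the returned value is the greatest root that passes the validity test.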
2011-07-04 | Makefile.am: Add pixman@lists.freedesktop.org to RELEASE_ANNOUNCE_LIST | Søren Sandmann Pedersen | 1 | -1/+1
2011-07-04 | Post-release version bump to 0.23.3 | Søren Sandmann Pedersen | 1 | -1/+1
2011-07-04 | Pre-release version bump to 0.23.2 | Søren Sandmann Pedersen | 1 | -1/+1
2011-06-28 | Bilinear REPEAT_NORMAL source line extension for too short src_width | Taekyun Kim | 1 | -3/+47
To avoid function call and other calculation overhead, extend the source scanline into a temporary buffer when the source width is too small. The temporary buffer will be accessed repeatedly, so the extension cost is very small due to cache effects.
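Extending a short scanline can be pictured with a minimal Python sketch. It is an illustration only: `extend_scanline` is a hypothetical helper, a scanline is a list of pixel values, and the minimum width is a free parameter rather than pixman's actual threshold.

```python
def extend_scanline(row, min_width):
    """Tile row into a new buffer that is at least min_width pixels wide."""
    repeats = -(-min_width // len(row))  # ceiling division
    return row * max(repeats, 1)
```

The extended buffer is always a whole number of copies of the source, so wrapping around the buffer is equivalent to wrapping around the original scanline.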
2011-06-28 | Enable REPEAT_NORMAL bilinear fast path entries | Taekyun Kim | 1 | -3/+39
2011-06-28 | ARM: Add REPEAT_NORMAL functions to bilinear BIND macros | Taekyun Kim | 1 | -1/+10
Now that the bilinear template supports REPEAT_NORMAL, functions for it are added to the PIXMAN_ARM_BIND_SCALED_BILINEAR_ macros. Fast path entries are not enabled yet.
2011-06-28 | sse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function | Taekyun Kim | 1 | -0/+5
Now that the bilinear template supports REPEAT_NORMAL, declare composite functions using it. The function is only declared, not used yet.
2011-06-28 | REPEAT_NORMAL support for bilinear fast path template | Taekyun Kim | 1 | -0/+90
The basic idea is to break down normal repeat into a set of non-repeat scanline compositions and stitch them together. Bilinear may interpolate between the last and first pixels of the source scanline. In this case, we can use a temporary wrap-around buffer.
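Breaking a repeated span into non-repeating pieces can be sketched in Python. This is an illustration, not the fast path template: `split_repeat_span` is a hypothetical helper that works on whole-pixel coordinates and ignores the bilinear seam between the last and first source pixels.

```python
def split_repeat_span(x, width, src_width):
    """Split [x, x + width) into (source_start, length) runs that never wrap."""
    segments = []
    pos = x % src_width          # position inside the source scanline
    remaining = width
    while remaining > 0:
        run = min(remaining, src_width - pos)
        segments.append((pos, run))
        remaining -= run
        pos = 0                  # every later run starts at the left edge
    return segments
```

Each returned run can be handed to an ordinary non-repeat scanline function, and the run lengths always sum to the requested width.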
2011-06-28 | Replace boolean arguments with flags for bilinear fast path template | Taekyun Kim | 3 | -30/+49
By replacing boolean arguments with flags, the code becomes more readable, and the flags can be extended to do more later. Currently the following flags are defined:
FLAG_NONE - No flags are turned on.
FLAG_HAVE_SOLID_MASK - Template will generate solid mask composite functions.
FLAG_HAVE_NON_SOLID_MASK - Template will generate bits mask composite functions.
FLAG_HAVE_SOLID_MASK and FLAG_HAVE_NON_SOLID_MASK should be mutually exclusive.
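The flag scheme, including the mutual-exclusion rule, can be modelled with a short Python sketch. The flag names follow the commit message, but the numeric values and the helper functions `is_valid` and `mask_kind` are assumptions made for illustration; the real definitions are C macros or enum values in the template header.

```python
from enum import IntFlag


class Flag(IntFlag):
    NONE = 0
    HAVE_SOLID_MASK = 1        # assumed value
    HAVE_NON_SOLID_MASK = 2    # assumed value


MASK_FLAGS = Flag.HAVE_SOLID_MASK | Flag.HAVE_NON_SOLID_MASK


def is_valid(flags):
    """The two mask flags must not be set at the same time."""
    return (flags & MASK_FLAGS) != MASK_FLAGS


def mask_kind(flags):
    """Describe which kind of mask a template instance would handle."""
    if flags & Flag.HAVE_SOLID_MASK:
        return "solid"
    if flags & Flag.HAVE_NON_SOLID_MASK:
        return "bits"
    return "none"
```

Compared with a pair of booleans, a single flags argument leaves room for further bits without changing the template's signature.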
2011-06-25 | test: Make fuzzer-find-diff.pl executable | Søren Sandmann | 1 | -0/+0
2011-06-25 | ARM: Fix two bugs in neon_composite_over_n_8888_0565_ca(). | Søren Sandmann | 1 | -4/+4
The first bug is that a vmull.u8 instruction would store its result in the q1 register, clobbering the d2 register used later on. The second is that a vraddhn instruction would overwrite d25, corrupting the q12 register used later. Fixing the second bug caused a pipeline bubble where the d18 register would be unavailable for a clock cycle. This is fixed by swapping the instruction with its successor.
2011-06-25 | blitters-test: Make common formats more likely to be tested. | Søren Sandmann Pedersen | 1 | -8/+14
Move the eight most common formats to the top of the list of image formats and make create_random_image() much more likely to select one of those eight formats. This should help catch more bugs in SIMD optimized operations.
2011-06-23 | Silence autoconf warnings | Andrea Canciani | 1 | -20/+20
Autoconf 2.68 reports:

warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body

Every code fragment must be wrapped in [AC_LANG_SOURCE([...])]
2011-06-20 | Replace arguments to composite functions with a pointer to a struct | Søren Sandmann Pedersen | 11 | -1041/+278
This allows more information, such as flags or the composite region, to be passed to the composite functions.
2011-06-12 | In pixman-general.c rename image_parameters to {src, mask, dest}_image | Søren Sandmann Pedersen | 1 | -17/+16
All the fast paths generally use these names as well.
2011-06-12 | Replace instances of "dst_*" with "dest_*" | Søren Sandmann Pedersen | 11 | -275/+275
The variables in question were dst_x, dst_y, dst_image. The majority of _x and _y uses were already dest_x and dest_y, while the majority of _image uses were dst_image.
2011-05-31 | demos: Comment out some unused variables | Søren Sandmann | 2 | -1/+7
2011-05-31 | sse2: Delete some unused variables | Søren Sandmann | 1 | -14/+4
2011-05-31 | mmx: Delete some unused variables | Søren Sandmann | 1 | -14/+3
2011-05-29 | Include noop in win32 builds | Andrea Canciani | 1 | -0/+1
2011-05-24 | Fix a few typos in pixman-combine.c.template | Nis Martensen | 1 | -4/+3
Some equations have too much multiplication with alpha.
2011-05-19 | Move NOP src iterator into noop implementation. | Søren Sandmann Pedersen | 2 | -9/+6
The iterator for sources where neither RGB nor ALPHA is needed, really belongs in the noop implementation.