Age | Commit message | Author | Files | Lines
2016-03-11 | demos/scale: Add good/best filter types [spitzak14] | Bill Spitzak | 1 | -5/+7
Allows testing them. Good is the default, to match the default behavior of pixman/cairo at startup.

v14: Locked axis put in its own commit

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-image: Implement PIXMAN_FILTER_GOOD/BEST as separable convolutions | Bill Spitzak | 3 | -7/+206
In my opinion the low level code has to be in control of the filtering. This is necessary to allow hardware implementations, to allow development of new filtering algorithms, and to allow optimization based on knowledge of the exact filter being used.

This implements the same GOOD/BEST as Cairo, with minor improvements.

The GOOD will produce exactly the same BILINEAR result as before for any scale greater than 3/4, or for scale of 1/2 with no rotation and an integer translation. This means the output is unchanged for most current users of GOOD.

GOOD uses:
    scale < 1/16 : BOX.BOX at size 16
    scale < 3/4  : BOX.BOX at size 1/scale
    larger       : BOX.BOX at size 1

If both directions have a scale >= 3/4 or a scale of 1/2 and an integer translation, the faster PIXMAN_FILTER_BILINEAR code is used. This is compatible at these scales with older versions of pixman where bilinear was always used for GOOD.

BEST uses:
    scale < 1/24  : BOX.BOX at size 24
    scale < 1/16  : BOX.BOX at size 1/scale
    scale < 1     : IMPULSE.LANCZOS2 at size 1/scale
    scale < 2.333 : IMPULSE.LANCZOS2 at size 1
    scale < 128   : BOX.LANCZOS2 at size 1/(scale-1) (antialiased square pixels)
    larger        : BOX.LANCZOS2 at size 1/127 (antialias blur gets thicker)

v8: Cutoff in BEST between IMPULSE.LANCZOS2 and BOX.LANCZOS2 adjusted for a better match between the filters.

v9: Fixed divide-by-zero from all-zero matrix found by stress-test

v11: Whitespace and formatting fixes
     Moved demo changes to a later patch

v12: Whitespace and formatting fixes

v14: Compute subsample bits here
     Works for non-affine and when fast paths are disabled
     Fixed big memory leak of array for BEST

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
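The scale thresholds listed in this commit message can be read as a simple decision ladder. The following is only a sketch of that ladder, not pixman's code: the type and function names (`kernel_t`, `filter_choice_t`, `choose_good`, `choose_best`) are hypothetical, the two kernel fields simply follow the FIRST.SECOND order in which the message writes each pair, and the BILINEAR shortcut described above is not modeled.

```c
/* Hypothetical names throughout; this mirrors the commit message only. */
typedef enum { K_IMPULSE, K_BOX, K_LANCZOS2 } kernel_t;

typedef struct {
    kernel_t first;   /* kernel written before the dot, e.g. BOX in BOX.LANCZOS2 */
    kernel_t second;  /* kernel written after the dot */
    double   size;    /* filter size as listed in the message */
} filter_choice_t;

/* GOOD: always BOX.BOX, only the size depends on the scale. */
static filter_choice_t choose_good(double scale)
{
    filter_choice_t f = { K_BOX, K_BOX, 1.0 };

    if (scale < 1.0 / 16.0)
        f.size = 16.0;
    else if (scale < 3.0 / 4.0)
        f.size = 1.0 / scale;
    return f;
}

/* BEST: the kernel pair and the size both depend on the scale. */
static filter_choice_t choose_best(double scale)
{
    filter_choice_t f;

    if (scale < 1.0 / 24.0) {
        f.first = K_BOX;     f.second = K_BOX;      f.size = 24.0;
    } else if (scale < 1.0 / 16.0) {
        f.first = K_BOX;     f.second = K_BOX;      f.size = 1.0 / scale;
    } else if (scale < 1.0) {
        f.first = K_IMPULSE; f.second = K_LANCZOS2; f.size = 1.0 / scale;
    } else if (scale < 2.333) {
        f.first = K_IMPULSE; f.second = K_LANCZOS2; f.size = 1.0;
    } else if (scale < 128.0) {
        f.first = K_BOX;     f.second = K_LANCZOS2; f.size = 1.0 / (scale - 1.0);
    } else {
        f.first = K_BOX;     f.second = K_LANCZOS2; f.size = 1.0 / 127.0;
    }
    return f;
}
```

For example, under these assumptions a 3x upscale with BEST would select BOX.LANCZOS2 at size 1/(3-1) = 0.5, and a 1/2 downscale with GOOD would select BOX.BOX at size 2.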
2016-03-11 | pixman-image: Detect all 8 transforms that can do nearest filter | Bill Spitzak | 1 | -62/+57
This patch consolidates the examination of the matrix into one place, and detects the reflected versions of the transforms that can be done with nearest filter.

v14: Split this code from the GOOD/BEST as I think it will be accepted.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Add description to pixman_filter_create_separable_convolution() | Bill Spitzak | 1 | -0/+51
v9: Described arguments and more filter combinations, fixed some errors.

v11: Further correction, in particular replaced "scale" with "size"

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@redhat.com>
2016-03-11 | pixman-filter: Made Gaussian a bit wider | Bill Spitzak | 1 | -1/+1
Expanded the size slightly (from ~4.25 to 5) to make the cutoff less noticeable. Previously the value at the cutoff was gaussian_filter(sqrt(2)*3/2) = 0.00626, which is larger than the difference between 8-bit pixels (1/255 = 0.003921). New cutoff is gaussian_filter(2.5) = 0.001089, which is smaller.

v11: added some math to commit message

v14: left SIGMA in there

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | pixman-filter: Nested polynomial for cubic | Bill Spitzak | 1 | -6/+8
v11: Restored range checks

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | pixman-filter: distribute normalization error over filter | Bill Spitzak | 1 | -1/+11
This removes a high-frequency spike in the middle of some filters that is caused by math errors all being in the same direction.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Use array index instead of pointers | Bill Spitzak | 1 | -11/+9
This removes some confusion and at least one bug: the error on a 1-wide filter is now added to the sample, rather than after the end of the filter.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Do BOX.BOX much faster | Bill Spitzak | 1 | -0/+5
The desired result from the integration is directly available, as the range has been clipped to the intersection of the two boxes. As this filter is probably the most common one, this optimization looks very useful.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
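The reason no numerical integration is needed for BOX.BOX can be seen from a small sketch: for two unit-height boxes, the integral of their product is just the width of their overlap. The function below is illustrative only; `box_box_overlap` and its parameters are hypothetical, and pixman's own code works on an already-clipped range rather than on centers and widths.

```c
/* Integral of the product of two unit-height boxes, each given by its
 * center and width. Inside the intersection the product is 1, so the
 * integral equals the width of the intersection. */
static double box_box_overlap(double c1, double w1, double c2, double w2)
{
    double lo1 = c1 - w1 / 2.0, hi1 = c1 + w1 / 2.0;
    double lo2 = c2 - w2 / 2.0, hi2 = c2 + w2 / 2.0;
    double lo = lo1 > lo2 ? lo1 : lo2;
    double hi = hi1 < hi2 ? hi1 : hi2;

    return hi > lo ? hi - lo : 0.0;
}
```

For instance, a box of width 1 centered at 0 and a box of width 2 centered at 0.5 overlap on [-0.5, 0.5], so the result is 1.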
2016-03-11 | pixman-filter: integral splitting is only needed for triangle filter | Bill Spitzak | 1 | -2/+2
Only the triangle is discontinuous at 0. The other filters resemble a cubic closely enough that Simpsons integration works without splitting.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: fix subsample_bits == 0 | Bill Spitzak | 1 | -1/+4
The position of only one subsample was wrong as ceil() was done on an integer. Use a different function for all odd numbers of subsamples that gets this right.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Directly calculate normalized values | Bill Spitzak | 1 | -4/+4
Fix the nice filter and the integral to directly compute normalized values. This makes it easier to test the normalization as it can be commented out and still get usable results. Renormalization is still necessary as there are sufficient math errors to need it.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Correct integration with impulse filters | Bill Spitzak | 1 | -62/+49
The IMPULSE special-cases did not sample the center of the region. This caused it to sample the filters outside their range, and produce asymmetric filters and other errors. Fixing this required changing the arguments to integral() so the correct point could be determined.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | pixman-filter: Correct Simpsons integration | Bill Spitzak | 1 | -6/+15
Simpsons uses cubic curve fitting, with 3 samples defining each cubic. This makes the weights of the samples be in a pattern of 1,4,2,4,2...4,1, and then dividing the result by 3.

The previous code was using weights of 1,2,0,6,0,6...,2,1.

With this fix the integration is accurate enough that the number of samples could be reduced a lot. Multiples of 12 seem to work best.

v7: Merged with patch to reduce from 128 samples to 16

v9: Changed samples from 16 to 12

v10: Fixed rebase error that made it not compile

v11: minor whitespace change

v14: more whitespace changes

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
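The 1,4,2,4,...,4,1 weight pattern described here is the composite Simpson's rule. A minimal generic sketch follows; it is not the code from the patch, and `simpson` and `cube` are hypothetical names. Simpson's rule fits a parabola through each group of three samples and happens to be exact for polynomials up to degree three, which is why a cubic-like filter integrates well with few samples. The interval count `n` must be even.

```c
/* Example integrand used to exercise the rule. */
static double cube(double x)
{
    return x * x * x;
}

/* Composite Simpson's rule over [a, b] with n intervals (n even):
 * weights 1,4,2,4,...,2,4,1 times h/3. */
static double simpson(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    int i;

    for (i = 1; i < n; i++)
        sum += ((i & 1) ? 4.0 : 2.0) * f(a + i * h);

    return sum * h / 3.0;
}
```

With 12 intervals, integrating `cube` over [0, 1] gives 0.25 up to floating-point rounding, since the rule is exact for cubics.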
2016-03-11 | pixman-filter: rename "scale" to "size" when it is 1/scale | Bill Spitzak | 2 | -12/+12
This is to remove some confusion when reading the code. "scale" gets larger as the picture gets larger, while "size" (i.e. the size of the filter) gets smaller.

v14: Removed changes to integral function

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | pixman-filter: reduce amount of malloc/free/memcpy to generate filter | Bill Spitzak | 1 | -34/+23
Rearranged so that the entire block of memory for the filter pair is allocated first, and then filled in. Previous version allocated and freed two temporary buffers for each filter and did an extra memcpy.

v8: small refactor to remove the filter_width function

v10: Restored filter_width function but with arguments changed to match later patches

v11: Removed unused arg and pointer from filter_width function
     Whitespace fixes.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | pixman-image: Added enable-gnuplot config to view filters in gnuplot | Bill Spitzak | 2 | -0/+58
If enable-gnuplot is configured, then you can pipe the output of a pixman-using program to gnuplot and get a continuously-updated plot of the horizontal filter. This works well with demos/scale to test the filter generation.

The plot is all the different subposition filters shuffled together. This is misleading in a few cases:

  IMPULSE.BOX - goes up and down as the subfilters have different numbers of non-zero samples
  IMPULSE.TRIANGLE - somewhat crooked for the same reason
  1-wide filters - looks triangular, but a 1-wide box would be more accurate

v7: First time this ability was included

v8: Use config option
    Moved code to the filter generator
    Modified scale demo to not call filter generator a second time.

v10: Only print if successful generation of plots
     Use #ifdef, not #if

v11: small whitespace fixes

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | demos/scale: Default to locked axis | Bill Spitzak | 1 | -0/+1
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | demos/scale: fix blank subsamples spin box | Bill Spitzak | 1 | -0/+1
It now shows the initial value of 4 when the demo is started.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | demos/scale: Only generate filters when used for separable | Bill Spitzak | 1 | -10/+19
This makes the speed of the demo more accurate, as the filter generation is a visible fraction of the time it takes to do a transform. This also prevents the output of unused filters in the gnuplot option in the next patch.

Note this is not dependent on other patches, as the user can choose linear and bilinear in the existing version.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-03-11 | demos/scale: Added pulldown to choose PIXMAN_FILTER_* value | Bill Spitzak | 2 | -11/+41
This is very useful for comparing the results of SEPARABLE_CONVOLUTION with BILINEAR and NEAREST.

v14: Removed good/best items

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
2016-03-11 | demos/scale: Compute filter size using boundary of xformed ellipse, not rectangle | Bill Spitzak | 1 | -41/+61
This is much more accurate and less blurry. In particular the filtering does not change as the image is rotated.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Soren Sandmann <soren.sandmann@gmail.com>
2016-01-31 | pixman-private: include <float.h> only in C code | Thomas Petazzoni | 1 | -2/+1
<float.h> is included unconditionally by pixman-private.h, which in turn gets included by assembler files. Unfortunately, with certain C libraries (like the musl C library), <float.h> cannot be included in assembler files:

  CCLD     libpixman-arm-simd.la
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'

It turns out however that <float.h> is not needed by assembly files, so we move its inclusion within the #ifndef __ASSEMBLER__ condition, which solves the problem.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
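The pattern described here, hiding C-only includes from the assembler, can be sketched as follows. This is a generic illustration of a header shared between C and assembler sources, not the contents of pixman-private.h; `SHARED_CONSTANT` and `smallest_normal_float` are hypothetical names. GCC and Clang define `__ASSEMBLER__` when preprocessing assembly files.

```c
/* Plain macros are safe for both C and assembler sources. */
#define SHARED_CONSTANT 16  /* hypothetical value, for illustration only */

#ifndef __ASSEMBLER__
/* <float.h> may contain C declarations (as with musl), so only the
 * C compiler should ever see it. */
#include <float.h>

static float smallest_normal_float(void)
{
    return FLT_MIN;
}
#endif /* __ASSEMBLER__ */
```

When the same header is pulled into a `.S` file, only the macro definition survives preprocessing, so the assembler never encounters a C declaration.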
2015-12-30 | build: Distinguish SKIP and FAIL on Win32 | Andrea Canciani | 1 | -11/+20
The `check` target in test/Makefile.win32 assumed that any non-0 exit code from the tests was an error, but the testsuite is currently using 77 as a SKIP exit code (based on the convention used in autotools).

Fixes fence-image-self-test and cover-test (now reported as SKIP).

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
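The exit-code convention described above amounts to a three-way classification. The actual fix lives in a Makefile; the following C sketch only illustrates the rule, with hypothetical names (`test_result_t`, `EXIT_SKIP`, `classify_exit_code`).

```c
typedef enum { TEST_PASS, TEST_SKIP, TEST_FAIL } test_result_t;

/* Exit status used by the autotools convention to report a skipped test. */
#define EXIT_SKIP 77

/* Map a test program's exit code to a result: 0 passes, 77 is a skip,
 * anything else is a failure. */
static test_result_t classify_exit_code(int code)
{
    if (code == 0)
        return TEST_PASS;
    if (code == EXIT_SKIP)
        return TEST_SKIP;
    return TEST_FAIL;
}
```

Treating every non-zero code as a failure, as the old `check` target did, would misreport the 77 case as FAIL.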
2015-12-23 | build: Use `del` instead of `rm` on `cmd.exe` shells | Simon Richter | 1 | -2/+6
The `rm` command is not usually available when running on Win32 in a `cmd.exe` shell. Instead the shell provides the `del` builtin, which has somewhat more limited wildcard expansion and error handling.

This makes all of the Makefile targets work on Win32 both using `cmd.exe` and using the MSYS environment.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-23 | build: Do not use `mkdir -p` on Windows | Andrea Canciani | 1 | -2/+3
When the build is performed using `cmd.exe` as shell, the `mkdir` command does not support the `-p` flag. The ability to create multiple nested folders is not used, hence it can be easily replaced by only creating the directory if it does not exist.

This makes the build work on the `cmd.exe` shell, except for the `clean` targets.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-23 | build: Avoid phony `pixman` target in test/Makefile.win32 | Andrea Canciani | 1 | -6/+4
Instead of explicitly depending on "pixman" for the "all" and "check" targets, rely on the dependency on the .lib file.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-23 | build: Remove use of BUILT_SOURCES from Makefile.win32 | Andrea Canciani | 1 | -1/+1
Since 3d81d89c292058522cce91338028d9b4c4a23c24 BUILT_SOURCES is not used anymore, but it was unintentionally left in Win32 Makefiles.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-23 | Post 0.34 branch creation version bump to 0.35.1 | Oded Gabbay | 1 | -2/+2
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-22 | Post-release version bump to 0.33.7 | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-22 | Pre-release version bump to 0.33.6 | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-12-22 | configure.ac: fix test for SSE2 & SSSE3 assembler support | Oded Gabbay | 1 | -4/+6
This patch modifies the SSE2 & SSSE3 tests in configure.ac to use a global variable to initialize vector variables. In addition, we now return the value of the computation instead of 0.

This is done so gcc 4.9 (and lower) won't optimize away the SSE assembly instructions (when using -O1 and higher), because then the configure test might incorrectly pass even though the assembler doesn't support the SSE instructions (the test will pass because the compiler does support the intrinsics).

v2: instead of using volatile, use a global variable as input

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-11-18 | mmx: Improve detection of support for "K" constraint | Andrea Canciani | 2 | -21/+18
Older versions of clang emitted an error on the "K" constraint, but at least since version 3.7 it is supported. Just like gcc, this constraint is only allowed for constants, but apparently clang requires them to be known before inlining.

Using the macro definition _mm_shuffle_pi16(A, N) ensures that the "K" constraint is always applied to a literal constant, independently from the compiler optimizations and allows building pixman-mmx on modern clang.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
2015-11-18 | Revert "mmx: Use MMX2 intrinsics from xmmintrin.h directly." | Matt Turner | 2 | -8/+71
This reverts commit 7de61d8d14e84623b6fa46506eb74f938287f536.

Newer versions of gcc allow inclusion of xmmintrin.h without -msse, but still won't allow usage of the intrinsics.

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=564024
2015-10-23 | Post-release version bump to 0.33.5 | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-10-23 | Pre-release version bump to 0.33.4 | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-10-16 | test: Fix fence-image-self-test on Mac | Andrea Canciani | 2 | -8/+10
On MacOS X, according to the manpage of mprotect(), "When a program violates the protections of a page, it gets a SIGBUS or SIGSEGV signal.", but fence-image-self-test was only accepting a SIGSEGV as notification of invalid access.

Fixes fence-image-self-test

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
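The change described here boils down to accepting either signal as evidence that a protected page was touched. A minimal sketch of such a check, with a hypothetical helper name (`is_fence_violation`), assuming a POSIX system where both `SIGSEGV` and `SIGBUS` are defined:

```c
#include <signal.h>

/* Returns non-zero if the signal is one that may be raised when a
 * program violates page protections: SIGSEGV, or SIGBUS as documented
 * for mprotect() on MacOS X. */
static int is_fence_violation(int sig)
{
    return sig == SIGSEGV || sig == SIGBUS;
}
```

A test harness would typically apply such a predicate to the terminating signal of a child process that deliberately touches the fence page.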
2015-10-13 | mmx: Use MMX2 intrinsics from xmmintrin.h directly. | Matt Turner | 2 | -71/+8
We had lots of hacks to handle the inability to include xmmintrin.h without compiling with -msse (lest SSE instructions be used in pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we can use some intrinsics from it, and remove the work-around code from pixman-mmx.c.

Evidently allows gcc 4.9.3 to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after

Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2015-09-29 | vmx: implement fast path vmx_composite_over_n_8888 | Siarhei Siamashka | 1 | -0/+54
Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz, Gentoo ppc (32-bit userland) gave the following results:

before: over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
after:  over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48

Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:

ocitysmap           659.69 ->  611.71 : 1.08x speedup
xfce4-terminal-a1  2725.22 -> 2547.47 : 1.07x speedup

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-09-25 | affine-bench: remove 8e margin from COVER area | Ben Avison | 1 | -6/+18
Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced the required image area for setting the COVER flags in pixman.c:analyze_extent(). Do the same reduction in affine-bench.

Leaving the old calculations in place would be very confusing for anyone reading the code.

Also add a comment that explains how affine-bench wants to hit the COVER paths. This explains why the intricate extent calculations are copied from pixman.c.

[Pekka: split patch, change comments, write commit message]

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-09-25 | Remove the 8e extra safety margin in COVER_CLIP analysis | Ben Avison | 1 | -13/+4
As discussed in http://lists.freedesktop.org/archives/pixman/2015-August/003905.html the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed coordinates is a legacy of rounding errors which used to occur in old versions of Pixman, but which no longer apply. For any affine transform, you are now guaranteed to get the same result by transforming the upper coordinate as though you transform the lower coordinate and add (size-1) steps of the increment in source coordinate space. No projective transform routines use the COVER_CLIP flags, so they cannot be affected.

Proof by Siarhei Siamashka:

Let's take a look at the following affine transformation matrix (with 16.16 fixed point values) and two vectors:

         | a   b     c    |
    M =  | d   e     f    |
         | 0   0  0x10000 |

         |  x_dst  |
    P =  |  y_dst  |
         | 0x10000 |

             | 0x10000 |
    ONE_X =  |    0    |
             |    0    |

The current matrix multiplication code does the following calculations:

             | (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
    M * P =  | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
             |                   0x10000                      |

These calculations are not perfectly exact and we may get rounding because the integer coordinates are adjusted by 0.5 (or 0x8000 in the 16.16 fixed point format) before doing matrix multiplication. For example, if the 'a' coefficient is an odd number and 'b' is zero, then we are losing some of the least significant bits when dividing by 0x10000.
So we need to strictly prove that the following expression is always true even though we have to deal with rounding:

                                           | a |
    M * (P + ONE_X) - M * P = M * ONE_X =  | d |
                                           | 0 |

or

    ((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
      -
    ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
      =
    a

It's easy to see that this is equivalent to

    a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
      - ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
      =
    a

Which means that stepping exactly by one pixel horizontally in the destination image space (advancing 'x_dst' by 0x10000) is the same as changing the transformed 'x_src' coordinate in the source image space exactly by 'a'. The same applies to the vertical direction too. Repeating these steps, we can reach any pixel in the source image space and get exactly the same fixed point coordinates as doing matrix multiplications per each pixel.

By the way, the older matrix multiplication implementation, which was relying on less accurate calculations with three intermediate roundings "((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)", also has the same properties. However reverting
    http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
and applying this "Remove the 8e extra safety margin in COVER_CLIP analysis" patch makes the cover test fail. The real reason why it fails is that the old pixman code was using "pixman_transform_point_3d()" function
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
for getting the transformed coordinate of the top left corner pixel in the image scaling code, but at the same time using a different "pixman_transform_point()" function
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
in the extents calculation code for setting the cover flag. And these functions did the intermediate rounding differently. That's why the 8e safety margin was needed.
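The stepping property argued in this proof can be checked numerically. The sketch below evaluates the x row of the matrix product as written above, under one explicit assumption: the division by 0x10000 rounds toward negative infinity, as an arithmetic right shift by 16 would. The helper names `floor_div_0x10000` and `transform_x` are hypothetical and are not pixman functions.

```c
#include <stdint.h>

/* Floor division by 0x10000, spelled out so the result does not depend
 * on how the compiler treats negative operands. */
static int64_t floor_div_0x10000(int64_t v)
{
    int64_t q = v / 0x10000;

    if (v % 0x10000 < 0)
        q--;
    return q;
}

/* x row of M * P in 16.16 fixed point:
 * (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c */
static int64_t transform_x(int64_t a, int64_t b, int64_t c,
                           int64_t x_dst, int64_t y_dst)
{
    return floor_div_0x10000(a * x_dst + b * y_dst + 0x8000) + c;
}
```

Advancing `x_dst` by 0x10000 adds exactly `a * 0x10000` inside the division, so the difference between the two results is exactly `a`, whatever the rounding of the remaining terms.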
** proof ends

However, for COVER_CLIP_NEAREST, the actual margins added were not 8e. Because the half-way cases round down, that is, coordinate 0 hits pixel index -1 while coordinate e hits pixel index 0, the extra safety margins were actually 7e to the left and up, and 9e to the right and down. This patch removes the 7e and 9e margins and restores the -e adjustment required for NEAREST sampling in Pixman. For reference, see pixman/rounding.txt.

For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no additional offsets to be restored, so simply removing the 8e additions is enough.

Proof:

All implementations must give the same numerical results as bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().

The former does

    int x0 = pixman_fixed_to_int (x - pixman_fixed_e);

which maps directly to the new test for the nearest flag, when you consider that x0 must fall in the interval [0,width).

The latter does

    x1 = x - pixman_fixed_1 / 2;
    x1 = pixman_fixed_to_int (x1);
    x2 = x1 + 1;

When you write a COVER path, you take advantage of the assumption that both x1 and x2 fall in the interval [0, width).

As samplers are allowed to fetch the pixel at x2 unconditionally, we require

    x1 >= 0
    x2 < width

so

    x - pixman_fixed_1 / 2 >= 0
    x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1

so

    pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
    pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width

which matches the source code lines for the bilinear case, once you delete the lines that add the 8e margin.

Signed-off-by: Ben Avison <bavison@riscosopen.org>
[Pekka: adjusted commit message, left affine-bench changes for another patch]
[Pekka: add commit message parts from Siarhei]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
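The two conditions derived in this proof can be written as one-dimensional predicates. The sketch below uses stand-in definitions (`fixed_t`, `FIXED_1`, `FIXED_E`, `fixed_to_int`) that imitate 16.16 fixed point with floor conversion; they are assumptions made for illustration, as are the function names `nearest_in_cover` and `bilinear_in_cover`, and none of this is pixman's analyze_extent() code.

```c
#include <stdint.h>

typedef int32_t fixed_t;                 /* stand-in for a 16.16 value */
#define FIXED_1 ((fixed_t) 0x10000)      /* 1.0 */
#define FIXED_E ((fixed_t) 1)            /* smallest representable step */

/* Convert to integer, rounding toward negative infinity. */
static int fixed_to_int(fixed_t f)
{
    int q = f / FIXED_1;

    if (f % FIXED_1 < 0)
        q--;
    return q;
}

/* NEAREST: the sampled index x0 must fall in [0, width). */
static int nearest_in_cover(fixed_t x, int width)
{
    int x0 = fixed_to_int(x - FIXED_E);

    return x0 >= 0 && x0 < width;
}

/* BILINEAR: both x1 and x2 = x1 + 1 must fall in [0, width). */
static int bilinear_in_cover(fixed_t x, int width)
{
    return fixed_to_int(x - FIXED_1 / 2) >= 0 &&
           fixed_to_int(x + FIXED_1 / 2) < width;
}
```

Under these definitions coordinate 0 maps to index -1 and coordinate e maps to index 0 for NEAREST, which matches the half-way rounding described in the message.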
2015-09-25 | pixman-general: Tighten up calculation of temporary buffer sizes | Ben Avison | 1 | -2/+2
Each of the aligns can only add a maximum of 15 bytes to the space requirement. This permits some edge cases to use the stack buffer where previously it would have deduced that a heap buffer was required.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
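The "maximum of 15 bytes" figure follows from rounding an address up to a 16-byte boundary. The sketch below assumes that 16-byte alignment is what is meant (inferred from the number 15, not stated in the message); `align_up16` is a hypothetical helper.

```c
#include <stdint.h>

/* Round an address up to the next multiple of 16. The result exceeds
 * the input by at most 15, and by 0 when it is already aligned. */
static uintptr_t align_up16(uintptr_t p)
{
    return (p + 15) & ~(uintptr_t) 15;
}
```

Budgeting 15 extra bytes per aligned sub-buffer is therefore enough; reserving a full 16 would overestimate the space needed and could push a borderline request from the stack buffer to the heap.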
2015-09-22 | pixman-general: Fix stack related pointer arithmetic overflow | Siarhei Siamashka | 1 | -9/+7
As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains, the stack is allocated at the very top of the process address space in some configurations (32-bit x86 systems with ASLR disabled). And the careless computations done with the 'dest_buffer' pointer may overflow, failing the buffer upper limit check.

The problem can be reproduced using the 'stress-test' program, which segfaults when executed via setarch:

    export CFLAGS="-O2 -m32" && ./autogen.sh
    ./configure --disable-libpng --disable-gtk && make
    setarch i686 -R test/stress-test

This patch introduces the required corrections. The extra check for negative 'width' may be redundant (the invalid 'width' value is not supposed to reach here), but it's better to play safe when dealing with the buffers allocated on stack.

Reported-by: Ludovic Courtès <ludo@gnu.org>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: soren.sandmann@gmail.com
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
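A general way to avoid this class of bug is to compare sizes rather than computed end pointers, so that nothing can wrap past the top of the address space. The sketch below shows that idea only; it is not the patch itself, and `fits_in_buffer` with its parameters is hypothetical. It assumes `bytes_per_pixel` is non-zero.

```c
#include <stddef.h>

/* Decide whether 'width' pixels of 'bytes_per_pixel' bytes each fit in
 * what is left of a buffer, without ever forming an out-of-range
 * pointer such as 'buffer + width * bytes_per_pixel'. */
static int fits_in_buffer(size_t capacity, size_t used,
                          int width, size_t bytes_per_pixel)
{
    if (width <= 0 || used > capacity)
        return 0;

    /* Dividing the remaining space cannot overflow, unlike multiplying
     * the width and adding it to a pointer near the top of memory. */
    return (size_t) width <= (capacity - used) / bytes_per_pixel;
}
```

The explicit rejection of non-positive widths mirrors the "play safe" check for negative 'width' mentioned in the message.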
2015-09-20 | test: add a check for FE_DIVBYZERO | Thomas Petazzoni | 2 | -0/+7
Some architectures, such as Microblaze and Nios2, currently do not implement FE_DIVBYZERO, even though they have <fenv.h> and feenableexcept(). This commit adds a configure.ac check to verify whether FE_DIVBYZERO is defined or not, and if not, disables the problematic code in test/utils.c.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-09-18 | vmx: Remove unused expensive functions | Oded Gabbay | 1 | -196/+0
Now that we replaced the expensive functions with better performing alternatives, we should remove them so they will not be used again.

Running Cairo benchmark on trimmed traces gave the following results:

POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.

Speedups
========
t-firefox-scrolling     1232.30 -> 1096.55 :  1.12x
t-gnome-terminal-vim     613.86 ->  553.10 :  1.11x
t-evolution              405.54 ->  371.02 :  1.09x
t-firefox-talos-gfx      919.31 ->  862.27 :  1.07x
t-gvim                   653.02 ->  616.85 :  1.06x
t-firefox-canvas-alpha   941.29 ->  890.42 :  1.06x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-09-18 | vmx: implement fast path vmx_composite_over_n_8_8888 | Oded Gabbay | 1 | -0/+111
POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.

reference memcpy speed = 25008.9MB/s (6252.2MP/s for 32bpp fills)

            Before    After     Change
          ---------------------------------------------
L1          91.32     182.84    +100.22%
L2          94.94     182.83    +92.57%
M           95.55     181.51    +89.96%
HT          88.96     162.09    +82.21%
VT          87.4      168.35    +92.62%
R           83.37     146.23    +75.40%
RT          66.4      91.5      +37.80%
Kops/s      683       859       +25.77%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-09-18 | vmx: optimize vmx_composite_over_n_8888_8888_ca | Oded Gabbay | 1 | -31/+21
This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.

Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le gave the following results:

reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)

            Before    After     Change
          --------------------------------------------
L1          244.97    474.05    +93.51%
L2          243.74    473.05    +94.08%
M           243.29    467.16    +92.02%
HT          144.03    252.79    +75.51%
VT          174.24    279.03    +60.14%
R           109.86    149.98    +36.52%
RT          47.96     53.18     +10.88%
Kops/s      524       576       +9.92%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-09-18 | vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER | Oded Gabbay | 1 | -62/+17
This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all the functions it calls (combine1, combine4 and core_combine_over_u_pixel_vmx).

The optimization is done by removing use of expand_alpha_1x128 and expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from pixman_combine32.h.

Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le gave the following results:

reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)

            Before    After     Change
          --------------------------------------------
L1          182.05    210.22    +15.47%
L2          180.6     208.92    +15.68%
M           180.52    208.22    +15.34%
HT          130.17    178.97    +37.49%
VT          145.82    184.22    +26.33%
R           104.51    129.38    +23.80%
RT          48.3      61.54     +27.41%
Kops/s      430       504       +17.21%

v2: Check *pm is not NULL before dereferencing it in combine1()

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-09-17 | armv6: enable over_n_8888 | Pekka Paalanen | 1 | -5/+4
Enable the fast path added in the previous patch by moving the lookup table entries to their proper locations.

Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the effect of adding this one patch on top of "armv6: Add over_n_8888 fast path (disabled)", which was applied on fd595692941f3d9ddea8934462bd1d18aed07c65.

          Before          After
          Mean   StdDev   Mean   StdDev   Confidence   Change
L1        12.5   0.04     45.2   0.10     100.00%      +263.1%
L2        11.1   0.02     43.2   0.03     100.00%      +289.3%
M          9.4   0.00     42.4   0.02     100.00%      +351.7%
HT         8.5   0.02     25.4   0.10     100.00%      +198.8%
VT         8.4   0.02     22.3   0.07     100.00%      +167.0%
R          8.2   0.02     23.1   0.09     100.00%      +183.6%
RT         5.4   0.05     11.4   0.21     100.00%      +110.3%

At most 3 outliers rejected per test per set.

Iterating here means that lowlevel-blt-bench was executed 30 times, and the statistics above were computed from the output.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-09-17 | armv6: Add over_n_8888 fast path (disabled) | Ben Avison | 2 | -0/+48
This new fast path is initially disabled by putting the entries in the lookup table after the sentinel. The compiler cannot tell the new code is not used, so it cannot eliminate the code. Also the lookup table size will include the new fast path. When the follow-up patch then enables the new fast path, the binary layout (alignments, size, etc.) will stay the same compared to the disabled case.

Keeping the binary layout identical is important for benchmarking on Raspberry Pi 1. The addresses at which functions are loaded will have a significant impact on benchmark results, causing unexpected performance changes. Keeping all function addresses the same across the patch enabling a new fast path improves the reliability of benchmarks.

Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
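The trick of parking table entries after the sentinel can be illustrated with a small, generic lookup table. Everything in this sketch is hypothetical (`fast_path_t`, `fast_paths`, `lookup`, the entry names); it only shows the mechanism, not pixman's fast path tables.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fast_path_fn)(int);

typedef struct {
    const char  *name;  /* NULL marks the sentinel */
    fast_path_fn fn;
} fast_path_t;

static int path_old(int x) { return x + 1; }
static int path_new(int x) { return x * 2; }

static const fast_path_t fast_paths[] = {
    { "old", path_old },
    { NULL,  NULL },        /* sentinel: lookups stop here */
    { "new", path_new },    /* still compiled and stored, never reached */
};

/* Walk the table until the sentinel; entries after it are invisible. */
static fast_path_fn lookup(const char *name)
{
    const fast_path_t *p;

    for (p = fast_paths; p->name != NULL; p++)
        if (strcmp(p->name, name) == 0)
            return p->fn;
    return NULL;
}
```

Because the table still references `path_new`, the function and the table entry occupy space in the binary, yet `lookup("new")` returns NULL. Moving the entry above the sentinel later enables it without changing the table's size.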