Age | Commit message | Author | Files | Lines
2014-03-13mmx: faster bilinear interpolation (get rid of XOR instruction)mmx-nearestMatt Turner1-7/+5
This is a port of Siarhei's commit d768558ce to the MMX code.
2014-03-13mmx: Don't read+write dst when not neededMatt Turner1-13/+9
Nearest:
over_8888_8888 = L1: 235.79 L2: 243.24 M:225.78 ( 11.84%) HT:305.29 VT:242.82 R:210.29 RT: 99.14 ( 818Kops/s)
over_8888_8888 = L1: 251.10 L2: 256.59 M:239.93 ( 12.65%) HT:294.51 VT:242.61 R:218.32 RT:107.90 ( 853Kops/s)

Bilinear:
over_8888_8888 = L1: 121.62 L2: 122.41 M:118.91 ( 6.29%) HT:126.99 VT:117.31 R:101.50 RT: 57.56 ( 561Kops/s)
over_8888_8888 = L1: 121.14 L2: 121.81 M:118.30 ( 6.24%) HT:134.91 VT:124.28 R:115.04 RT: 69.20 ( 634Kops/s)
2014-03-13mmx: Don't unpack+repack when not neededMatt Turner1-8/+27
Nearest:
over_8888_8888 = L1: 225.75 L2: 230.91 M:217.17 ( 11.54%) HT:266.81 VT:212.29 R:184.76 RT: 86.19 ( 752Kops/s)
over_8888_8888 = L1: 235.79 L2: 243.24 M:225.78 ( 11.84%) HT:305.29 VT:242.82 R:210.29 RT: 99.14 ( 818Kops/s)

Bilinear:
over_8888_8888 = L1: 111.66 L2: 112.01 M:108.58 ( 5.69%) HT:118.60 VT:109.76 R: 95.89 RT: 55.55 ( 547Kops/s)
over_8888_8888 = L1: 121.62 L2: 122.41 M:118.91 ( 6.29%) HT:126.99 VT:117.31 R:101.50 RT: 57.56 ( 561Kops/s)
2014-03-13mmx: Add nearest over_8888_8888Matt Turner1-0/+57
Unscaled:
over_8888_8888 = L1: 341.81 L2: 349.64 M:320.45 ( 16.90%) HT:401.75 VT:332.77 R:296.34 RT:150.63 (1572Kops/s)

Before:
over_8888_8888 = L1: 149.48 L2: 158.03 M:147.25 ( 7.75%) HT:386.07 VT:309.35 R:290.92 RT:132.73 ( 954Kops/s)

After:
over_8888_8888 = L1: 225.75 L2: 230.91 M:217.17 ( 11.54%) HT:266.81 VT:212.29 R:184.76 RT: 86.19 ( 752Kops/s)
2014-03-13mmx: Add nearest over_8888_n_8888Matt Turner1-0/+62
Unscaled:
over_8888_n_8888 = L1: 395.65 L2: 395.19 M:372.80 ( 19.59%) HT:257.44 VT:194.30 R:164.31 RT: 75.20 ( 882Kops/s)

Before:
over_8888_n_8888 = L1: 107.65 L2: 106.34 M:103.18 ( 5.44%) HT: 49.62 VT: 45.55 R: 43.48 RT: 20.38 ( 242Kops/s)

After:
over_8888_n_8888 = L1: 224.50 L2: 223.42 M:215.18 ( 11.30%) HT:172.70 VT:148.65 R:127.62 RT: 64.50 ( 615Kops/s)
2014-01-04Remove all the operators that use division from pixman-combine32.cSøren Sandmann1-1429/+0
These are now handled by floating point combiners.
2014-01-04Copy the comments from pixman-combine32.c to pixman-combine-float.cSøren Sandmann1-96/+238
An upcoming commit will delete many of the operators from pixman-combine32.c and rely on the ones in pixman-combine-float.c. The comments about how the operators were derived are still useful though, so copy them into pixman-combine-float.c before the deletion.
2014-01-04utils.c: Set DEVIATION to 0.0128Søren Sandmann Pedersen1-1/+1
Consider a HARD_LIGHT operation with the following pixels:

- source: 15 (6 bits)
- source alpha: 255 (8 bits)
- mask alpha: 223 (8 bits)
- dest: 255 (8 bits)
- dest alpha: 0 (8 bits)

Since 2 times the source is less than source alpha, the first branch of the hard light blend mode is taken:

    (1 - sa) * d + (1 - da) * s + 2 * s * d

Since da is 0 and d is 1, this degenerates to:

    (1 - sa) + 3 * s

Taking (src IN mask) into account along with the fact that sa is 1, this becomes:

    (1 - ma) + 3 * s * ma
        = (1 - 223/255.0) + 3 * (15/63.0) * (223/255.0)
        = 0.7501400560224089

When computed with the source converted by bit replication to eight bits, and additionally with the (src IN mask) part rounded to eight bits, we get:

    ma = 223/255.0
    s * ma = (60 / 255.0) * (223/255.0)

which rounds to 52 / 255, and the result is

    (1 - ma) + 3 * s * ma
        = (1 - 223/255.0) + 3 * 52/255.0
        = 0.7372549019607844

so now we have an error of 0.012885.

Without making changes to the way pixman does integer rounding/arithmetic, this error must then be considered acceptable. Due to conservative computations in the test suite we can however get away with 0.0128 as the acceptable deviation.

This fixes the remaining failures in pixel-test.
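The arithmetic above can be reproduced with a few lines of C. This is only a sketch of the calculation in the commit message, not code from pixman; the function names hard_light_exact and hard_light_8bit are made up for this illustration.

```c
/* Hypothetical helpers reproducing the numbers in the commit message.
 * With da = 0, d = 1 and sa = 1, the first hard light branch reduces to
 * (1 - ma) + 3 * (s IN mask). */

/* Full precision: 6-bit source 15, 8-bit mask alpha 223. */
static double hard_light_exact (void)
{
    double s = 15 / 63.0;
    double ma = 223 / 255.0;

    return (1 - ma) + 3 * s * ma;
}

/* Source widened to 8 bits by bit replication, and (src IN mask)
 * rounded to 8 bits before the blend is evaluated. */
static double hard_light_8bit (void)
{
    double ma = 223 / 255.0;
    int s8 = (15 << 2) | (15 >> 4);                  /* 60 */
    int s_in_m = (int) (s8 * 223 / 255.0 + 0.5);     /* 52.47 rounds to 52 */

    return (1 - ma) + 3 * s_in_m / 255.0;
}
```

The difference hard_light_exact () - hard_light_8bit () is the 0.012885 error quoted above, which is slightly larger than the 0.0128 chosen for DEVIATION.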
2014-01-04Use floating point combiners for all operators that involve divisionsSøren Sandmann3-41/+29
Consider a DISJOINT_ATOP operation with the following pixels:

- source: 0xff (8 bits)
- source alpha: 0x01 (8 bits)
- mask alpha: 0x7b (8 bits)
- dest: 0x00 (8 bits)
- dest alpha: 0xff (8 bits)

When (src IN mask) is computed in 8 bits, the resulting alpha channel is 0 due to rounding:

    floor ((0x01 * 0x7b) / 255.0 + 0.5) = floor (0.9823) = 0

which means that since Render defines any division by zero as infinity, the Fa and Fb for this operator end up as follows:

    Fa = max (1 - (1 - 1) / 0, 0) = 0
    Fb = min (1, (1 - 0) / 1) = 1

and so since dest is 0x00, the overall result is 0.

However, when computed in full precision, the alpha value no longer rounds to 0, and so Fa ends up being

    Fa = max (1 - (1 - 1) / 0.0001, 0) = 1

and so the result is now

    s * ma * Fa + d * Fb
        = (1.0 * (0x7b / 255.0) * 1) + d * 0
        = 0x7b / 255.0
        = 0.4823

so the error in this case ends up being 0.48235294, which is clearly not something that can be considered acceptable.

In order to avoid this problem, we need to do all arithmetic in such a way that a multiplication of two tiny numbers can never end up being zero unless one of the input numbers is itself zero.

This patch makes all computations that involve divisions take place in floating point, which is sufficient to fix the test cases.

This brings the number of failures in pixel-test down to 14.
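The rounding problem can be seen in isolation with two small helpers. This is a sketch under the assumption that the 8-bit product is rounded as in the floor expression above; mul_un8 and mul_float are hypothetical names, not pixman functions.

```c
/* Multiply two 8-bit channel values and round to the nearest 8-bit
 * value, mirroring floor (a * b / 255.0 + 0.5). */
static unsigned mul_un8 (unsigned a, unsigned b)
{
    return (unsigned) (a * b / 255.0 + 0.5);
}

/* The same product with both channels first converted to [0, 1]. */
static float mul_float (unsigned a, unsigned b)
{
    return (a / 255.0f) * (b / 255.0f);
}
```

For the pixel above, mul_un8 (0x01, 0x7b) is 0 while mul_float (0x01, 0x7b) is small but nonzero, so only the integer version sends the operator into its division-by-zero case.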
2014-01-04Soft Light: Consistent approach to division by zeroSøren Sandmann3-3/+3
The Soft Light operator has several branches. One of them is decided based on whether 2 * s is less than or equal to 2 * sa. In floating point implementations, when those two values are very close to each other, it may not be completely predictable which branch we hit. This is a problem because in one branch, when destination alpha is zero, we get the result

    r = d * as

and in the other we get

    r = 0

So when d and as are not 0, this causes two different results to be returned from essentially identical input values. In other words, there is a discontinuity in the current implementation.

This patch arbitrarily changes the second branch such that it now returns d * sa instead. There is no deep meaning behind this, because essentially this is an attempt to assign meaning to division by zero, and all that is required is that the meaning doesn't depend on minute differences in input values.

This makes the number of failed pixels in pixel-test go down to 347.
2014-01-04pixman-combine32.c: Fix bugs related to integer promotionSøren Sandmann Pedersen2-4/+9
In the component alpha part of the PDF_SEPARABLE_BLEND_MODE macro, the expression ~RED_8 (m) is used. Because RED_8(m) gets promoted to int before ~ is applied, the whole expression typically becomes some negative value rather than (255 - RED_8(m)) as desired. Fix this by using unsigned temporary variables. This reduces the number of failures in pixel-test to 363.
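The promotion pitfall is easy to reproduce outside of the macro. The sketch below uses a plain uint8_t argument rather than the real RED_8 macro, and the function names are invented for this example.

```c
#include <stdint.h>

/* Pitfall: the uint8_t operand is promoted to int before ~ is applied,
 * so all the upper bits are set and the result is negative. */
static int invert_promoted (uint8_t m)
{
    return ~m;
}

/* Storing the complement in an unsigned 8-bit temporary truncates it
 * back to the intended 255 - m. */
static int invert_un8 (uint8_t m)
{
    uint8_t t = (uint8_t) ~m;

    return t;
}
```

With m = 0x10, invert_promoted returns -17 whereas invert_un8 returns 239, which is 255 - 16.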
2014-01-04pixman/pixman-combine32.c: Bug fixes for separable blend modesSøren Sandmann Pedersen3-62/+94
This commit fixes four separate bugs:

1. In the computation

       (1 - sa) * d + (1 - da) * s + sa * da * B(s, d)

   we were using regular addition for all four channels, but for superluminescent pixels, the addition could overflow causing nonsensical results.

2. The variables and return types used for the results of the blend mode calculations were unsigned, but for various blend modes (and especially with superluminescent pixels), the blend mode calculations could be negative, resulting in underflows.

3. The blend mode computations were returned as 8-bit values, which is not sufficient precision (especially considering that we need signed results).

4. The value before the final division by 255 was not properly clamped to [0, 255].

This patch fixes all those bugs. The blend mode computations are now returned as signed 16 bit values with 1 represented as 255 * 255.

With these fixes, the number of failing pixels in pixel-test goes down from 431 to 384.
2014-01-04pixel-test.c: Add a number of pixels that have failed at some pointSøren Sandmann1-0/+2680
This commit adds a large number of pixel regressions to pixel-test. All of these have at some point been failing in blend-mode-test, and most of them do fail currently. To be specific, with this commit, pixel-test reports 431 failed tests.
2014-01-04test/tolerance-test: New test programSøren Sandmann Pedersen2-0/+361
This new test program is similar to test/composite in that it relies on the pixel_checker_t API to do tolerance based verification. But unlike the composite test, which verifies combinations of a fixed set of pixels, this one generates random images and verifies that those composite correctly. Also unlike composite, tolerance-test supports all the separable blend mode operators in addition to the original Render operators. When tests fail, a C struct is printed that can be pasted into pixel-test for regression purposes. There is an option "--forever" which causes the random seed to be set to the current time, and then the test runs until interrupted. This is useful for overnight runs. This test currently fails badly due to various bugs in the blend mode operators. Later commits will fix those.
2014-01-04pixel-test: Command line argument to specify the regression to runSøren Sandmann1-1/+13
A new command line argument allows the user to specify which one of the regressions should be run.
2014-01-04pixel-test: Add support for mask pixelsSøren Sandmann1-11/+75
Support is added to pixel-test for verifying operations involving masks. If a regression includes a mask, it is verified with the pixel_checker API in both unified and component alpha modes.
2014-01-04test/check-formats.c: Add support for separable blend modesSøren Sandmann Pedersen1-0/+16
2014-01-04test/utils.c: Add support for separable blend mode ops to do_composite()Søren Sandmann Pedersen1-4/+178
The implementations are copied from the floating point pipeline, but use double precision instead of single precision.
2013-12-26configure.ac: Check and use -Wno-unused-local-typedefs GCC optionSøren Sandmann1-0/+1
With GCC 4.8.2 the COMPILE_TIME_ASSERT macro produces a spurious warning about an unused local typedef:

    In file included from pixman.c:29:0:
    pixman.c: In function 'optimize_operator':
    pixman-private.h:1019:22: warning: typedef 'compile_time_assertion' locally defined but not used [-Wunused-local-typedefs]

The flag -Wno-unused-local-typedefs suppresses that warning.
2013-12-03Soft Light: The first comparison should be <=, not <Søren Sandmann2-2/+2
According to the definition of soft light, the first comparison is less-than-or-equal, not less-than.
2013-11-23general: Support component alpha for all image typesSøren Sandmann3-17/+2
Currently, if you attempt to use component alpha on source images or images without RGB channels, Pixman will silently just use unified alpha instead. This patch makes such images supported for component alpha. There is no particularly compelling usecase at the moment, but this patch does get rid of a bit of special-case code both in pixman-general.c and in test/composite.c.
2013-11-17test/utils.c: Make the stack unaligned only on 32 bit WindowsSøren Sandmann1-1/+1
The call_test_function() function contains some assembly that deliberately causes the stack to be aligned to 32 bits rather than 128 bits on x86-32. The intention is to catch bugs that surface when pixman is called from code that only uses a 32 bit alignment.

However, recent versions of GCC apparently make the assumption (either accidentally or deliberately) that the incoming stack is aligned to 128 bits, where older versions only seemed to make this assumption when compiling with -msse2. This causes the vector code in the PRNG to now segfault when called from call_test_function() on x86-32.

This patch fixes that by only making the stack unaligned on 32 bit Windows, where it would definitely be incorrect for GCC to assume that the incoming stack is aligned to 128 bits.

V2: Put "defined(...)" around __GNUC__

Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=491110
2013-11-12Fix the SSSE3 CPUID detection.Jakub Bogusz1-1/+1
SSSE3 is detected by bit 9 of ECX, but we were checking bit 9 of EDX, which is APIC. This led to SSSE3 routines being called on CPUs without SSSE3.

Reviewed-by: Matt Turner <mattst88@gmail.com>
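The distinction between the two registers can be sketched as follows. The macro and function names are hypothetical and are not taken from pixman; the functions take the register values returned by CPUID leaf 1 as plain arguments so that the example does not depend on any particular CPUID wrapper.

```c
#include <stdint.h>

/* Feature bits reported by CPUID leaf 1 (names invented for this sketch). */
#define CPUID1_ECX_SSSE3 (1u << 9)   /* ECX bit 9: SSSE3 */
#define CPUID1_EDX_APIC  (1u << 9)   /* EDX bit 9: on-chip APIC */

/* Correct check: SSSE3 is reported in ECX. */
static int has_ssse3 (uint32_t ecx)
{
    return (ecx & CPUID1_ECX_SSSE3) != 0;
}

/* What testing bit 9 of EDX actually answers: is an APIC present? */
static int has_apic (uint32_t edx)
{
    return (edx & CPUID1_EDX_APIC) != 0;
}
```

Because the two bits share a position, a CPU with an APIC but without SSSE3 passes the EDX test, which matches the failure described above.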
2013-11-11demos/Makefile.am: Move EXTRA_DIST outside "if HAVE_GTK"Søren Sandmann1-2/+2
Without this, if tarballs are generated on a system that doesn't have GTK+ 2 development headers available, the files in EXTRA_DIST will not be included, which then causes builds from the tarball to fail on systems that do have GTK+ 2 headers available. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=71465
2013-11-11test: Fix the win32 buildAndrea Canciani1-2/+1
The win32 build has no config.h, so HAVE_CONFIG_H should be checked before including it, as in utils.h.
2013-11-10Post-release version bump to 0.33.1Søren Sandmann1-2/+2
2013-11-10Pre-release version bump to 0.32.0Søren Sandmann1-2/+2
2013-11-01Post-release version bump to 0.31.3Søren Sandmann Pedersen1-1/+1
2013-11-01Pre-release version bump to 0.31.2Søren Sandmann Pedersen1-1/+1
2013-11-01pixman_trapezoid_valid(): Fix underflow when bottom is close to MIN_INTRitesh Khadgaray1-1/+1
If t->bottom is close to MIN_INT (probably an invalid value), subtracting top can lead to underflow, which causes crashes. This patch fixes the issue.

This fixes bug 67484.
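Why the subtraction is dangerous can be shown with 64-bit arithmetic. This is a sketch only: fixed_t stands in for pixman's 32-bit fixed-point type, the function names are invented, and the direct comparison shown is one overflow-free way to order the two values, not necessarily the exact change made by the patch.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* stand-in for a 32-bit fixed-point type */

/* Compute bottom - top exactly in 64 bits and report whether the
 * result would still fit in a 32-bit signed integer. */
static int diff_fits_in_32_bits (fixed_t top, fixed_t bottom)
{
    int64_t diff = (int64_t) bottom - (int64_t) top;

    return diff >= INT32_MIN && diff <= INT32_MAX;
}

/* Comparing the operands directly cannot overflow. */
static int bottom_is_below_top (fixed_t top, fixed_t bottom)
{
    return bottom > top;
}
```

For bottom = INT32_MIN and top = 1, the exact difference does not fit in 32 bits, while the direct comparison still gives the expected answer.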
2013-11-01test/trap-crasher.c: Add trapezoid that demonstrates a crashSøren Sandmann Pedersen1-1/+13
This trapezoid causes a crash due to an underflow in pixman_trapezoid_valid(). Test case from Ritesh Khadgaray.
2013-11-01Fix pixman build with older GCC releasesBrad Smith2-1/+17
The following patch fixes building pixman with older GCC releases such as GCC 3.3 and older (OpenBSD; some older archs use GCC 3.3.6) by switching the detection of __builtin_clz to an autoconf check. Compilers that pretend to be GCC, implement __builtin_clz and are already utilizing the intrinsic include LLVM/Clang, Open64, EKOPath and PCC.
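A configure-time check typically ends up as a preprocessor guard around the intrinsic. The sketch below assumes the check defines a macro called HAVE_BUILTIN_CLZ; that name and both function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: count zero bits above the highest set bit.
 * The argument must be nonzero, as with __builtin_clz. */
static int portable_clz (uint32_t x)
{
    int n = 0;

    assert (x != 0);

    while (!(x & 0x80000000u))
    {
        x <<= 1;
        n++;
    }

    return n;
}

static int count_leading_zeros (uint32_t x)
{
#ifdef HAVE_BUILTIN_CLZ     /* assumed to be set by the autoconf check */
    return __builtin_clz (x);
#else
    return portable_clz (x);
#endif
}
```

Either path gives the same result, so code calling count_leading_zeros does not need to know which compiler was used.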
2013-10-17pixman-glyph.c: Add __force_align_arg_pointer to composite functionsSøren Sandmann Pedersen1-0/+6
The functions pixman_composite_glyphs_no_mask() and pixman_composite_glyphs() can call into code compiled with -msse2, which requires the stack to be aligned to 16 bytes. Since the ABIs on Windows and Linux for x86-32 don't provide this guarantee, we need to use this attribute to make GCC generate a prologue that realigns the stack. This fixes the crash introduced in the previous commit and also https://bugs.freedesktop.org/show_bug.cgi?id=70348 and https://bugs.freedesktop.org/show_bug.cgi?id=68300
2013-10-17utils.c: On x86-32 unalign the stack before calling test_functionSøren Sandmann Pedersen1-2/+30
GCC when compiling with -msse2 and -mssse3 will assume that the stack is aligned to 16 bytes even on x86-32 and accordingly issue movdqa instructions for stack allocated variables. But despite what GCC thinks, the standard ABI on x86-32 only requires a 4-byte aligned stack. This is true at least on Windows, but there also was (and maybe still is) Linux code in the wild that assumed this. When such code calls into pixman and hits something compiled with -msse2, we get a segfault from the unaligned movdqas. Pixman has worked around this issue in the past with the gcc attribute "force_align_arg_pointer" but the problem has resurfaced now in https://bugs.freedesktop.org/show_bug.cgi?id=68300 because pixman_composite_glyphs() is missing this attribute. This patch makes fuzzer_test_main() call the test_function through a trampoline, which, on x86-32, has a bit of assembly that deliberately avoids aligning the stack to 16 bytes as GCC normally expects. The result is that glyph-test now crashes. V2: Mark caller-save registers as clobbered, rather than using noinline on the trampoline.
2013-10-14configure.ac: check and use -Wdeclaration-after-statement GCC optionSiarhei Siamashka1-0/+1
The accidental use of declaration after statement breaks compilation with C89 compilers such as MSVC. Assuming that MSVC is one of the supported compilers, it makes sense to ask GCC to at least report warnings for such problematic code.
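As an illustration of what the warning is meant to catch, here is a small function written in the C89-compatible style, with the rejected form shown in a comment. The function is made up for this example.

```c
/* C89 requires all declarations in a block to come before the first
 * statement. The following would be flagged by
 * -Wdeclaration-after-statement and rejected by a C89 compiler:
 *
 *     total = 0;
 *     int i;          <- declaration after a statement
 */
static int sum_to (int n)
{
    int i;
    int total;

    total = 0;
    for (i = 1; i <= n; i++)
        total += i;

    return total;
}
```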
2013-10-14sse2: bilinear fast path for src_x888_8888Siarhei Siamashka1-0/+67
Running cairo-perf-trace benchmark on Intel Core2 T7300:

Before:
[ 0] image t-firefox-canvas-swscroll 1.989 2.008 0.43% 8/8
[ 1] image firefox-canvas-scroll 4.574 4.609 0.50% 8/8

After:
[ 0] image t-firefox-canvas-swscroll 1.404 1.418 0.51% 8/8
[ 1] image firefox-canvas-scroll 4.228 4.259 0.36% 8/8
2013-10-12configure.ac: Add check for pmulhuw assemblySøren Sandmann Pedersen1-0/+6
Clang 3.0 chokes on the following bit of assembly

    asm ("pmulhuw %1, %0\n\t"
         : "+y" (__A)
         : "y" (__B)
    );

from pixman-mmx.c with this error message:

    fatal error: error in backend: Unsupported asm: input constraint with a matching output constraint of incompatible type!

So add a check in configure to only enable MMX when the compiler can deal with it.
2013-10-12scale.c: Use int instead of kernel_t for values in named_int_tSøren Sandmann Pedersen1-3/+3
The 'value' field in the 'named_int_t' struct is used for both pixman_repeat_t and pixman_kernel_t values, so the type should be int, not pixman_kernel_t.

Fixes some warnings like this

    scale.c:124:33: warning: implicit conversion from enumeration type 'pixman_repeat_t' to different enumeration type 'pixman_kernel_t' [-Wconversion]
    { "None", PIXMAN_REPEAT_NONE },
    ~ ^~~~~~~~~~~~~~~~~~

when compiled with clang.
2013-10-12pixman-combine32.c: Make Color Burn routine follow the math more closelySøren Sandmann Pedersen3-10/+9
For superluminescent destinations, the old code could underflow in

    uint32_t r = (ad - d) * as / s;

when (ad - d) was negative. The new code avoids this problem (and therefore causes changes in the checksums of thread-test and blitters-test), but it is likely still buggy due to the use of unsigned variables and other issues in the blend mode code.
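The wraparound can be demonstrated by evaluating the same expression with unsigned and signed variables. This is an isolated sketch of the one expression quoted above, not the Color Burn routine itself, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* With unsigned operands, ad - d wraps around to a huge value
 * whenever d > ad, and the result is meaningless. */
static uint32_t burn_term_unsigned (uint32_t ad, uint32_t d,
                                    uint32_t as, uint32_t s)
{
    assert (s != 0);
    return (ad - d) * as / s;
}

/* With signed operands the term can simply be negative. */
static int32_t burn_term_signed (int32_t ad, int32_t d,
                                 int32_t as, int32_t s)
{
    assert (s != 0);
    return (ad - d) * as / s;
}
```

For ad = 100, d = 120, as = 255 and s = 200, the signed version returns -25, while the unsigned version returns a value in the tens of millions.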
2013-10-12pixman-combine32: Make Color Dodge routine follow the math more closelySøren Sandmann Pedersen3-10/+9
Change blend_color_dodge() to follow the math in the comment more closely.

Note, the new code here is in some sense worse than the old code because it can now underflow the unsigned variables when the source is superluminescent and (as - s) is therefore negative. The old code was careful to clamp to 0.

But for superluminescent variables we really need the ability for the blend function to become negative, and so the solution to the underflow problem is to just use signed variables. The use of unsigned variables is a general problem in all of the blend mode code that will have to be solved later.

The CRC32 values in thread-test and blitters-test are updated to account for the changes in output.
2013-10-12pixman-combine32: Rename a number of variable from sa/sca to as/sSøren Sandmann Pedersen1-100/+99
There are no semantic changes, just variable renames. The motivation for these renames is so that the names are shorter and better match the ones used in the comments.
2013-10-12pixman-combine32: Improve documentation for blend mode operatorsSøren Sandmann Pedersen1-126/+204
This commit overhauls the comments in pixman-combine32.c regarding blend modes:

- Add a link to the PDF supplement that clarifies the specification of ColorBurn and ColorDodge
- Clarify how the formulas for premultiplied colors are derived from the ones in the PDF specifications
- Write out the derivation of the formulas in each blend routine
2013-10-12pixman-combine32.c: Formatting fixesSøren Sandmann Pedersen1-62/+64
Fix a bunch of spacing issues. V2: More spacing issues, in the _ca combiners
2013-10-09Fix thread-test on non-OpenMP systemsAndrea Canciani1-6/+9
The non-reentrant versions of prng_* functions are thread-safe only in OpenMP-enabled builds. Fixes thread-test failing when compiled with Clang (both on Linux and on MacOS).
2013-10-09Add support for SSSE3 to the MSVC build systemAndrea Canciani1-2/+27
Handle SSSE3 just like MMX and SSE2.
2013-10-09Fix build of check-formats on MSVCAndrea Canciani1-0/+5
Fixes

    check-formats.obj : error LNK2019: unresolved external symbol _strcasecmp referenced in function _format_from_string
    check-formats.obj : error LNK2019: unresolved external symbol _snprintf referenced in function _list_operators
2013-10-09Fix building of "other" programs on MSVCAndrea Canciani1-3/+3
In d1434d112ca5cd325e4fb85fc60afd1b9e902786 the benchmarks have been extended to include other programs as well and the variable names have been updated accordingly in the autotools-based build system, but not in the MSVC one.
2013-10-09Fix build on MSVCAndrea Canciani2-2/+2
After a4c79d695d52c94647b1aff78548e5892d616b70 the MMX and SSE2 code has some declarations after the beginning of a block, which is not allowed by MSVC.

Fixes multiple errors like:

    pixman-mmx.c(3625) : error C2275: '__m64' : illegal use of this type as an expression
    pixman-sse2.c(5708) : error C2275: '__m128i' : illegal use of this type as an expression
2013-10-04fast: Swap image and iter flags in generated fast pathsSøren Sandmann Pedersen1-3/+3
The generated fast paths that were moved into the 'fast' implementation in ec0e38cbb746a673f8e989ab8eae356c8c77dac7 had their image and iter flag arguments swapped; as a result, none of the fast paths were ever called.
2013-10-01vmx: there is no need to handle unaligned destination anymoreSiarhei Siamashka1-81/+36
So the redundant variables, memory reads/writes and reshuffles can be safely removed. For example, this makes the inner loop of the 'vmx_combine_add_u_no_mask' function much simpler.

Before:

    7a20:  7d a8 48 ce  lvx v13,r8,r9
    7a24:  7d 80 48 ce  lvx v12,r0,r9
    7a28:  7d 28 50 ce  lvx v9,r8,r10
    7a2c:  7c 20 50 ce  lvx v1,r0,r10
    7a30:  39 4a 00 10  addi r10,r10,16
    7a34:  10 0d 62 eb  vperm v0,v13,v12,v11
    7a38:  10 21 4a 2b  vperm v1,v1,v9,v8
    7a3c:  11 2c 6a eb  vperm v9,v12,v13,v11
    7a40:  10 21 4a 00  vaddubs v1,v1,v9
    7a44:  11 a1 02 ab  vperm v13,v1,v0,v10
    7a48:  10 00 0a ab  vperm v0,v0,v1,v10
    7a4c:  7d a8 49 ce  stvx v13,r8,r9
    7a50:  7c 00 49 ce  stvx v0,r0,r9
    7a54:  39 29 00 10  addi r9,r9,16
    7a58:  42 00 ff c8  bdnz+ 7a20 <.vmx_combine_add_u_no_mask+0x120>

After:

    76c0:  7c 00 48 ce  lvx v0,r0,r9
    76c4:  7d a8 48 ce  lvx v13,r8,r9
    76c8:  39 29 00 10  addi r9,r9,16
    76cc:  7c 20 50 ce  lvx v1,r0,r10
    76d0:  10 00 6b 2b  vperm v0,v0,v13,v12
    76d4:  10 00 0a 00  vaddubs v0,v0,v1
    76d8:  7c 00 51 ce  stvx v0,r0,r10
    76dc:  39 4a 00 10  addi r10,r10,16
    76e0:  42 00 ff e0  bdnz+ 76c0 <.vmx_combine_add_u_no_mask+0x120>