Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a port of Siarhei's commit d768558ce to the MMX code.
|
|
Nearest:
over_8888_8888 = L1: 235.79 L2: 243.24 M:225.78 ( 11.84%) HT:305.29 VT:242.82 R:210.29 RT: 99.14 ( 818Kops/s)
over_8888_8888 = L1: 251.10 L2: 256.59 M:239.93 ( 12.65%) HT:294.51 VT:242.61 R:218.32 RT:107.90 ( 853Kops/s)
Bilinear:
over_8888_8888 = L1: 121.62 L2: 122.41 M:118.91 ( 6.29%) HT:126.99 VT:117.31 R:101.50 RT: 57.56 ( 561Kops/s)
over_8888_8888 = L1: 121.14 L2: 121.81 M:118.30 ( 6.24%) HT:134.91 VT:124.28 R:115.04 RT: 69.20 ( 634Kops/s)
|
|
Nearest:
over_8888_8888 = L1: 225.75 L2: 230.91 M:217.17 ( 11.54%) HT:266.81 VT:212.29 R:184.76 RT: 86.19 ( 752Kops/s)
over_8888_8888 = L1: 235.79 L2: 243.24 M:225.78 ( 11.84%) HT:305.29 VT:242.82 R:210.29 RT: 99.14 ( 818Kops/s)
Bilinear:
over_8888_8888 = L1: 111.66 L2: 112.01 M:108.58 ( 5.69%) HT:118.60 VT:109.76 R: 95.89 RT: 55.55 ( 547Kops/s)
over_8888_8888 = L1: 121.62 L2: 122.41 M:118.91 ( 6.29%) HT:126.99 VT:117.31 R:101.50 RT: 57.56 ( 561Kops/s)
|
|
Unscaled:
over_8888_8888 = L1: 341.81 L2: 349.64 M:320.45 ( 16.90%) HT:401.75 VT:332.77 R:296.34 RT:150.63 (1572Kops/s)
Before:
over_8888_8888 = L1: 149.48 L2: 158.03 M:147.25 ( 7.75%) HT:386.07 VT:309.35 R:290.92 RT:132.73 ( 954Kops/s)
After:
over_8888_8888 = L1: 225.75 L2: 230.91 M:217.17 ( 11.54%) HT:266.81 VT:212.29 R:184.76 RT: 86.19 ( 752Kops/s)
|
|
Unscaled:
over_8888_n_8888 = L1: 395.65 L2: 395.19 M:372.80 ( 19.59%) HT:257.44 VT:194.30 R:164.31 RT: 75.20 ( 882Kops/s)
Before:
over_8888_n_8888 = L1: 107.65 L2: 106.34 M:103.18 ( 5.44%) HT: 49.62 VT: 45.55 R: 43.48 RT: 20.38 ( 242Kops/s)
After:
over_8888_n_8888 = L1: 224.50 L2: 223.42 M:215.18 ( 11.30%) HT:172.70 VT:148.65 R:127.62 RT: 64.50 ( 615Kops/s)
|
|
These are now handled by floating point combiners.
|
|
An upcoming commit will delete many of the operators from
pixman-combine32.c and rely on the ones in pixman-combine-float.c. The
comments about how the operators were derived are still useful though,
so copy them into pixman-combine-float.c before the deletion.
|
|
Consider a HARD_LIGHT operation with the following pixels:
- source: 15 (6 bits)
- source alpha: 255 (8 bits)
- mask alpha: 223 (8 bits)
- dest: 255 (8 bits)
- dest alpha: 0 (8 bits)
Since 2 times the source is less than source alpha, the first branch
of the hard light blend mode is taken:
(1 - sa) * d + (1 - da) * s + 2 * s * d
Since da is 0 and d is 1, this degenerates to:
(1 - sa) + 3 * s
Taking (src IN mask) into account along with the fact that sa is 1,
this becomes:
(1 - ma) + 3 * s * ma
= (1 - 223/255.0) + 3 * (15/63.0) * (223/255.0)
= 0.7501400560224089
When computed with the source converted by bit replication to eight
bits, and additionally with the (src IN mask) part rounded to eight
bits, we get:
ma = 223/255.0
s * ma = (60 / 255.0) * (223/255.0) which rounds to 52 / 255
and the result is
(1 - ma) + 3 * s * ma
= (1 - 223/255.0) + 3 * 52/255.0
= 0.7372549019607844
so now we have an error of 0.012885.
Without making changes to the way pixman does integer
rounding/arithmetic, this error must then be considered
acceptable. Due to conservative computations in the test suite we can
however get away with 0.0128 as the acceptable deviation.
This fixes the remaining failures in pixel-test.
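The arithmetic above can be reproduced with a short C sketch. The helper names below are hypothetical and are not pixman functions; the sketch only assumes round-to-nearest for the 8-bit multiply and bit replication for the 6-to-8-bit expansion, as described in the message.

```c
/* Hypothetical helpers for illustration only; not pixman API. */

/* Expand a 6-bit channel to 8 bits by bit replication. */
static unsigned expand_6_to_8 (unsigned v)
{
    return (v << 2) | (v >> 4);
}

/* Multiply two 8-bit channels, rounding to the nearest 8-bit value.
 * Integer form of floor (a * b / 255.0 + 0.5). */
static unsigned mul_un8_rounded (unsigned a, unsigned b)
{
    return (2 * a * b + 255) / 510;
}

/* Full-precision result: (1 - ma) + 3 * s * ma */
static double hard_light_exact (unsigned s6, unsigned ma8)
{
    double s = s6 / 63.0;
    double ma = ma8 / 255.0;

    return (1 - ma) + 3 * s * ma;
}

/* Same formula, but with (src IN mask) rounded to 8 bits first. */
static double hard_light_8bit (unsigned s6, unsigned ma8)
{
    unsigned s_in_m = mul_un8_rounded (expand_6_to_8 (s6), ma8);

    return (1 - ma8 / 255.0) + 3 * s_in_m / 255.0;
}
```

Calling both functions with s6 = 15 and ma8 = 223 gives the two values quoted above, and their difference is the deviation the tolerance has to absorb.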
|
|
Consider a DISJOINT_ATOP operation with the following pixels:
- source: 0xff (8 bits)
- source alpha: 0x01 (8 bits)
- mask alpha: 0x7b (8 bits)
- dest: 0x00 (8 bits)
- dest alpha: 0xff (8 bits)
When (src IN mask) is computed in 8 bits, the resulting alpha channel
is 0 due to rounding:
floor ((0x01 * 0x7b) / 255.0 + 0.5) = floor (0.9823) = 0
which means that since Render defines any division by zero as
infinity, the Fa and Fb for this operator end up as follows:
Fa = max (1 - (1 - 1) / 0, 0) = 0
Fb = min (1, (1 - 0) / 1) = 1
and so since dest is 0x00, the overall result is 0.
However, when computed in full precision, the alpha value no longer
rounds to 0, and so Fa ends up being
Fa = max (1 - (1 - 1) / 0.0001, 0) = 1
and so the result is now
s * ma * Fa + d * Fb
= (1.0 * (0x7b / 255.0) * 1) + d * 0
= 0x7b / 255.0
= 0.4823
so the error in this case ends up being 0.48235294, which is clearly
not something that can be considered acceptable.
In order to avoid this problem, we need to do all arithmetic in such a
way that a multiplication of two tiny numbers can never end up being
zero unless one of the input numbers is itself zero.
This patch makes all computations that involve divisions take place in
floating point, which is sufficient to fix the test cases.
This brings the number of failures in pixel-test down to 14.
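A minimal C sketch of the rounding problem described above. The function names are hypothetical, and the division-by-zero handling simply follows the statement that Fa ends up as 0 in that case; this is not pixman's actual code.

```c
/* Hypothetical helpers for illustration only; not pixman API. */

/* 8-bit multiply with rounding: floor (a * b / 255.0 + 0.5). */
static unsigned mul_un8 (unsigned a, unsigned b)
{
    return (2 * a * b + 255) / 510;
}

/* Fa for DISJOINT_ATOP: max (1 - (1 - da) / sa, 0).
 * Division by zero is treated as infinity, so Fa is 0 when sa is 0. */
static double disjoint_atop_fa (double sa, double da)
{
    double f;

    if (sa == 0.0)
        return 0.0;

    f = 1.0 - (1.0 - da) / sa;
    return f < 0.0 ? 0.0 : f;
}

/* 8-bit intermediate: the alpha of (src IN mask) is rounded first. */
static double fa_with_8bit_alpha (unsigned sa8, unsigned ma8, unsigned da8)
{
    return disjoint_atop_fa (mul_un8 (sa8, ma8) / 255.0, da8 / 255.0);
}

/* Full precision: the product of two nonzero alphas stays nonzero. */
static double fa_full_precision (unsigned sa8, unsigned ma8, unsigned da8)
{
    return disjoint_atop_fa ((sa8 / 255.0) * (ma8 / 255.0), da8 / 255.0);
}
```

With sa8 = 0x01, ma8 = 0x7b and da8 = 0xff, the first function yields Fa = 0 and the second yields Fa = 1, which is the source of the 0.48 discrepancy.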
|
|
The Soft Light operator has several branches. One of them is decided
based on whether 2 * s is less than or equal to sa. In floating
point implementations, when those two values are very close to each
other, it may not be completely predictable which branch we hit.
This is a problem because in one branch, when destination alpha is
zero, we get the result
r = d * sa
and in the other we get
r = 0
So when d and sa are not 0, this causes two different results to be
returned from essentially identical input values. In other words,
there is a discontinuity in the current implementation.
This patch arbitrarily changes the second branch such that it now returns
d * sa instead. There is no deep meaning behind this, because
essentially this is an attempt to assign meaning to division by zero,
and all that is required is that this meaning doesn't depend on minute
differences in input values.
This makes the number of failed pixels in pixel-test go down to 347.
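The discontinuity can be illustrated with a small sketch that models only the case where destination alpha is zero. The function names are hypothetical, and the sketch assumes the branch test is 2 * s <= sa (per the PDF definition of soft light) and that the second branch is the one that used to return 0.

```c
/* Hypothetical sketch of the da == 0 case only; not pixman's code. */

/* Before: the two branches disagree when da is zero. */
static double soft_light_da_zero_old (double sa, double s, double d)
{
    if (2 * s <= sa)
        return d * sa;
    else
        return 0.0;
}

/* After: both branches return d * sa, so the branch taken no longer
 * affects the result. */
static double soft_light_da_zero_new (double sa, double s, double d)
{
    (void) s; /* the result no longer depends on s when da is zero */
    return d * sa;
}
```

In the old version, two sources that differ only by a rounding error around s = sa / 2 produce d * sa and 0 respectively; in the new version both produce d * sa.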
|
|
In the component alpha part of the PDF_SEPARABLE_BLEND_MODE macro, the
expression ~RED_8 (m) is used. Because RED_8(m) gets promoted to int
before ~ is applied, the whole expression typically becomes some
negative value rather than (255 - RED_8(m)) as desired.
Fix this by using unsigned temporary variables.
This reduces the number of failures in pixel-test to 363.
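The promotion issue can be shown in isolation. The macro below is a hypothetical stand-in that yields an 8-bit value; it is not the actual RED_8 definition from pixman.

```c
#include <stdint.h>

/* Hypothetical stand-in for a channel extraction macro that yields
 * an 8-bit value; not the real RED_8 from pixman. */
#define RED_8_SKETCH(x) ((uint8_t) (((x) >> 16) & 0xff))

/* Buggy: the uint8_t operand is promoted to int before ~ is applied,
 * so the result is a negative int rather than 255 - red. */
static int inverse_buggy (uint32_t m)
{
    return ~RED_8_SKETCH (m);
}

/* Fixed: an unsigned 8-bit temporary truncates the complement back
 * to 8 bits, which equals 255 - red. */
static int inverse_fixed (uint32_t m)
{
    uint8_t inverse = (uint8_t) ~RED_8_SKETCH (m);

    return inverse;
}
```

For a mask whose red channel is 0x40, the buggy version returns -65 while the fixed version returns 191.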
|
|
This commit fixes four separate bugs:
1. In the computation
(1 - sa) * d + (1 - da) * s + sa * da * B(s, d)
we were using regular addition for all four channels, but for
superluminescent pixels, the addition could overflow causing
nonsensical results.
2. The variables and return types used for the results of the blend
mode calculations were unsigned, but for various blend modes (and
especially with superluminescent pixels), the blend mode
calculations could be negative, resulting in underflows.
3. The blend mode computations were returned as 8-bit values, which is
not sufficient precision (especially considering that we need
signed results).
4. The value before the final division by 255 was not properly clamped
to [0, 255].
This patch fixes all those bugs. The blend mode computations are now
returned as signed 32 bit values with 1 represented as 255 * 255.
With these fixes, the number of failing pixels in pixel-test goes down
from 431 to 384.
|
|
This commit adds a large number of pixel regressions to
pixel-test. All of these have at some point been failing in
blend-mode-test, and most of them do fail currently.
To be specific, with this commit, pixel-test reports 431 failed tests.
|
|
This new test program is similar to test/composite in that it relies
on the pixel_checker_t API to do tolerance based verification. But
unlike the composite test, which verifies combinations of a fixed set
of pixels, this one generates random images and verifies that those
composite correctly.
Also unlike composite, tolerance-test supports all the separable blend
mode operators in addition to the original Render operators.
When tests fail, a C struct is printed that can be pasted into
pixel-test for regression purposes.
There is an option "--forever" which causes the random seed to be set
to the current time, and then the test runs until interrupted. This is
useful for overnight runs.
This test currently fails badly due to various bugs in the blend mode
operators. Later commits will fix those.
|
|
A new command line argument allows the user to specify which one of
the regressions should be run.
|
|
Support is added to pixel-test for verifying operations involving
masks. If a regression includes a mask, it is verified with the
pixel_checker API in both unified and component alpha modes.
|
|
The implementations are copied from the floating point pipeline, but
use double precision instead of single precision.
|
|
With GCC 4.8.2 the COMPILE_TIME_ASSERT macro produces a spurious
warning about an unused local typedef:
In file included from pixman.c:29:0:
pixman.c: In function 'optimize_operator':
pixman-private.h:1019:22: warning: typedef 'compile_time_assertion' locally defined but not used [-Wunused-local-typedefs]
The flag -Wno-unused-local-typedefs suppresses that warning.
|
|
According to the definition of soft light, the first comparison is
less-than-or-equal, not less-than.
|
|
Currently, if you attempt to use component alpha on source images or
images without RGB channels, Pixman will silently just use unified
alpha instead. This patch makes such images supported for component
alpha.
There is no particularly compelling use case at the moment, but this
patch does get rid of a bit of special-case code both in
pixman-general.c and in test/composite.c.
|
|
The call_test_function() function contains some assembly that deliberately
causes the stack to be aligned to 32 bits rather than 128 bits on
x86-32. The intention is to catch bugs that surface when pixman is
called from code that only uses a 32 bit alignment.
However, recent versions of GCC apparently make the assumption (either
accidentally or deliberately) that the incoming stack is aligned
to 128 bits, where older versions only seemed to make this assumption
when compiling with -msse2. This causes the vector code in the PRNG to
now segfault when called from call_test_function() on x86-32.
This patch fixes that by only making the stack unaligned on 32 bit
Windows, where it would definitely be incorrect for GCC to assume that
the incoming stack is aligned to 128 bits.
V2: Put "defined(...)" around __GNUC__
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=491110
|
|
SSSE3 is detected by bit 9 of ECX, but we were checking bit 9 of EDX,
which is APIC, leading to SSSE3 routines being called on CPUs without
SSSE3.
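A minimal sketch of the corrected check, written as a pure function over the register value returned by CPUID leaf 1. The macro and function names are hypothetical and not taken from pixman.

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits. Both flags sit at bit 9, but in
 * different registers, which is what made the bug easy to miss. */
#define CPUID1_ECX_SSSE3 (1u << 9) /* ECX bit 9: SSSE3 */
#define CPUID1_EDX_APIC  (1u << 9) /* EDX bit 9: on-chip APIC */

/* Hypothetical helper: ecx is the ECX value returned by CPUID leaf 1.
 * Passing EDX here instead would test the APIC flag. */
static int has_ssse3 (uint32_t ecx)
{
    return (ecx & CPUID1_ECX_SSSE3) != 0;
}
```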
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
Without this, if tarballs are generated on a system that doesn't have
GTK+ 2 development headers available, the files in EXTRA_DIST will not
be included, which then causes builds from the tarball to fail on
systems that do have GTK+ 2 headers available.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=71465
|
|
The win32 build has no config.h, so HAVE_CONFIG_H should be checked
before including it, as in utils.h.
|
|
If t->bottom is close to MIN_INT (probably an invalid value), subtracting
top can lead to underflow, which causes crashes. This patch fixes
the issue.
This fixes bug 67484.
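The underflow can be sketched as follows. The type and function names are hypothetical, and comparing the two values directly is shown as one way to avoid the wraparound, not necessarily the exact form of the fix. The subtraction is done in unsigned arithmetic so that the sketch itself has no undefined behavior.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16.16 fixed-point coordinate. */
typedef int32_t fixed_sketch_t;

/* Wraparound-prone: bottom - top wraps to a large positive value
 * when bottom is close to INT32_MIN. */
static int height_positive_subtract (fixed_sketch_t top, fixed_sketch_t bottom)
{
    return (int32_t) ((uint32_t) bottom - (uint32_t) top) > 0;
}

/* Safe: a direct comparison cannot overflow. */
static int height_positive_compare (fixed_sketch_t top, fixed_sketch_t bottom)
{
    return bottom > top;
}
```

With top = 1 and bottom = INT32_MIN, the subtracting version reports a positive height while the comparing version correctly rejects the trapezoid.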
|
|
This trapezoid causes a crash due to an underflow in
pixman_trapezoid_valid().
Test case from Ritesh Khadgaray.
|
|
The following patch fixes building pixman with older GCC releases
such as GCC 3.3 and older (OpenBSD; some older archs use GCC 3.3.6)
by changing the method of detecting the presence of __builtin_clz
to utilizing an autoconf check to determine its presence. Compilers
that pretend to be GCC, implement __builtin_clz and are already
utilizing the intrinsic include LLVM/Clang, Open64, EKOPath and
PCC.
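A sketch of how code might select between the intrinsic and a portable fallback once configure has run the check. The macro name HAVE_BUILTIN_CLZ and the function name are assumptions made for illustration; the fallback loop is a generic one, not taken from pixman.

```c
#include <stdint.h>

/* Count leading zero bits of a nonzero 32-bit value.
 * HAVE_BUILTIN_CLZ is assumed to be defined by the configure check
 * when the compiler provides __builtin_clz, and the intrinsic path
 * assumes a 32-bit unsigned int. */
static int count_leading_zeros_sketch (uint32_t x)
{
#ifdef HAVE_BUILTIN_CLZ
    return __builtin_clz (x);
#else
    int n = 0;

    /* Shift left until the top bit is set; x must be nonzero. */
    while (!(x & 0x80000000u))
    {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```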
|
|
The functions pixman_composite_glyphs_no_mask() and
pixman_composite_glyphs() can call into code compiled with -msse2,
which requires the stack to be aligned to 16 bytes. Since the ABIs on
Windows and Linux for x86-32 don't provide this guarantee, we need to
use this attribute to make GCC generate a prologue that realigns the
stack.
This fixes the crash introduced in the previous commit and also
https://bugs.freedesktop.org/show_bug.cgi?id=70348
and
https://bugs.freedesktop.org/show_bug.cgi?id=68300
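A sketch of how the attribute might be applied to a public entry point. The macro and function names are hypothetical; only the force_align_arg_pointer attribute itself comes from the message above, and the guard is one plausible way to restrict it to GCC-compatible compilers on 32-bit x86.

```c
/* The attribute only matters for GCC-compatible compilers targeting
 * 32-bit x86; elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK_SKETCH __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK_SKETCH
#endif

/* Hypothetical entry point: the compiler emits a prologue that
 * realigns the stack, so callers with a 4-byte aligned stack can
 * safely reach code that relies on 16-byte alignment. */
REALIGN_STACK_SKETCH
int entry_point_sketch (int x)
{
    return 2 * x;
}
```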
|
|
GCC when compiling with -msse2 and -mssse3 will assume that the stack
is aligned to 16 bytes even on x86-32 and accordingly issue movdqa
instructions for stack allocated variables.
But despite what GCC thinks, the standard ABI on x86-32 only requires
a 4-byte aligned stack. This is true at least on Windows, but there
also was (and maybe still is) Linux code in the wild that assumed
this. When such code calls into pixman and hits something compiled
with -msse2, we get a segfault from the unaligned movdqas.
Pixman has worked around this issue in the past with the gcc attribute
"force_align_arg_pointer" but the problem has resurfaced now in
https://bugs.freedesktop.org/show_bug.cgi?id=68300
because pixman_composite_glyphs() is missing this attribute.
This patch makes fuzzer_test_main() call the test_function through a
trampoline, which, on x86-32, has a bit of assembly that deliberately
avoids aligning the stack to 16 bytes as GCC normally expects. The
result is that glyph-test now crashes.
V2: Mark caller-save registers as clobbered, rather than using
noinline on the trampoline.
|
|
The accidental use of declaration after statement breaks compilation
with C89 compilers such as MSVC. Assuming that MSVC is one of the
supported compilers, it makes sense to ask GCC to at least report
warnings for such problematic code.
|
|
Running cairo-perf-trace benchmark on Intel Core2 T7300:
Before:
[ 0] image t-firefox-canvas-swscroll 1.989 2.008 0.43% 8/8
[ 1] image firefox-canvas-scroll 4.574 4.609 0.50% 8/8
After:
[ 0] image t-firefox-canvas-swscroll 1.404 1.418 0.51% 8/8
[ 1] image firefox-canvas-scroll 4.228 4.259 0.36% 8/8
|
|
Clang 3.0 chokes on the following bit of assembly
asm ("pmulhuw %1, %0\n\t"
: "+y" (__A)
: "y" (__B)
);
from pixman-mmx.c with this error message:
fatal error: error in backend: Unsupported asm: input constraint
with a matching output constraint of incompatible type!
So add a check in configure to only enable MMX when the compiler can
deal with it.
|
|
The 'value' field in the 'named_int_t' struct is used for both
pixman_repeat_t and pixman_kernel_t values, so the type should be int,
not pixman_kernel_t.
Fixes some warnings like this
scale.c:124:33: warning: implicit conversion from enumeration
type 'pixman_repeat_t' to different enumeration type
'pixman_kernel_t' [-Wconversion]
{ "None", PIXMAN_REPEAT_NONE },
~ ^~~~~~~~~~~~~~~~~~
when compiled with clang.
|
|
For superluminescent destinations, the old code could underflow in
uint32_t r = (ad - d) * as / s;
when (ad - d) was negative. The new code avoids this problem (and
therefore causes changes in the checksums of thread-test and
blitters-test), but it is likely still buggy due to the use of
unsigned variables and other issues in the blend mode code.
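The wraparound in the old expression can be reproduced in isolation. The function names are hypothetical, and the signed variant is shown only as one way to keep the intermediate value negative; it is not the code from this commit. Both functions require s to be nonzero.

```c
#include <stdint.h>

/* Old-style expression with unsigned operands: (ad - d) wraps
 * around to a huge value when d > ad. */
static uint32_t term_unsigned (uint32_t d, uint32_t ad, uint32_t s, uint32_t as)
{
    return (ad - d) * as / s;
}

/* Same expression with signed operands: the negative intermediate
 * value is preserved. */
static int32_t term_signed (int32_t d, int32_t ad, int32_t s, int32_t as)
{
    return (ad - d) * as / s;
}
```

For a superluminescent destination such as d = 200, ad = 100 with s = 50 and as = 255, the unsigned version returns 85898835 while the signed version returns -510.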
|
|
Change blend_color_dodge() to follow the math in the comment more
closely.
Note, the new code here is in some sense worse than the old code
because it can now underflow the unsigned variables when the source is
superluminescent and (as - s) is therefore negative. The old code was
careful to clamp to 0.
But for superluminescent variables we really need the ability for the
blend function to become negative, and so the solution to the underflow
problem is to just use signed variables. The use of unsigned variables
is a general problem in all of the blend mode code that will have to
be solved later.
The CRC32 values in thread-test and blitters-test are updated to
account for the changes in output.
|
|
There are no semantic changes, just variable renames. The motivation
for these renames is so that the names are shorter and better match
the ones used in the comments.
|
|
This commit overhauls the comments in pixman-combine32.c regarding
blend modes:
- Add a link to the PDF supplement that clarifies the specification of
ColorBurn and ColorDodge
- Clarify how the formulas for premultiplied colors are derived from
the ones in the PDF specifications
- Write out the derivation of the formulas in each blend routine
|
|
Fix a bunch of spacing issues.
V2: More spacing issues, in the _ca combiners
|
|
The non-reentrant versions of prng_* functions are thread-safe only in
OpenMP-enabled builds.
Fixes thread-test failing when compiled with Clang (both on Linux and
on MacOS).
|
|
Handle SSSE3 just like MMX and SSE2.
|
|
Fixes
check-formats.obj : error LNK2019: unresolved external symbol
_strcasecmp referenced in function _format_from_string
check-formats.obj : error LNK2019: unresolved external symbol
_snprintf referenced in function _list_operators
|
|
In d1434d112ca5cd325e4fb85fc60afd1b9e902786 the benchmarks have been
extended to include other programs as well and the variable names have
been updated accordingly in the autotools-based build system, but not
in the MSVC one.
|
|
After a4c79d695d52c94647b1aff78548e5892d616b70 the MMX and SSE2 code
has some declarations after the beginning of a block, which is not
allowed by MSVC.
Fixes multiple errors like:
pixman-mmx.c(3625) : error C2275: '__m64' : illegal use of this type
as an expression
pixman-sse2.c(5708) : error C2275: '__m128i' : illegal use of this
type as an expression
|
|
The generated fast paths that were moved into the 'fast'
implementation in ec0e38cbb746a673f8e989ab8eae356c8c77dac7 had their
image and iter flag arguments swapped; as a result, none of the fast
paths were ever called.
|
|
So the redundant variables, memory reads/writes and reshuffles
can be safely removed. For example, this makes the inner loop
of the 'vmx_combine_add_u_no_mask' function much simpler.
Before:
7a20:7d a8 48 ce lvx v13,r8,r9
7a24:7d 80 48 ce lvx v12,r0,r9
7a28:7d 28 50 ce lvx v9,r8,r10
7a2c:7c 20 50 ce lvx v1,r0,r10
7a30:39 4a 00 10 addi r10,r10,16
7a34:10 0d 62 eb vperm v0,v13,v12,v11
7a38:10 21 4a 2b vperm v1,v1,v9,v8
7a3c:11 2c 6a eb vperm v9,v12,v13,v11
7a40:10 21 4a 00 vaddubs v1,v1,v9
7a44:11 a1 02 ab vperm v13,v1,v0,v10
7a48:10 00 0a ab vperm v0,v0,v1,v10
7a4c:7d a8 49 ce stvx v13,r8,r9
7a50:7c 00 49 ce stvx v0,r0,r9
7a54:39 29 00 10 addi r9,r9,16
7a58:42 00 ff c8 bdnz+ 7a20 <.vmx_combine_add_u_no_mask+0x120>
After:
76c0:7c 00 48 ce lvx v0,r0,r9
76c4:7d a8 48 ce lvx v13,r8,r9
76c8:39 29 00 10 addi r9,r9,16
76cc:7c 20 50 ce lvx v1,r0,r10
76d0:10 00 6b 2b vperm v0,v0,v13,v12
76d4:10 00 0a 00 vaddubs v0,v0,v1
76d8:7c 00 51 ce stvx v0,r0,r10
76dc:39 4a 00 10 addi r10,r10,16
76e0:42 00 ff e0 bdnz+ 76c0 <.vmx_combine_add_u_no_mask+0x120>
|