Age | Commit message | Author | Files | Lines
2015-08-01 | Post-release version bump to 0.33.3 [HEAD] [master] | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-08-01 | Pre-release version bump to 0.33.2 [pixman-0.33.2] | Oded Gabbay | 1 | -1/+1
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-07-16 | vmx: implement fast path iterator vmx_fetch_a8 | Oded Gabbay | 1 | -0/+46
No changes were observed when running cairo trimmed benchmarks.

Running "lowlevel-blt-bench src_8_8888" on POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le gave the following results:

reference memcpy speed = 25197.2MB/s (6299.3MP/s for 32bpp fills)

        Before    After     Change
--------------------------------------------
L1      965.34    3936      +307.73%
L2      942.99    3436.29   +264.40%
M       902.24    2757.77   +205.66%
HT      448.46    784.99    +75.04%
VT      430.05    819.78    +90.62%
R       412.9     717.04    +73.66%
RT      168.93    220.63    +30.60%
Kops/s  1025      1303      +27.12%

It was benchmarked against commit id e2d211a from pixman/master

Siarhei Siamashka reported that on playstation3, it shows the following results:

== before ==
src_8_8888 = L1: 194.37 L2: 198.46 M:155.90 (148.35%) HT: 59.18 VT: 36.71 R: 38.93 RT: 12.79 ( 106Kops/s)

== after ==
src_8_8888 = L1: 373.96 L2: 391.10 M:245.81 (233.88%) HT: 80.81 VT: 44.33 R: 48.10 RT: 14.79 ( 122Kops/s)

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path iterator vmx_fetch_x8r8g8b8 | Oded Gabbay | 1 | -0/+48
It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

cairo trimmed benchmarks :

Speedups
========
t-firefox-asteroids  533.92 -> 489.94 : 1.09x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path scaled nearest vmx_8888_8888_OVER | Oded Gabbay | 1 | -0/+128
It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      134.36    181.68    +35.22%
L2      135.07    180.67    +33.76%
M       134.6     180.51    +34.11%
HT      121.77    128.79    +5.76%
VT      120.49    145.07    +20.40%
R       93.83     102.3     +9.03%
RT      50.82     46.93     -7.65%
Kops/s  448       422       -5.80%

cairo trimmed benchmarks :

Speedups
========
t-firefox-asteroids  533.92 -> 497.92 : 1.07x
t-midori-zoomed      692.98 -> 651.24 : 1.06x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path vmx_composite_src_x888_8888 | Oded Gabbay | 1 | -0/+60
It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      1115.4    5006.49   +348.85%
L2      1112.26   4338.01   +290.02%
M       1110.54   2524.15   +127.29%
HT      745.41    1140.03   +52.94%
VT      749.03    1287.13   +71.84%
R       423.91    547.6     +29.18%
RT      205.79    194.98    -5.25%
Kops/s  1414      1361      -3.75%

cairo trimmed benchmarks :

Speedups
========
t-gnome-system-monitor  1402.62 -> 1212.75 : 1.16x
t-firefox-asteroids     533.92 -> 474.50 : 1.13x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path vmx_composite_over_n_8888_8888_ca | Oded Gabbay | 1 | -0/+112
It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      61.92     244.91    +295.53%
L2      62.74     243.3     +287.79%
M       63.03     241.94    +283.85%
HT      59.91     144.22    +140.73%
VT      59.4      174.39    +193.59%
R       53.6      111.37    +107.78%
RT      37.99     46.38     +22.08%
Kops/s  436       506       +16.06%

cairo trimmed benchmarks :

Speedups
========
t-xfce4-terminal-a1  1540.37 -> 1226.14 : 1.26x
t-firefox-talos-gfx  1488.59 -> 1209.19 : 1.23x

Slowdowns
=========
t-evolution          553.88 -> 581.63 : 1.05x
t-poppler            364.99 -> 383.79 : 1.05x
t-firefox-scrolling  1223.65 -> 1304.34 : 1.07x

The slowdowns can be explained in cases where the images are small and un-aligned to a 16-byte boundary. In that case, the function will first work on the un-aligned area, even in operations of 1 byte. For small images, the overhead of such operations can exceed the savings we get from using the vmx instructions on the aligned part of the image.

In the C fast-path implementation, there is no special treatment for the un-aligned part, as it works in 4 byte quantities on the entire image.

Because llbb is a synthetic test, I would assume it has far fewer alignment issues than a "real-world" scenario, such as the cairo benchmarks, which are basically recorded traces of real application activity.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
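The slowdown explanation above can be illustrated with a small model of how a scanline is split into a scalar head, a vector body and a scalar tail. This is only a sketch under stated assumptions: the names split_t and split_scanline are hypothetical, pixels are assumed to be 32 bits wide and 4-byte aligned, and the real vmx fast path is not structured exactly like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one scanline is processed. */
typedef struct
{
    size_t head;         /* pixels handled one at a time before alignment */
    size_t vector_iters; /* 16-byte (4-pixel) vector iterations           */
    size_t tail;         /* pixels left over after the vector loop        */
} split_t;

/* Split a run of `width` 32-bit pixels starting at address `dst_addr`.
 * Assumes dst_addr is a multiple of 4. */
static split_t
split_scanline (uintptr_t dst_addr, size_t width)
{
    split_t s;
    size_t  misalign = dst_addr & 15;   /* bytes past a 16-byte boundary */

    s.head = misalign ? (16 - misalign) / 4 : 0;
    if (s.head > width)
        s.head = width;

    s.vector_iters = (width - s.head) / 4;
    s.tail         = (width - s.head) % 4;
    return s;
}
```

For a 3-pixel row starting 4 bytes past a boundary, the whole row ends up in the scalar head and the vector loop never runs, which is the situation where the setup overhead is not paid back.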
2015-07-16 | vmx: implement fast path composite_add_8888_8888 | Oded Gabbay | 1 | -0/+27
Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      248.76    3284.48   +1220.34%
L2      264.09    2826.47   +970.27%
M       261.24    2405.06   +820.63%
HT      217.27    857.3     +294.58%
VT      213.78    980.09    +358.46%
R       176.61    442.95    +150.81%
RT      107.54    150.08    +39.56%
Kops/s  917       1125      +22.68%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path composite_add_8_8 | Oded Gabbay | 1 | -0/+55
Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      687.63    9140.84   +1229.33%
L2      715       7495.78   +948.36%
M       717.39    8460.14   +1079.29%
HT      569.56    1020.12   +79.11%
VT      520.3     1215.56   +133.63%
R       514.81    874.35    +69.84%
RT      341.28    305.42    -10.51%
Kops/s  1621      1579      -2.59%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path composite_over_8888_8888 | Oded Gabbay | 1 | -0/+30
Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

        Before    After     Change
---------------------------------------------
L1      129.47    1054.62   +714.57%
L2      138.31    1011.02   +630.98%
M       139.99    1008.65   +620.52%
HT      122.11    468.45    +283.63%
VT      121.06    532.21    +339.62%
R       108.48    240.5     +121.70%
RT      77.87     116.7     +49.87%
Kops/s  758       981       +29.42%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: implement fast path vmx_fill | Oded Gabbay | 1 | -0/+153
Based on sse2 impl.

It was benchmarked against commit id e2d211a from pixman/master

Tested cairo trimmed benchmarks on POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le :

speedups
========
t-swfdec-giant-steps       1383.09 -> 718.63  : 1.92x speedup
t-gnome-system-monitor     1403.53 -> 918.77  : 1.53x speedup
t-evolution                552.34 -> 415.24   : 1.33x speedup
t-xfce4-terminal-a1        1573.97 -> 1351.46 : 1.16x speedup
t-firefox-paintball        847.87 -> 734.50   : 1.15x speedup
t-firefox-asteroids        565.99 -> 492.77   : 1.15x speedup
t-firefox-canvas-swscroll  1656.87 -> 1447.48 : 1.14x speedup
t-midori-zoomed            724.73 -> 642.16   : 1.13x speedup
t-firefox-planet-gnome     975.78 -> 911.92   : 1.07x speedup
t-chromium-tabs            292.12 -> 274.74   : 1.06x speedup
t-firefox-chalkboard       690.78 -> 653.93   : 1.06x speedup
t-firefox-talos-gfx        1375.30 -> 1303.74 : 1.05x speedup
t-firefox-canvas-alpha     1016.79 -> 967.24  : 1.05x speedup

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-16 | vmx: add helper functions | Oded Gabbay | 1 | -0/+476
This patch adds the following helper functions for reuse of code, hiding BE/LE differences and maintainability.

All of the functions were defined as static force_inline.

Names were copied from pixman-sse2.c so conversion of fast-paths between sse2 and vmx would be easier from now on. Therefore, I tried to keep the input/output of the functions to be as close as possible to the sse2 definitions.

The functions are:

- load_128_aligned : load 128-bit from a 16-byte aligned memory address into a vector
- load_128_unaligned : load 128-bit from memory into a vector, without guarantee of alignment for the source pointer
- save_128_aligned : save 128-bit vector into a 16-byte aligned memory address
- create_mask_16_128 : take a 16-bit value and fill with it a new vector
- create_mask_1x32_128 : take a 32-bit pointer and fill a new vector with the 32-bit value from that pointer
- create_mask_32_128 : take a 32-bit value and fill with it a new vector
- unpack_32_1x128 : unpack 32-bit value into a vector
- unpacklo_128_16x8 : unpack the eight low 8-bit values of a vector
- unpackhi_128_16x8 : unpack the eight high 8-bit values of a vector
- unpacklo_128_8x16 : unpack the four low 16-bit values of a vector
- unpackhi_128_8x16 : unpack the four high 16-bit values of a vector
- unpack_128_2x128 : unpack the eight low 8-bit values of a vector into one vector and the eight high 8-bit values into another vector
- unpack_128_2x128_16 : unpack the four low 16-bit values of a vector into one vector and the four high 16-bit values into another vector
- unpack_565_to_8888 : unpack an RGB_565 vector to 8888 vector
- pack_1x128_32 : pack a vector and return the LSB 32-bit of it
- pack_2x128_128 : pack two vectors into one and return it
- negate_2x128 : xor two vectors with mask_00ff (separately)
- is_opaque : returns whether all the pixels contained in the vector are opaque
- is_zero : returns whether the vector equals 0
- is_transparent : returns whether all the pixels contained in the vector are transparent
- expand_pixel_8_1x128 : expand an 8-bit pixel into lower 8 bytes of a vector
- expand_alpha_1x128 : expand alpha from vector and return the new vector
- expand_alpha_2x128 : expand alpha from one vector and another alpha from a second vector
- expand_alpha_rev_2x128 : expand a reversed alpha from one vector and another reversed alpha from a second vector
- pix_multiply_2x128 : do pix_multiply for two vectors (separately)
- over_2x128 : perform over op. on two vectors
- in_over_2x128 : perform in-over op. on two vectors

v2: removed expand_pixel_32_1x128 as it was not used by any function and its implementation was erroneous

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
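To make a few of these helpers concrete, here is a scalar model of what negate, is_opaque and pix_multiply compute for a single channel or pixel. It is a sketch, not the vmx code: the names mul_un8, negate_lane and is_opaque_pixel are hypothetical, pixels are assumed to be a8r8g8b8, and the rounding formula is the usual 8-bit "multiply and divide by 255" approximation rather than something quoted from the patch.

```c
#include <stdint.h>

/* Scalar model of pix_multiply for one 8-bit channel:
 * a * b / 255, rounded to nearest. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scalar model of negate: a channel widened to a 16-bit lane,
 * xor-ed with 0x00ff, which equals 255 - x for x in 0..255. */
static uint16_t
negate_lane (uint16_t x)
{
    return (uint16_t) (x ^ 0x00ff);
}

/* Scalar model of is_opaque for one a8r8g8b8 pixel. */
static int
is_opaque_pixel (uint32_t p)
{
    return (p >> 24) == 0xff;
}
```

The vector versions apply the same arithmetic to every lane of one or two 128-bit registers at once.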
2015-07-16 | vmx: add LOAD_VECTOR macro | Oded Gabbay | 1 | -26/+24
This patch adds a macro for loading a single vector. It also makes the other LOAD_VECTORx macros use this macro as a base so code would be re-used. In addition, I fixed minor coding style issues. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-07-11 | MIPS: update author's e-mail address | Nemanja Lukic | 4 | -4/+4
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-07-06 | lowlevel-blt-bench: add option to skip memcpy measurement | Pekka Paalanen | 1 | -3/+8
The memcpy speed measurement takes several seconds. When you are running single tests in a harness that iterates dozens or hundreds of times, the repeated measurements are redundant and take a lot of time. It is also an open question whether the measured speed changes over long test runs due to unidentified platform reasons (Raspberry Pi). Add a command line option to set the reference memcpy speed, skipping the measuring. The speed is mainly used to compute how many iterations to run inside the bench_*() functions, so for repeated testing on the same hardware, it makes sense to lock that number to a constant. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: add CSV output mode | Pekka Paalanen | 1 | -21/+47
Add a command line option for choosing CSV output mode. In CSV mode, only the results in Mpixels/s are printed in an easily machine-parseable format. All user-friendly printing is suppressed. This is intended for cases where you benchmark one particular operation at a time. Running the "all" set of benchmarks will print just fine, but you may have trouble matching rows to operations as you have to look at the tests_tbl[] to see what row is which. Reviewed-by: Ben Avison <bavison@riscosopen.org> v2: don't add a space after comma in CSV. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-06 | lowlevel-blt-bench: refactor to Mpx_per_sec() | Pekka Paalanen | 1 | -7/+16
Refactor the Mpixels/s computations into a function. Easier to read and better documents what is being computed. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
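As a rough illustration of the kind of computation being factored out, a throughput helper could look like the sketch below. The name mpixels_per_sec and its two-argument signature are assumptions made for this example; the actual Mpx_per_sec() in lowlevel-blt-bench.c may take different arguments (for instance raw timestamps) and subtract measurement overhead.

```c
/* Hypothetical helper: throughput in megapixels per second, given the
 * number of pixels processed and the elapsed time in seconds. */
static double
mpixels_per_sec (double pix_cnt, double seconds)
{
    return pix_cnt / seconds / 1e6;
}
```

Keeping the division in one place means every bench_*() result is converted the same way.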
2015-07-06 | lowlevel-blt-bench: all bench funcs to return pix_cnt | Pekka Paalanen | 1 | -14/+16
The bench_* functions, that did not already do it, are modified to return the number of pixels processed during the benchmark. This moves the computation to the site that actually determines the number, and simplifies bench_composite() a bit. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: move speed and scaling printing | Pekka Paalanen | 1 | -15/+22
Move the printing of the memory speed and scaling mode into a new function. This will help with implementing a machine-readable output option. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: print single pattern details | Pekka Paalanen | 1 | -3/+25
When given just a single test pattern instead of "all", print the test details. This can be used to verify the pattern parser agrees with the user, just like scaling settings are printed. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: make test_entry::testname const | Pekka Paalanen | 1 | -7/+7
We assign string literals to it, so it better be const. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: move explanation printing | Pekka Paalanen | 1 | -27/+33
Move explanation printing to a new function. This will help with implementing a machine-readable output option. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06 | lowlevel-blt-bench: move usage to a function | Pekka Paalanen | 1 | -3/+9
Move printing of usage into a new function and use argv[0] as the program name. This will help printing usage from multiple places. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-02 | vmx: fix pix_multiply for ppc64le | Oded Gabbay | 1 | -0/+21
vec_mergeh/l operates differently for BE and LE, because of the order of the vector elements (l->r in BE and r->l in LE). To fix that, we simply need to swap between the input parameters, in case we are working in LE. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - fixed whitespaces and indentation issues Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
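The effect of the argument order can be modelled in portable C without any AltiVec intrinsics. The sketch below is an assumption-laden model, not the pixman code: merged_lane is a hypothetical function that interleaves the first eight bytes of two 16-byte arrays in memory order (the way a byte-wise merge of the first halves behaves) and then reads one 16-bit lane using either big- or little-endian byte order.

```c
#include <stdint.h>

/* Interleave the first 8 bytes of `first` and `second` as
 * f0 s0 f1 s1 ... (memory order), then read 16-bit lane `i`
 * with the requested byte order. */
static uint16_t
merged_lane (const uint8_t first[16], const uint8_t second[16],
             int i, int big_endian)
{
    uint8_t out[16];
    int     k;

    for (k = 0; k < 8; k++)
    {
        out[2 * k]     = first[k];
        out[2 * k + 1] = second[k];
    }

    if (big_endian)
        return (uint16_t) ((out[2 * i] << 8) | out[2 * i + 1]);

    return (uint16_t) (out[2 * i] | (out[2 * i + 1] << 8));
}
```

With a zero vector first and the pixel bytes second, a big-endian read widens 0xcc to 0x00cc, while the same order read as little-endian gives 0xcc00. Swapping the two inputs on little-endian restores 0x00cc, which is the swap described in the commit message.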
2015-07-02 | vmx: fix unused var warnings | Oded Gabbay | 1 | -31/+58
v2: don't put ';' at the end of macro definition. Instead, move it to each line the macro is used. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02 | vmx: encapsulate the temporary variables inside the macros | Oded Gabbay | 1 | -33/+39
v2: fixed whitespaces and indentation issues Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02 | vmx: adjust macros when loading vectors on ppc64le | Fernando Seiti Furusato | 1 | -0/+25
Replaced usage of vec_lvsl with a direct unaligned assignment operation (=). That is because, according to the Power ABI Specification, the usage of lvsl is deprecated on ppc64le. Changed COMPUTE_SHIFT_{MASK,MASKS,MASKC} macro usage to no-op for powerpc little endian since unaligned access is supported on ppc64le. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - fixed whitespaces and indentation issues Signed-off-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02 | vmx: fix splat_alpha for ppc64le | Oded Gabbay | 1 | -0/+7
The permutation vector isn't correct for LE, so correct its values in case we are in LE mode. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - change #ifndef to #ifdef for readability Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-06-01 | mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH for NORMAL repeat | Ben Avison | 3 | -9/+2
These two architectures were the only place where SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the equivalent SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used immediately afterwards, so including the NORMAL case in the main macro simplifies the fast path table. [Pekka: removed extra comma from the end of SIMPLE_NEAREST_SOLID_MASK_FAST_PATH] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01 | mmx/sse2: Use SIMPLE_NEAREST_FAST_PATH macro | Ben Avison | 2 | -32/+8
There is some reordering, but the only significant thing to ensure that the same routine is chosen is that a COVER fast path for a given combination of operator and source/destination pixel formats must precede all the variants of repeated fast paths for the same combination. This patch (and the other mmx/sse2 one) still follows that rule. I believe that in every other case, the set of operations that match any pair of fast paths that are reordered in these patches are mutually exclusive. While there will be a very subtle timing difference due to the distance through the table we have to search to find a match (sometimes faster, sometimes slower) there is no evidence that the tables have been carefully ordered by frequency of occurrence - just for ease of copy-and-pasting. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01 | mips: Retire PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH | Ben Avison | 2 | -10/+4
This macro does exactly the same thing as the platform-neutral macro SIMPLE_NEAREST_A8_MASK_FAST_PATH. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01 | arm: Simplify PIXMAN_ARM_SIMPLE_NEAREST_A8_MASK_FAST_PATH | Ben Avison | 1 | -3/+1
This macro is a superset of the platform-neutral macro SIMPLE_NEAREST_A8_MASK_FAST_PATH. In other words, in addition to the _COVER, _NONE and _PAD suffixes, its expansion includes the _NORMAL suffix. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01 | arm: Retire PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH | Ben Avison | 3 | -28/+21
This macro does exactly the same thing as the platform-neutral macro SIMPLE_NEAREST_FAST_PATH. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01 | test: Fix solid-test for big-endian targets | Ben Avison | 1 | -3/+6
When generating test data, we need to make sure the interpretation of the data is the same regardless of endianness. That is, the pixel value for each channel is the same on both little- and big-endian targets. This fixes a test failure on ppc64 (big-endian). Tested-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com> (ppc64le, ppc64, powerpc) Tested-by: Ben Avison <bavison@riscosopen.org> (armv6l, armv7l, i686) [Pekka: added commit message] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> (x86_64)
2015-05-15 | test: Add new fuzz tester targeting solid images | Ben Avison | 2 | -0/+351
This places a heavier emphasis on solid images than the other fuzz testers, and tests both single-pixel repeating bitmap images as well as those created using pixman_image_create_solid_fill(). In the former case, it also exercises the case where the bitmap contents are written to after the image's first use, which is not a use-case that any other test has previously covered. [Pekka: added the default case to the switch in test_solid ().] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-05-07 | MIPS: Drop #ifdef __ELF__ in definition of LEAF_MIPS32R2 | James Cowgill | 1 | -2/+0
Commit 6d2cf40166d8 ("MIPS: Fix exported symbols in public API") attempted to add a .hidden assembly directive, conditional on the code being compiled for an ELF target. Unfortunately the #ifdef added was already inside a macro and wasn't expanded properly by the preprocessor. Fix by removing the check. It's unlikely there are many non-ELF MIPS systems around anyway. Fixes: Bug 83358 (https://bugs.freedesktop.org/83358) Fixes: 6d2cf40166d8 ("MIPS: Fix exported symbols in public API") Signed-off-by: James Cowgill <james410@cowgill.org.uk> Cc: Vicente Olivert Riera <Vincent.Riera@imgtec.com> Cc: Nemanja Lukic <nemanja.lukic@rt-rk.com> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com> Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-05-05 | test: Added more demos and tests to .gitignore file | Bill Spitzak | 1 | -40/+5
Uses a wildcard to handle the majority which end in "-test". Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-24 | test: Add a new benchmarker targeting affine operations | Ben Avison | 3 | -0/+438
Affine-bench is written by following the example of lowlevel-blt-bench.

Affine-bench differs from lowlevel-blt-bench in the following:
- does not test different sized operations fitting to specific caches, destination is always 1920x1080
- allows defining the affine transformation parameters
- carefully computes operation extents to hit the COVER_CLIP fast paths

Original version by Ben Avison. Changes by Pekka in v3:
- commit message
- style fixes
- more comments
- refactoring (e.g. bench_info_t)
- help output tweak

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-20 | lowlevel-blt-bench: use a8r8g8b8 for CA solid masks | Pekka Paalanen | 1 | -1/+5
When doing component alpha with a solid mask, use a mask format that has all the color channels instead of just a8. As Ben Avison explains it: "Lowlevel-blt-bench initialises all its images using memset(0xCC) so an a8 solid image would be converted by _pixman_image_get_solid() to 0xCC000000 whereas an a8r8g8b8 would be 0xCCCCCCCC. When you're not in component alpha mode, only the alpha byte matters for the mask image, but in the case of component alpha operations, a fast path might decide that it can save itself a lot of multiplications if it spots that 3 constant mask components are already 0." No (default) test so far has a solid mask with CA. This is just future-proofing lowlevel-blt-bench to do what one would expect. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
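The difference Ben Avison describes can be reproduced with a few lines of C. This is a simplified model and not _pixman_image_get_solid() itself: the function names are hypothetical, and the a8 conversion is assumed to place the alpha byte in the top 8 bits with zero colour channels, as the quoted values imply.

```c
#include <stdint.h>
#include <string.h>

/* Model: read one a8 pixel and widen it to a8r8g8b8 (alpha only). */
static uint32_t
solid_from_a8 (const void *bits)
{
    uint8_t a;

    memcpy (&a, bits, 1);
    return (uint32_t) a << 24;
}

/* Model: read one a8r8g8b8 pixel as-is. */
static uint32_t
solid_from_a8r8g8b8 (const void *bits)
{
    uint32_t p;

    memcpy (&p, bits, 4);
    return p;
}

/* The kind of check a component-alpha fast path might use to skip
 * per-channel multiplications (hypothetical). */
static int
mask_rgb_is_zero (uint32_t mask)
{
    return (mask & 0x00ffffffu) == 0;
}
```

With a buffer filled by memset(0xCC), the a8 reading yields 0xCC000000 and would trip the shortcut, while the a8r8g8b8 reading yields 0xCCCCCCCC and exercises the full component-alpha arithmetic.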
2015-04-15 | lowlevel-blt-bench: use the test pattern parser | Pekka Paalanen | 1 | -20/+44
Let lowlevel-blt-bench parse the test name string from the command line, allowing it to run almost infinitely more tests. One is no longer limited to the tests listed in the big table. While you can use the old short-hand names like src_8888_8888, you can also use all possible operators now, and specify pixel formats exactly rather than just x888, for instance. This even allows running crazy patterns like conjoint_over_reverse_a8b8g8r8_n_r8g8b8x8. All individual patterns are now interpreted through the parser. The pattern "all" runs the same old default test set as before but through the parser instead of the hard-coded parameters. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 | lowlevel-blt-bench: add test name parser and self-test | Pekka Paalanen | 1 | -3/+228
This patch is inspired by "lowlevel-blt-bench: Parse test name strings in general case" by Ben Avison. From Ben's commit message:

"There are many types of composite operation that are useful to benchmark but which are omitted from the table. Continually having to add extra entries to the table is a nuisance and is prone to human error, so this patch adds the ability to break down unknown strings of the format <operation>_<src>[_<mask>]_<dst>[_ca] where bitmap formats are specified by number of bits of each component (assumed in ARGB order) or 'n' to indicate a solid source or mask."

Add the parser to lowlevel-blt-bench.c, but do not hook it up to the command line just yet. Instead, make it run a self-test.

As we now dynamically parse strings similar to the test names in the huge table 'tests_tbl', we should make sure we can parse the old well-known test names and produce exactly the same test parameters. The self-test goes through this old table and verifies the parsing results.

Unfortunately the old table is not exactly consistent, it contains some special cases that cannot be produced by the parsing rules. Whether these special cases are intentional or just an oversight is not always clear. Anyway, add a small table to reproduce the special cases verbatim.

If we wanted, we could remove the big old table in a follow-up commit, but then we would also lose the parser self-test.

The point of this whole exercise is to let lowlevel-blt-bench recognize novel test patterns in the future, following exactly the conventions used in the old table.

Ben, from what I see, this parser has one major difference to what you wrote. For a solid mask, your parser uses a8r8g8b8 format, while mine uses a8 which comes from the old table.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
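A minimal sketch of splitting a name of the form <operation>_<src>[_<mask>]_<dst>[_ca] is shown below. It is not the parser added by this patch: pattern_t and parse_pattern are hypothetical names, the fields are kept as strings instead of being mapped to pixman operators and formats, and the operator is assumed to be a single token without underscores, so multi-word operators such as conjoint_over_reverse are rejected by this simplified version.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parse result; fields are left as strings. */
typedef struct
{
    int  valid;
    int  has_mask;
    int  component_alpha;
    char op[32], src[32], mask[32], dst[32];
} pattern_t;

static pattern_t
parse_pattern (const char *s)
{
    pattern_t out;
    char      buf[128];
    char     *tok[5];
    char     *p;
    int       n = 0;

    memset (&out, 0, sizeof out);
    if (strlen (s) >= sizeof buf)
        return out;
    strcpy (buf, s);

    /* Split on '_'; at most op, src, mask, dst, ca. */
    for (p = strtok (buf, "_"); p != NULL; p = strtok (NULL, "_"))
    {
        if (n == 5)
            return out;
        tok[n++] = p;
    }

    /* A trailing "ca" selects component alpha. */
    if (n > 0 && strcmp (tok[n - 1], "ca") == 0)
    {
        out.component_alpha = 1;
        n--;
    }
    if (n != 3 && n != 4)
    {
        out.component_alpha = 0;
        return out;
    }

    out.has_mask = (n == 4);
    snprintf (out.op, sizeof out.op, "%s", tok[0]);
    snprintf (out.src, sizeof out.src, "%s", tok[1]);
    if (out.has_mask)
        snprintf (out.mask, sizeof out.mask, "%s", tok[2]);
    snprintf (out.dst, sizeof out.dst, "%s", tok[n - 1]);
    out.valid = 1;
    return out;
}
```

For example, src_8_8888 splits into operator src, source 8 and destination 8888 with no mask, while over_n_8888_8888_ca has a solid source n, an 8888 mask and the component-alpha flag set.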
2015-04-15 | test/utils: add format aliases used by lowlevel-blt-bench | Pekka Paalanen | 1 | -0/+11
Lowlevel-blt-bench uses several pixel format shorthands. Pick them from the great table in lowlevel-blt-bench.c and add them here so that format_from_string() can recognize them. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 | test/utils: add operator aliases for lowlevel-blt-bench | Pekka Paalanen | 1 | -0/+4
Lowlevel-blt-bench uses the operator alias "outrev". Add an alias for it in the operator-name table. Also add aliases for overrev, inrev and atoprev, so that lowlevel-blt-bench can later recognize them for new test cases. The aliases are added such that an operator to name lookup will never return them; it returns the proper names instead. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
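The lookup behaviour described here (any name resolves to the operator, but the reverse lookup only ever returns the proper name) can be sketched with a small table. Everything below is illustrative: op_t, op_entry_t, op_from_string and op_name are hypothetical, the canonical spellings "over_reverse" and "out_reverse" are placeholders, and only the alias strings "overrev" and "outrev" come from the commit message.

```c
#include <stddef.h>
#include <string.h>

typedef enum { OP_OVER_REVERSE, OP_OUT_REVERSE } op_t; /* hypothetical subset */

typedef struct
{
    op_t        op;
    const char *name;
    int         is_alias;
} op_entry_t;

static const op_entry_t op_table[] =
{
    { OP_OVER_REVERSE, "over_reverse", 0 },
    { OP_OVER_REVERSE, "overrev",      1 },
    { OP_OUT_REVERSE,  "out_reverse",  0 },
    { OP_OUT_REVERSE,  "outrev",       1 },
};

#define N_OPS (sizeof op_table / sizeof op_table[0])

/* Name -> operator: proper names and aliases both match.
 * Returns -1 when the name is unknown. */
static int
op_from_string (const char *s)
{
    size_t i;

    for (i = 0; i < N_OPS; i++)
        if (strcmp (op_table[i].name, s) == 0)
            return (int) op_table[i].op;
    return -1;
}

/* Operator -> name: alias entries are skipped, so only the proper
 * name is ever returned. */
static const char *
op_name (op_t op)
{
    size_t i;

    for (i = 0; i < N_OPS; i++)
        if (op_table[i].op == op && !op_table[i].is_alias)
            return op_table[i].name;
    return NULL;
}
```

Marking aliases explicitly keeps a listing function able to print each operator once, while string lookups accept every spelling.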
2015-04-15 | test/utils: support format name aliases | Pekka Paalanen | 1 | -127/+126
Previously there was a flat list of formats, used to iterate over all formats when looking up a format from name or listing them. This cannot support name aliases. To support name aliases (multiple name strings mapping to the same format), create a format-name mapping table. Functions format_name(), format_from_string(), and list_formats() should keep on working exactly like before, except format_from_string() now recognizes the additional formats that format_name() already supported. Only the formats from the old format list are added with ENTRY, so that list_formats() works as before. The whole list is verified against the authoritative list in pixman.h, entries missing from the old list are commented out. The extra formats supported by the old format_name() are added as ALIASes. A side-effect of that is that now also format_from_string() recognizes the following new names: x4c4 / c8, x4g4 / g8, c4, g4, g1, yuy2, yv12, null, solid, pixbuf, rpixbuf, unknown. Name aliases will be useful in follow-up patches, where lowlevel-blt-bench.c is converted to parse short-hand format names from strings. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 | test/utils: support operator name aliases | Pekka Paalanen | 1 | -126/+104
Previously there was a flat list of operators (pixman_op_t), used to iterate over all operators when looking up an operator from name or listing them. This cannot support name aliases. To support name aliases (multiple name strings mapping to the same operator), create an operator-name mapping table. Functions operator_name, operator_from_string, and list_operators should keep on working exactly like before, except operator_from_string now recognizes a few aliases too. Name aliases will be useful in follow-up patches, where lowlevel-blt-bench.c is converted to parse operator names from strings. Lowlevel-blt-bench uses shorthand names instead of the usual names. This change allows lowlevel-blt-bench.c to use operator_from_string in the future. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-13 | test: Move format and operator string functions to utils.[ch] | Ben Avison | 3 | -192/+205
This permits format_from_string(), list_formats(), list_operators() and operator_from_string() to be used from tests other than check-formats. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-09 | pixman.c: Coding style | Ben Avison | 1 | -8/+10
A few violations of coding style were identified in code copied from here into affine-bench. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-01 | armv6: Fix typo in preload macro | Ben Avison | 1 | -2/+2
Missing "lsl" meant that in cases with a 32-bit source and/or mask, and an 8-bit destination, the code would not assemble.
2014-10-24 | mmx: Fix _mm_empty problems for over_8888_8888/over_8888_n_8888 | Siarhei Siamashka | 1 | -0/+6
Using "--disable-sse2 --disable-ssse3" configure options and CFLAGS="-m32 -O2 -g" on an x86 system results in pixman "make check" failures:

../test-driver: line 95: 29874 Aborted
FAIL: affine-test
../test-driver: line 95: 29887 Aborted
FAIL: scaling-test

One _mm_empty () was missing and another one is needed to work around an old GCC bug https://gcc.gnu.org/PR47759 (GCC may move MMX instructions around and cause test suite failures).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-10-05 | Fix comment about BILINEAR_INTERPOLATION_BITS to say < 8 rather than <= 8 | Søren Sandmann Pedersen | 2 | -1/+3
Since a4c79d695d52c94647b1aff7 the constant BILINEAR_INTERPOLATION_BITS must be strictly less than 8, so fix the comment to say this, and also add a COMPILE_TIME_ASSERT in the bilinear fetcher in pixman-fast-path.c
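A compile-time check of this kind can be sketched as follows. The macro body and names here are assumptions for illustration: MY_COMPILE_TIME_ASSERT is built on C11 _Static_assert rather than copied from pixman's own COMPILE_TIME_ASSERT, and INTERP_BITS with the value 7 is a hypothetical stand-in for BILINEAR_INTERPOLATION_BITS.

```c
/* Hypothetical stand-in for BILINEAR_INTERPOLATION_BITS. */
#define INTERP_BITS 7

/* Fail the build, not the run, when the condition is false.
 * Uses C11 _Static_assert; pixman's own macro may be defined
 * differently. */
#define MY_COMPILE_TIME_ASSERT(cond) _Static_assert (cond, #cond)

/* Number of interpolation steps per pixel for the configured bits. */
static int
interpolation_steps (void)
{
    MY_COMPILE_TIME_ASSERT (INTERP_BITS < 8);
    return 1 << INTERP_BITS;
}
```

Changing INTERP_BITS to 8 makes this translation unit stop compiling, so the constraint stated in the comment cannot silently drift from the code.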