~siamashka/pixman

Age	Commit message	Author	Files	Lines
2015-07-11	MIPS: update author's e-mail address	Nemanja Lukic	4	-4/+4
	Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2015-07-06	lowlevel-blt-bench: add option to skip memcpy measurement	Pekka Paalanen	1	-3/+8
	The memcpy speed measurement takes several seconds. When you are running single tests in a harness that iterates dozens or hundreds of times, the repeated measurements are redundant and take a lot of time. It is also an open question whether the measured speed changes over long test runs due to unidentified platform reasons (Raspberry Pi). Add a command line option to set the reference memcpy speed, skipping the measuring. The speed is mainly used to compute how many iterations to run inside the bench_*() functions, so for repeated testing on the same hardware, it makes sense to lock that number to a constant. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: add CSV output mode	Pekka Paalanen	1	-21/+47
	Add a command line option for choosing CSV output mode. In CSV mode, only the results in Mpixels/s are printed in an easily machine-parseable format. All user-friendly printing is suppressed. This is intended for cases where you benchmark one particular operation at a time. Running the "all" set of benchmarks will print just fine, but you may have trouble matching rows to operations as you have to look at the tests_tbl[] to see what row is which. Reviewed-by: Ben Avison <bavison@riscosopen.org> v2: don't add a space after comma in CSV. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-06	lowlevel-blt-bench: refactor to Mpx_per_sec()	Pekka Paalanen	1	-7/+16
	Refactor the Mpixels/s computations into a function. Easier to read and better documents what is being computed. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
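As a rough illustration of the kind of helper this refactoring introduces, the sketch below converts a pixel count and an elapsed time into Mpixels/s. The name mpixels_per_second and its parameters are hypothetical; the real Mpx_per_sec() in lowlevel-blt-bench.c may take different arguments.

```c
#include <assert.h>

/* Hypothetical helper, not the pixman implementation: convert a pixel
 * count and a duration in seconds into megapixels per second. */
static double
mpixels_per_second (double pix_cnt, double elapsed_sec)
{
    assert (elapsed_sec > 0.0);   /* a zero duration would divide by zero */
    return pix_cnt / elapsed_sec / 1e6;
}
```

Keeping the arithmetic in one function means every bench_*() result is scaled the same way.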
2015-07-06	lowlevel-blt-bench: all bench funcs to return pix_cnt	Pekka Paalanen	1	-14/+16
	The bench_* functions that did not already do so are modified to return the number of pixels processed during the benchmark. This moves the computation to the site that actually determines the number, and simplifies bench_composite() a bit. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: move speed and scaling printing	Pekka Paalanen	1	-15/+22
	Move the printing of the memory speed and scaling mode into a new function. This will help with implementing a machine-readable output option. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: print single pattern details	Pekka Paalanen	1	-3/+25
	When given just a single test pattern instead of "all", print the test details. This can be used to verify the pattern parser agrees with the user, just like scaling settings are printed. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: make test_entry::testname const	Pekka Paalanen	1	-7/+7
	We assign string literals to it, so it better be const. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: move explanation printing	Pekka Paalanen	1	-27/+33
	Move explanation printing to a new function. This will help with implementing a machine-readable output option. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-06	lowlevel-blt-bench: move usage to a function	Pekka Paalanen	1	-3/+9
	Move printing of usage into a new function and use argv[0] as the program name. This will help printing usage from multiple places. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-07-02	vmx: fix pix_multiply for ppc64le	Oded Gabbay	1	-0/+21
	vec_mergeh/l operates differently for BE and LE, because of the order of the vector elements (l->r in BE and r->l in LE). To fix that, we simply need to swap between the input parameters, in case we are working in LE. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - fixed whitespaces and indentation issues Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02	vmx: fix unused var warnings	Oded Gabbay	1	-31/+58
	v2: don't put ';' at the end of macro definition. Instead, move it to each line the macro is used. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
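The v2 note about semicolons can be illustrated with a small sketch. The macro below is hypothetical and is not one of the vmx macros: it declares temporaries without a trailing ';', and each use site supplies the ';' so that the use reads like an ordinary statement.

```c
#include <assert.h>

/* Hypothetical macro, for illustration only: it declares two temporaries
 * and deliberately does NOT end with ';'. */
#define DECLARE_TMP_PAIR(type, a, b) type a, b

static int
sum_of_squares (int x, int y)
{
    DECLARE_TMP_PAIR (int, xx, yy);   /* the ';' is supplied at the use site */

    xx = x * x;
    yy = y * y;
    assert (xx >= 0 && yy >= 0);      /* holds while the squares do not overflow */
    return xx + yy;
}
```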
2015-07-02	vmx: encapsulate the temporary variables inside the macros	Oded Gabbay	1	-33/+39
	v2: fixed whitespaces and indentation issues Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02	vmx: adjust macros when loading vectors on ppc64le	Fernando Seiti Furusato	1	-0/+25
	Replaced usage of vec_lvsl with a direct unaligned assignment operation (=). That is because, according to the Power ABI Specification, the usage of lvsl is deprecated on ppc64le. Changed COMPUTE_SHIFT_{MASK,MASKS,MASKC} macro usage to no-op for powerpc little endian since unaligned access is supported on ppc64le. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - fixed whitespaces and indentation issues Signed-off-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-07-02	vmx: fix splat_alpha for ppc64le	Oded Gabbay	1	-0/+7
	The permutation vector isn't correct for LE, so correct its values in case we are in LE mode. v2: - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency - change #ifndef to #ifdef for readability Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-06-01	mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH for NORMAL repeat	Ben Avison	3	-9/+2
	These two architectures were the only place where SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the equivalent SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used immediately afterwards, so including the NORMAL case in the main macro simplifies the fast path table. [Pekka: removed extra comma from the end of SIMPLE_NEAREST_SOLID_MASK_FAST_PATH] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01	mmx/sse2: Use SIMPLE_NEAREST_FAST_PATH macro	Ben Avison	2	-32/+8
	There is some reordering, but the only significant thing to ensure that the same routine is chosen is that a COVER fast path for a given combination of operator and source/destination pixel formats must precede all the variants of repeated fast paths for the same combination. This patch (and the other mmx/sse2 one) still follows that rule. I believe that in every other case, the set of operations that match any pair of fast paths that are reordered in these patches are mutually exclusive. While there will be a very subtle timing difference due to the distance through the table we have to search to find a match (sometimes faster, sometimes slower) there is no evidence that the tables have been carefully ordered by frequency of occurrence - just for ease of copy-and-pasting. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01	mips: Retire PIXMAN_MIPS_SIMPLE_NEAREST_A8_MASK_FAST_PATH	Ben Avison	2	-10/+4
	This macro does exactly the same thing as the platform-neutral macro SIMPLE_NEAREST_A8_MASK_FAST_PATH. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01	arm: Simplify PIXMAN_ARM_SIMPLE_NEAREST_A8_MASK_FAST_PATH	Ben Avison	1	-3/+1
	This macro is a superset of the platform-neutral macro SIMPLE_NEAREST_A8_MASK_FAST_PATH. In other words, in addition to the _COVER, _NONE and _PAD suffixes, its expansion includes the _NORMAL suffix. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01	arm: Retire PIXMAN_ARM_SIMPLE_NEAREST_FAST_PATH	Ben Avison	3	-28/+21
	This macro does exactly the same thing as the platform-neutral macro SIMPLE_NEAREST_FAST_PATH. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
2015-06-01	test: Fix solid-test for big-endian targets	Ben Avison	1	-3/+6
	When generating test data, we need to make sure the interpretation of the data is the same regardless of endianness. That is, the pixel value for each channel is the same on both little- and big-endian targets. This fixes a test failure on ppc64 (big-endian). Tested-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com> (ppc64le, ppc64, powerpc) Tested-by: Ben Avison <bavison@riscosopen.org> (armv6l, armv7l, i686) [Pekka: added commit message] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> (x86_64)
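One general way to keep pixel data endian-independent is to build each pixel as a value with shifts rather than by writing individual bytes into memory. The sketch below shows that idea only; it is not the actual change made to solid-test, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit channels into an a8r8g8b8 value using shifts.  Shifts
 * operate on the value, not on its byte layout in memory, so the result
 * is the same on little- and big-endian hosts. */
static uint32_t
pack_a8r8g8b8 (uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t) a << 24) | ((uint32_t) r << 16) |
           ((uint32_t) g << 8)  |  (uint32_t) b;
}

/* Extract one channel again, also by value; shift is 24, 16, 8 or 0. */
static uint8_t
channel (uint32_t pixel, unsigned shift)
{
    assert (shift <= 24 && shift % 8 == 0);
    return (uint8_t) (pixel >> shift);
}
```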
2015-05-15	test: Add new fuzz tester targeting solid images	Ben Avison	2	-0/+351
	This places a heavier emphasis on solid images than the other fuzz testers, and tests both single-pixel repeating bitmap images as well as those created using pixman_image_create_solid_fill(). In the former case, it also exercises the case where the bitmap contents are written to after the image's first use, which is not a use-case that any other test has previously covered. [Pekka: added the default case to the switch in test_solid ().] Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-05-07	MIPS: Drop #ifdef __ELF__ in definition of LEAF_MIPS32R2	James Cowgill	1	-2/+0
	Commit 6d2cf40166d8 ("MIPS: Fix exported symbols in public API") attempted to add a .hidden assembly directive, conditional on the code being compiled for an ELF target. Unfortunately the #ifdef added was already inside a macro and wasn't expanded properly by the preprocessor. Fix by removing the check. It's unlikely there are many non-ELF MIPS systems around anyway. Fixes: Bug 83358 (https://bugs.freedesktop.org/83358) Fixes: 6d2cf40166d8 ("MIPS: Fix exported symbols in public API") Signed-off-by: James Cowgill <james410@cowgill.org.uk> Cc: Vicente Olivert Riera <Vincent.Riera@imgtec.com> Cc: Nemanja Lukic <nemanja.lukic@rt-rk.com> Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com> Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
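The commit fixes the problem by removing the check. For comparison, a common alternative when a conditional is needed is to place the #ifdef around helper definitions and let the macro refer to the helper, since a directive cannot appear inside a macro body. The sketch below uses plain C strings and hypothetical names; it is not the MIPS assembly macro.

```c
#include <assert.h>
#include <string.h>

/* The conditional lives outside the macro and selects a helper. */
#ifdef __ELF__
#define SYMBOL_VISIBILITY "hidden"
#else
#define SYMBOL_VISIBILITY "default"
#endif

/* The macro itself contains no directive, only the helper. */
#define DESCRIBE_SYMBOL(name) name ": " SYMBOL_VISIBILITY

static const char *
describe_leaf (void)
{
    const char *s = DESCRIBE_SYMBOL ("leaf");

    assert (strncmp (s, "leaf: ", 6) == 0);
    return s;
}
```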
2015-05-05	test: Added more demos and tests to .gitignore file	Bill Spitzak	1	-40/+5
	Uses a wildcard to handle the majority which end in "-test". Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-24	test: Add a new benchmarker targeting affine operations	Ben Avison	3	-0/+438
	Affine-bench is written by following the example of lowlevel-blt-bench. Affine-bench differs from lowlevel-blt-bench in the following:
	- does not test different sized operations fitting to specific caches, destination is always 1920x1080
	- allows defining the affine transformation parameters
	- carefully computes operation extents to hit the COVER_CLIP fast paths
	Original version by Ben Avison. Changes by Pekka in v3:
	- commit message
	- style fixes
	- more comments
	- refactoring (e.g. bench_info_t)
	- help output tweak
	Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
	Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-20	lowlevel-blt-bench: use a8r8g8b8 for CA solid masks	Pekka Paalanen	1	-1/+5
	When doing component alpha with a solid mask, use a mask format that has all the color channels instead of just a8. As Ben Avison explains it: "Lowlevel-blt-bench initialises all its images using memset(0xCC) so an a8 solid image would be converted by _pixman_image_get_solid() to 0xCC000000 whereas an a8r8g8b8 would be 0xCCCCCCCC. When you're not in component alpha mode, only the alpha byte matters for the mask image, but in the case of component alpha operations, a fast path might decide that it can save itself a lot of multiplications if it spots that 3 constant mask components are already 0." No (default) test so far has a solid mask with CA. This is just future-proofing lowlevel-blt-bench to do what one would expect. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
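The two values Ben quotes can be reproduced with a short sketch. The functions below are simplified stand-ins written for this illustration and are not _pixman_image_get_solid(); they only show how one pixel of a buffer filled with memset(0xCC) looks in each format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An a8 pixel carries alpha only, so widening it to a8r8g8b8 leaves the
 * three colour channels at zero. */
static uint32_t
solid_from_a8 (void)
{
    uint8_t px;

    memset (&px, 0xCC, sizeof px);
    return (uint32_t) px << 24;
}

/* An a8r8g8b8 pixel has every byte, and hence every channel, set to 0xCC. */
static uint32_t
solid_from_a8r8g8b8 (void)
{
    uint32_t px;

    memset (&px, 0xCC, sizeof px);
    assert (px == 0xCCCCCCCCu);
    return px;
}
```

With component alpha, the zero colour channels in the first case are what a fast path could exploit, which is why the benchmark switches to the second format.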
2015-04-15	lowlevel-blt-bench: use the test pattern parser	Pekka Paalanen	1	-20/+44
	Let lowlevel-blt-bench parse the test name string from the command line, making it possible to run almost infinitely more tests. One is no longer limited to the tests listed in the big table. While you can use the old short-hand names like src_8888_8888, you can also use all possible operators now, and specify pixel formats exactly rather than just x888, for instance. This even allows running crazy patterns like conjoint_over_reverse_a8b8g8r8_n_r8g8b8x8. All individual patterns are now interpreted through the parser. The pattern "all" runs the same old default test set as before but through the parser instead of the hard-coded parameters. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15	lowlevel-blt-bench: add test name parser and self-test	Pekka Paalanen	1	-3/+228
	This patch is inspired by "lowlevel-blt-bench: Parse test name strings in general case" by Ben Avison. From Ben's commit message: "There are many types of composite operation that are useful to benchmark but which are omitted from the table. Continually having to add extra entries to the table is a nuisance and is prone to human error, so this patch adds the ability to break down unknown strings of the format <operation>_<src>[_<mask>]_<dst>[_ca] where bitmap formats are specified by number of bits of each component (assumed in ARGB order) or 'n' to indicate a solid source or mask." Add the parser to lowlevel-blt-bench.c, but do not hook it up to the command line just yet. Instead, make it run a self-test. As we now dynamically parse strings similar to the test names in the huge table 'tests_tbl', we should make sure we can parse the old well-known test names and produce exactly the same test parameters. The self-test goes through this old table and verifies the parsing results. Unfortunately the old table is not exactly consistent, it contains some special cases that cannot be produced by the parsing rules. Whether these special cases are intentional or just an oversight is not always clear. Anyway, add a small table to reproduce the special cases verbatim. If we wanted, we could remove the big old table in a follow-up commit, but then we would also lose the parser self-test. The point of this whole exercise is to let lowlevel-blt-bench recognize novel test patterns in the future, following exactly the conventions used in the old table. Ben, from what I see, this parser has one major difference to what you wrote. For a solid mask, your parser uses a8r8g8b8 format, while mine uses a8 which comes from the old table. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
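The name format <operation>_<src>[_<mask>]_<dst>[_ca] can be split roughly as sketched below. This is not the parser added to lowlevel-blt-bench.c: the operator list is a tiny hypothetical subset, formats are kept as plain strings rather than looked up, and parsed_test_t and parse_test_name are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    char op[32], src[16], mask[16], dst[16];
    int  has_mask, component_alpha;
} parsed_test_t;

/* Hypothetical operator subset; longer names come first so that
 * "over_reverse" is not mistaken for "over". */
static const char *const known_ops[] = {
    "conjoint_over_reverse", "over_reverse", "in_reverse",
    "over", "src", "add", "in"
};

/* Returns 1 and fills *out on success, 0 if the name does not fit. */
static int
parse_test_name (const char *name, parsed_test_t *out)
{
    char   buf[64];
    char  *tok[4];
    char  *p;
    size_t i, op_len = 0;
    int    ntok = 0;

    assert (name != NULL && out != NULL);
    memset (out, 0, sizeof *out);

    /* 1. The operator is the first known name followed by '_'. */
    for (i = 0; i < sizeof known_ops / sizeof known_ops[0]; i++)
    {
        size_t n = strlen (known_ops[i]);
        if (strncmp (name, known_ops[i], n) == 0 && name[n] == '_')
        {
            op_len = n;
            break;
        }
    }
    if (op_len == 0 || strlen (name + op_len + 1) >= sizeof buf)
        return 0;
    memcpy (out->op, name, op_len);
    strcpy (buf, name + op_len + 1);

    /* 2. Split the remainder on '_' into at most four tokens. */
    for (p = strtok (buf, "_"); p != NULL; p = strtok (NULL, "_"))
    {
        if (ntok == 4 || strlen (p) >= sizeof out->src)
            return 0;
        tok[ntok++] = p;
    }

    /* 3. A trailing "ca" selects component alpha. */
    if (ntok > 0 && strcmp (tok[ntok - 1], "ca") == 0)
    {
        out->component_alpha = 1;
        ntok--;
    }

    /* 4. What is left is src, an optional mask, and dst. */
    if (ntok != 2 && ntok != 3)
        return 0;
    strcpy (out->src, tok[0]);
    if (ntok == 3)
    {
        out->has_mask = 1;
        strcpy (out->mask, tok[1]);
    }
    strcpy (out->dst, tok[ntok - 1]);
    return 1;
}
```

A self-test in the spirit of the commit would run every known name through such a function and compare the result with the expected parameters.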
2015-04-15	test/utils: add format aliases used by lowlevel-blt-bench	Pekka Paalanen	1	-0/+11
	Lowlevel-blt-bench uses several pixel format shorthands. Pick them from the great table in lowlevel-blt-bench.c and add them here so that format_from_string() can recognize them. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15	test/utils: add operator aliases for lowlevel-blt-bench	Pekka Paalanen	1	-0/+4
	Lowlevel-blt-bench uses the operator alias "outrev". Add an alias for it in the operator-name table. Also add aliases for overrev, inrev and atoprev, so that lowlevel-blt-bench can later recognize them for new test cases. The aliases are added such that an operator-to-name lookup never returns them; it returns the proper names instead. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
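A mapping table with aliases that are accepted on input but never returned on output might look like the sketch below. The enum, the table layout, the is_alias field and the lower-case proper names are assumptions made for this example; the real table in test/utils.c may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-operator subset; names are illustrative only. */
typedef enum { OP_OUT_REVERSE, OP_OVER_REVERSE } op_t;

typedef struct
{
    op_t        op;
    int         is_alias;   /* accepted when parsing, never returned as a name */
    const char *name;
} op_entry_t;

static const op_entry_t op_table[] = {
    { OP_OUT_REVERSE,  0, "out_reverse"  },
    { OP_OUT_REVERSE,  1, "outrev"       },
    { OP_OVER_REVERSE, 0, "over_reverse" },
    { OP_OVER_REVERSE, 1, "overrev"      }
};

#define N_OP_ENTRIES (sizeof op_table / sizeof op_table[0])

/* Name -> operator: every entry matches, aliases included. */
static int
op_from_string (const char *s, op_t *out)
{
    size_t i;

    assert (s != NULL && out != NULL);
    for (i = 0; i < N_OP_ENTRIES; i++)
    {
        if (strcmp (s, op_table[i].name) == 0)
        {
            *out = op_table[i].op;
            return 1;
        }
    }
    return 0;
}

/* Operator -> name: alias entries are skipped, so only the proper name
 * can be returned. */
static const char *
op_name (op_t op)
{
    size_t i;

    for (i = 0; i < N_OP_ENTRIES; i++)
    {
        if (op_table[i].op == op && !op_table[i].is_alias)
            return op_table[i].name;
    }
    return NULL;
}
```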
2015-04-15	test/utils: support format name aliases	Pekka Paalanen	1	-127/+126
	Previously there was a flat list of formats, used to iterate over all formats when looking up a format from name or listing them. This cannot support name aliases. To support name aliases (multiple name strings mapping to the same format), create a format-name mapping table. Functions format_name(), format_from_string(), and list_formats() should keep on working exactly like before, except format_from_string() now recognizes the additional formats that format_name() already supported. Only the formats from the old format list are added with ENTRY, so that list_formats() works as before. The whole list is verified against the authoritative list in pixman.h; entries missing from the old list are commented out. The extra formats supported by the old format_name() are added as ALIASes. A side-effect of that is that now also format_from_string() recognizes the following new names: x4c4 / c8, x4g4 / g8, c4, g4, g1, yuy2, yv12, null, solid, pixbuf, rpixbuf, unknown. Name aliases will be useful in follow-up patches, where lowlevel-blt-bench.c is converted to parse short-hand format names from strings. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15	test/utils: support operator name aliases	Pekka Paalanen	1	-126/+104
	Previously there was a flat list of operators (pixman_op_t), used to iterate over all operators when looking up an operator from name or listing them. This cannot support name aliases. To support name aliases (multiple name strings mapping to the same operator), create an operator-name mapping table. Functions operator_name, operator_from_string, and list_operators should keep on working exactly like before, except operator_from_string now recognizes a few aliases too. Name aliases will be useful in follow-up patches, where lowlevel-blt-bench.c is converted to parse operator names from strings. Lowlevel-blt-bench uses shorthand names instead of the usual names. This change allows lowlevel-blt-bench.c to use operator_from_string in the future. Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-13	test: Move format and operator string functions to utils.[ch]	Ben Avison	3	-192/+205
	This permits format_from_string(), list_formats(), list_operators() and operator_from_string() to be used from tests other than check-formats. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-09	pixman.c: Coding style	Ben Avison	1	-8/+10
	A few violations of coding style were identified in code copied from here into affine-bench. Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-01	armv6: Fix typo in preload macro	Ben Avison	1	-2/+2
	A missing "lsl" meant that, in cases with a 32-bit source and/or mask and an 8-bit destination, the code would not assemble.
2014-10-24	mmx: Fix _mm_empty problems for over_8888_8888/over_8888_n_8888	Siarhei Siamashka	1	-0/+6
	Using "--disable-sse2 --disable-ssse3" configure options and CFLAGS="-m32 -O2 -g" on an x86 system results in pixman "make check" failures: ../test-driver: line 95: 29874 Aborted FAIL: affine-test ../test-driver: line 95: 29887 Aborted FAIL: scaling-test One _mm_empty () was missing and another one is needed to workaround an old GCC bug https://gcc.gnu.org/PR47759 (GCC may move MMX instructions around and cause test suite failures). Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-10-05	Fix comment about BILINEAR_INTERPOLATION_BITS to say < 8 rather than <= 8	Søren Sandmann Pedersen	2	-1/+3
	Since a4c79d695d52c94647b1aff7 the constant BILINEAR_INTERPOLATION_BITS must be strictly less than 8, so fix the comment to say this, and also add a COMPILE_TIME_ASSERT in the bilinear fetcher in pixman-fast-path.c
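A compile-time check of this kind can be sketched as follows. The macro MY_COMPILE_TIME_ASSERT is a hypothetical pre-C11 idiom and may differ from pixman's own COMPILE_TIME_ASSERT; the value 7 is likewise an assumption for the example, since the commit only requires the constant to be strictly below 8.

```c
#include <assert.h>

/* Assumed value for this sketch. */
#define BILINEAR_INTERPOLATION_BITS 7

/* If the condition is false the array size becomes -1, which is a
 * compile error, so the build stops before any code can run. */
#define MY_COMPILE_TIME_ASSERT(cond) \
    typedef int my_compile_time_assertion[(cond) ? 1 : -1]

MY_COMPILE_TIME_ASSERT (BILINEAR_INTERPOLATION_BITS < 8);

/* With fewer than 8 bits, 1 << bits always stays below 256. */
static int
bilinear_weight_range (void)
{
    int range = 1 << BILINEAR_INTERPOLATION_BITS;

    assert (range < 256);
    return range;
}
```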
2014-09-05	mmx: Add nearest over_8888_8888	Matt Turner	1	-0/+57
	lowlevel-blt-bench -n, over_8888_8888, 15 iterations on Loongson 2f:
	       Before          After
	     Mean StdDev    Mean StdDev   Change
	L1   15.8   0.02    24.0   0.06   +52.0%
	L2   14.8   0.15    23.3   0.13   +56.9%
	M    10.3   0.01    13.8   0.03   +33.6%
	HT   10.0   0.02    14.5   0.05   +44.7%
	VT    9.7   0.02    13.5   0.04   +39.2%
	R     9.1   0.01    12.2   0.04   +34.4%
	RT    7.1   0.06     8.9   0.09   +25.2%
2014-09-05	mmx: Add nearest over_8888_n_8888	Matt Turner	1	-0/+62
	lowlevel-blt-bench -n, over_8888_n_8888, 15 iterations on Loongson 2f:
	       Before          After
	     Mean StdDev    Mean StdDev   Change
	L1    9.7   0.01    19.2   0.02   +98.2%
	L2    9.6   0.11    19.2   0.16   +99.5%
	M     7.3   0.02    12.5   0.01   +72.0%
	HT    6.6   0.01    13.4   0.02  +103.2%
	VT    6.4   0.01    12.6   0.03   +96.1%
	R     6.3   0.01    11.2   0.01   +76.5%
	RT    4.4   0.01     8.1   0.03   +82.6%
2014-07-03	MIPS: Fix exported symbols in public API.	Nemanja Lukic	1	-0/+3

2014-06-28	test: Rearrange tests in order of increasing runtime	Søren Sandmann Pedersen	1	-30/+30
	Making short tests run first is convenient to catch obvious bugs early.
2014-05-15	pixman-gradient-walker: Make left_x and right_x 64 bit variables	Søren Sandmann Pedersen	2	-3/+3
	The variables left_x, and right_x in gradient_walker_reset() are computed from pos, which is a 64 bit quantity, so to avoid overflows, these variables must be 64 bit as well. Similarly, the left_x and right_x that are stored in pixman_gradient_walker_t need to be 64 bit as well; otherwise, pixman_gradient_walker_pixel() will call reset too often. This fixes the radial-invalid test, which was generating 'invalid' floating point exceptions when the overflows caused color values to be outside of [0, 255].
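The overflow risk can be shown with a simplified sketch. This is not the gradient walker code: fixed_48_16, whole_part and fits_in_int32 are hypothetical names, and the sketch only demonstrates that a value derived from a 64-bit position may not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* 48.16 fixed point held in 64 bits; the typedef name is hypothetical. */
typedef int64_t fixed_48_16;

/* Drop the 16 fractional bits.  The result is kept in 64 bits because
 * it is derived from a 64-bit position.  Restricted to non-negative
 * input to keep the sketch simple. */
static int64_t
whole_part (fixed_48_16 pos)
{
    assert (pos >= 0);
    return pos & ~(int64_t) 0xFFFF;
}

/* Would this value survive being stored in a 32-bit variable? */
static int
fits_in_int32 (int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Storing such a result in a 32-bit variable would silently change its value whenever fits_in_int32() is false.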
2014-05-15	test: Add radial-invalid test program	Søren Sandmann Pedersen	4	-0/+66
	This program demonstrates a bug in gradient walker, where some integer overflows cause colors outside the range [0, 255] to be generated, which in turn cause 'invalid' floating point exceptions when those colors are converted to uint8_t. The bug was first reported by Owen Taylor on the #cairo IRC channel.
2014-05-01	ARMv6: Add fast path for src_x888_0565	Ben Avison	2	-0/+84
	Benchmark results, "before" is upstream/master 5f661ee719be25c3aa0eb0d45e0db23a37e76468, and "after" contains this patch on top.
	lowlevel-blt-bench, src_8888_0565, 100 iterations:
	       Before          After
	     Mean StdDev    Mean StdDev   Confidence   Change
	L1   25.9   0.20   115.6   0.70      100.00%  +347.1%
	L2   14.4   0.23    52.7   3.48      100.00%  +265.0%
	M    14.1   0.01    79.8   0.17      100.00%  +465.9%
	HT   10.2   0.03    32.9   0.31      100.00%  +221.2%
	VT    9.8   0.03    29.8   0.25      100.00%  +203.4%
	R     9.4   0.03    27.8   0.18      100.00%  +194.7%
	RT    4.6   0.04    10.9   0.29      100.00%  +135.9%
	At most 19 outliers rejected per test per set.
	cairo-perf-trace with trimmed traces results were indifferent.
	A system-wide perf_3.10 profile on Raspbian shows significant differences in the X server CPU usage. The following were measured from a 130x62 char lxterminal running 'dmesg' every 0.5 seconds for roughly 30 seconds. These profiles are libpixman.so symbols only.
	Before:
	Samples: 63K of event 'cpu-clock', Event count (approx.): 2941348112, DSO: libpixman-1.so.0.33.1
	 37.77%  Xorg  [.] fast_fetch_r5g6b5
	 14.39%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
	  8.51%  Xorg  [.] fast_write_back_r5g6b5
	  7.38%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
	  4.39%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
	  3.69%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
	  2.53%  Xorg  [.] _pixman_image_validate
	  2.35%  Xorg  [.] pixman_image_composite32
	After:
	Samples: 31K of event 'cpu-clock', Event count (approx.): 3619782704, DSO: libpixman-1.so.0.33.1
	 22.36%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
	 13.59%  Xorg  [.] pixman_composite_src_x888_0565_asm_armv6
	 12.75%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
	  6.79%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
	  5.95%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
	  4.12%  Xorg  [.] pixman_image_composite32
	  3.69%  Xorg  [.] _pixman_image_validate
	  3.65%  Xorg  [.] _pixman_bits_image_setup_accessors
	Before, fast_fetch_r5g6b5 + fast_write_back_r5g6b5 took 46% of the samples in libpixman, and probably incurred some memcpy() load, too. After, pixman_composite_src_x888_0565_asm_armv6 takes 14%. Note, that the sample counts are very different before/after, as less time is spent in Pixman and running time is not exactly the same. Furthermore, in the above test, the CPU idle function was sampled 9% before, and 15% after.
	v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> : Re-benchmarked on Raspberry Pi, commit message.
2014-04-21	ARM: use pixman_asm_function in internal headers	Pekka Paalanen	2	-24/+5
	The two ARM headers contained open-coded copies of pixman_asm_function, replace these. Since it seems customary that ARM headers do not use CPP include guards, rely on the .S files to #include "pixman-arm-asm.h" first. They all do now. v2: Fix a build failure on rpi by adding one #include.
2014-04-21	ARMv6: Add fast path for in_reverse_8888_8888	Ben Avison	2	-0/+110
	Benchmark results, "before" is the patch
	* upstream/master 4b76bbfda670f9ede67d0449f3640605e1fc4df0
	+ ARMv6: Support for very variable-hungry composite operations
	+ ARMv6: Add fast path for over_n_8888_8888_ca
	and "after" contains the additional patches on top:
	+ ARMv6: Add fast path flag to force no preload of destination buffer
	+ ARMv6: Add fast path for in_reverse_8888_8888 (this patch)
	lowlevel-blt-bench, in_reverse_8888_8888, 100 iterations:
	       Before          After
	     Mean StdDev    Mean StdDev   Confidence   Change
	L1   21.1   0.07    32.3   0.08      100.00%   +52.9%
	L2   11.6   0.29    18.0   0.52      100.00%   +54.4%
	M    10.5   0.01    16.1   0.03      100.00%   +54.1%
	HT    8.2   0.02    12.0   0.04      100.00%   +45.9%
	VT    8.1   0.02    11.7   0.04      100.00%   +44.5%
	R     8.1   0.02    11.3   0.04      100.00%   +39.7%
	RT    4.8   0.04     6.1   0.09      100.00%   +27.3%
	At most 12 outliers rejected per test per set.
	cairo-perf-trace with trimmed traces, 30 iterations:
	                                    Before          After
	                                  Mean StdDev    Mean StdDev   Confidence   Change
	t-firefox-paintball.trace         18.0   0.01    14.1   0.01      100.00%   +27.4%
	t-firefox-chalkboard.trace        36.7   0.03    36.0   0.02      100.00%    +1.9%
	t-firefox-canvas-alpha.trace      20.7   0.22    20.3   0.22      100.00%    +1.9%
	t-swfdec-youtube.trace             7.8   0.03     7.8   0.03      100.00%    +0.9%
	t-firefox-talos-gfx.trace         25.8   0.44    25.6   0.29       93.87%    +0.7%  (insignificant)
	t-firefox-talos-svg.trace         20.6   0.04    20.6   0.03      100.00%    +0.2%
	t-firefox-fishbowl.trace          21.2   0.04    21.1   0.02      100.00%    +0.2%
	t-xfce4-terminal-a1.trace          4.8   0.01     4.8   0.01       98.85%    +0.2%  (insignificant)
	t-swfdec-giant-steps.trace        14.9   0.03    14.9   0.02       99.99%    +0.2%
	t-poppler-reseau.trace            22.4   0.11    22.4   0.08       86.52%    +0.2%  (insignificant)
	t-gnome-system-monitor.trace      17.3   0.03    17.2   0.03       99.74%    +0.2%
	t-firefox-scrolling.trace         24.8   0.12    24.8   0.11       70.15%    +0.1%  (insignificant)
	t-firefox-particles.trace         27.5   0.18    27.5   0.21       48.33%    +0.1%  (insignificant)
	t-grads-heat-map.trace             4.4   0.04     4.4   0.04       16.61%    +0.0%  (insignificant)
	t-firefox-fishtank.trace          13.2   0.01    13.2   0.01        7.64%    +0.0%  (insignificant)
	t-firefox-canvas.trace            18.0   0.05    18.0   0.05        1.31%    -0.0%  (insignificant)
	t-midori-zoomed.trace              8.0   0.01     8.0   0.01       78.22%    -0.0%  (insignificant)
	t-firefox-planet-gnome.trace      10.9   0.02    10.9   0.02       64.81%    -0.0%  (insignificant)
	t-gvim.trace                      33.2   0.21    33.2   0.18       38.61%    -0.1%  (insignificant)
	t-firefox-canvas-swscroll.trace   32.2   0.09    32.2   0.11       73.17%    -0.1%  (insignificant)
	t-firefox-asteroids.trace         11.1   0.01    11.1   0.01      100.00%    -0.2%
	t-evolution.trace                 13.0   0.05    13.0   0.05       91.99%    -0.2%  (insignificant)
	t-gnome-terminal-vim.trace        19.9   0.14    20.0   0.14       97.38%    -0.4%  (insignificant)
	t-poppler.trace                    9.8   0.06     9.8   0.04       99.91%    -0.5%
	t-chromium-tabs.trace              4.9   0.02     4.9   0.02      100.00%    -0.6%
	At most 6 outliers rejected per test per set.
	Cairo perf reports the running time, but the change is computed for operations per second instead (inverse of running time). Confidence is based on Welch's t-test. Absolute changes less than 1% can be accounted as measurement errors, even if statistically significant.
	There was a question of why FLAG_NO_PRELOAD_DST is used. It makes lowlevel-blt-bench results worse except for L1, but improves some Cairo trace benchmarks.
	"Ben Avison" <bavison@riscosopen.org> wrote:
	> The thing with the lowlevel-blt-bench benchmarks for the more
	> sophisticated composite types (as a general rule, anything that involves
	> branches at the per-pixel level) is that they are only profiling the case
	> where you have mid-level alpha values in the source/mask/destination.
	> Real-world images typically have a disproportionate number of fully
	> opaque and fully transparent pixels, which is why when there's a
	> discrepancy between which implementation performs best with cairo-perf
	> trace versus lowlevel-blt-bench, I usually favour the Cairo winner.
	>
	> The results of removing FLAG_NO_PRELOAD_DST (in other words, adding
	> preload of the destination buffer) are easy to explain in the
	> lowlevel-blt-bench results. In the L1 case, the destination buffer is
	> already in the L1 cache, so adding the preloads is simply adding extra
	> instruction cycles that have no effect on memory operations. The "in"
	> compositing operator depends upon the alpha of both source and
	> destination, so if you use uniform mid-alpha, then you actually do need
	> to read your destination pixels, so you benefit from preloading them. But
	> for fully opaque or fully transparent source pixels, you don't need to
	> read the corresponding destination pixel - it'll either be left alone or
	> overwritten. Since the ARM11 doesn't use write-allocate cacheing, both of
	> these cases avoid both the time taken to load the extra cachelines, as
	> well as increasing the efficiency of the cache for other data. If you
	> examine the source images being used by the Cairo test, you'll probably
	> find they mostly use transparent or opaque pixels.
	v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> : Rebased, re-benchmarked on Raspberry Pi, commit message.
	v5, Pekka Paalanen <pekka.paalanen@collabora.co.uk> : Rebased, re-benchmarked on Raspberry Pi due to a fix to "ARMv6: Add fast path for over_n_8888_8888_ca" patch.
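The two kinds of "Change" figure in this commit message can be reproduced roughly as sketched below. The sketch assumes, as the message states, that lowlevel-blt-bench reports throughput (higher is better) while cairo-perf-trace reports running time (lower is better) and has its change computed on the inverse. The function names are hypothetical, and recomputing from the rounded means shown gives only approximately the listed percentages.

```c
#include <assert.h>

/* Change in throughput, for figures where higher is better. */
static double
throughput_change_percent (double before, double after)
{
    assert (before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* Change in operations per second, given running times.  Operations
 * per second are the inverse of running time, so the ratio flips and
 * a shorter time afterwards yields a positive change. */
static double
ops_per_sec_change_percent (double time_before, double time_after)
{
    assert (time_before > 0.0 && time_after > 0.0);
    return (time_before / time_after - 1.0) * 100.0;
}
```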
2014-04-21	ARMv6: Add fast path flag to force no preload of destination buffer	Ben Avison	1	-1/+13

2014-04-21	ARMv6: Add fast path for over_n_8888_8888_ca	Ben Avison	3	-2/+283
	Benchmark results, "before" is
	* upstream/master 4b76bbfda670f9ede67d0449f3640605e1fc4df0
	"after" contains the additional patches on top:
	+ ARMv6: Support for very variable-hungry composite operations
	+ ARMv6: Add fast path for over_n_8888_8888_ca (this patch)
	lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:
	       Before          After
	     Mean StdDev    Mean StdDev   Confidence   Change
	L1    2.7   0.00    16.1   0.06      100.00%  +500.7%
	L2    2.4   0.01    14.1   0.15      100.00%  +489.9%
	M     2.3   0.00    14.3   0.01      100.00%  +510.2%
	HT    2.2   0.00     9.7   0.03      100.00%  +345.0%
	VT    2.2   0.00     9.4   0.02      100.00%  +333.4%
	R     2.2   0.01     9.5   0.03      100.00%  +331.6%
	RT    1.9   0.01     5.5   0.07      100.00%  +192.7%
	At most 1 outliers rejected per test per set.
	cairo-perf-trace with trimmed traces, 30 iterations:
	                                    Before          After
	                                  Mean StdDev    Mean StdDev   Confidence   Change
	t-firefox-talos-gfx.trace         33.1   0.42    25.8   0.44      100.00%   +28.6%
	t-firefox-scrolling.trace         31.4   0.11    24.8   0.12      100.00%   +26.3%
	t-gnome-terminal-vim.trace        22.4   0.10    19.9   0.14      100.00%   +12.5%
	t-evolution.trace                 13.9   0.07    13.0   0.05      100.00%    +6.5%
	t-firefox-planet-gnome.trace      11.6   0.02    10.9   0.02      100.00%    +6.5%
	t-gvim.trace                      34.0   0.21    33.2   0.21      100.00%    +2.4%
	t-chromium-tabs.trace              4.9   0.02     4.9   0.02      100.00%    +1.0%
	t-poppler.trace                    9.8   0.05     9.8   0.06      100.00%    +0.7%
	t-firefox-canvas-swscroll.trace   32.3   0.10    32.2   0.09      100.00%    +0.4%
	t-firefox-paintball.trace         18.1   0.01    18.0   0.01      100.00%    +0.3%
	t-poppler-reseau.trace            22.5   0.09    22.4   0.11       99.29%    +0.3%
	t-firefox-canvas.trace            18.1   0.06    18.0   0.05       99.29%    +0.2%
	t-xfce4-terminal-a1.trace          4.8   0.01     4.8   0.01       99.77%    +0.2%
	t-firefox-fishbowl.trace          21.2   0.03    21.2   0.04      100.00%    +0.2%
	t-gnome-system-monitor.trace      17.3   0.03    17.3   0.03       99.54%    +0.1%
	t-firefox-asteroids.trace         11.1   0.01    11.1   0.01      100.00%    +0.1%
	t-midori-zoomed.trace              8.0   0.01     8.0   0.01       99.98%    +0.1%
	t-grads-heat-map.trace             4.4   0.04     4.4   0.04       34.08%    +0.1%  (insignificant)
	t-firefox-talos-svg.trace         20.6   0.03    20.6   0.04       54.06%    +0.0%  (insignificant)
	t-firefox-fishtank.trace          13.2   0.01    13.2   0.01       52.81%    -0.0%  (insignificant)
	t-swfdec-giant-steps.trace        14.9   0.02    14.9   0.03       85.50%    -0.1%  (insignificant)
	t-firefox-chalkboard.trace        36.6   0.02    36.7   0.03      100.00%    -0.2%
	t-firefox-canvas-alpha.trace      20.7   0.32    20.7   0.22       55.76%    -0.3%  (insignificant)
	t-swfdec-youtube.trace             7.8   0.02     7.8   0.03      100.00%    -0.5%
	t-firefox-particles.trace         27.4   0.16    27.5   0.18       99.94%    -0.6%
	At most 4 outliers rejected per test per set.
	Cairo perf reports the running time, but the change is computed for operations per second instead (inverse of running time). Confidence is based on Welch's t-test. Absolute changes less than 1% can be accounted as measurement errors, even if statistically significant.
	v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> : Use pixman_asm_function instead of startfunc. Rebased. Re-benchmarked on Raspberry Pi. Commit message.
	v5, Ben Avison <bavison@riscosopen.org> : Fixed the bug exposed in blitters-test 4928372. 15 hours of testing, compared to the 45 minutes to hit the bug originally.
	Pekka Paalanen <pekka.paalanen@collabora.co.uk> : Squash the fix, re-benchmark on Raspberry Pi.
2014-04-21	ARMv6: Support for very variable-hungry composite operations	Ben Avison	1	-3/+53
	Previously, the variable ARGS_STACK_OFFSET was available to extract values from function arguments during the init macro. Now this changes dynamically around stack operations in the function as a whole so that arguments can be accessed at any point. It is also joined by LOCALS_STACK_OFFSET, which allows access to space reserved on the stack during the init macro. On top of this, composite macros now have the option of using all of WK0-WK3 registers rather than just the subset it was told to use; this requires the pixel count to be spilled to the stack over the leading pixels at the start of each line. Thus, at best, each composite operation can use 11 registers, plus any pointer registers not required for the composite type, plus as much stack space as it needs, divided up into constants and variables as necessary.
2014-04-15	create_bits(): Cast the result of height * stride to size_t	Søren Sandmann	1	-1/+1
	In create_bits() both height and stride are ints, so the result is also an int, which will overflow if height or stride are big enough and size_t is bigger than int. This patch simply casts height to size_t to prevent these overflows, which prevents the crash in: https://bugzilla.redhat.com/show_bug.cgi?id=972647 It's not even close to fixing the full problem of supporting big images in pixman. See also https://bugs.freedesktop.org/show_bug.cgi?id=69014
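The effect of the cast can be sketched in isolation. The function below is a hypothetical stand-in, not create_bits() itself: widening one operand to size_t makes the multiplication happen in size_t arithmetic instead of int arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Without the cast, height * stride would be evaluated as int * int and
 * could overflow before the result is converted to size_t. */
static size_t
buffer_size (int height, int stride)
{
    assert (height >= 0 && stride >= 0);
    return (size_t) height * stride;
}
```

On a platform where size_t is wider than int, buffer_size (65536, 65536) yields 2^32, a value that cannot be represented in a 32-bit int.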