Age | Commit message | Author | Files | Lines
2016-04-05 | Revert "armv7: Retire old bilinear fast paths" [20160405-arm-neon-release1-from-bavison] | Siarhei Siamashka | 4 | -4/+2042
This reverts commit 70680f7c4f07e1b2c96a28be1f03be6a447d0b60. The environment variable PIXMAN_DISABLE=wholeops still allows the separable bilinear scaling iterators to be used in benchmarks.
2016-04-05 | lowlevel-blt-bench: horizontal/vertical variants of bilinear scaling | Siarhei Siamashka | 1 | -10/+57
This patch adds the 'v' and 'h' modifiers to the command-line parsing logic. They can be used together with the '-b' option, and they enforce vertical-only or horizontal-only special cases of interpolation when running the bilinear scaling benchmark. Optimized implementations may have special shortcuts for doing only vertical or only horizontal scaling, and this change makes it possible to benchmark those code paths.

Also, instead of applying just a minimal nudge to the x-axis scaling coefficient, apply a more sizeable nudge to the x- or y-axis translation coefficients. With the older matrix variant, a clever hack in the optimized code could deduce that the matrix is in fact indistinguishable from the identity transformation. That would be an undesired effect, and an opportunity to 'rig' benchmark scores.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
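Here is a minimal sketch of why a tiny scale-factor nudge can be indistinguishable from the identity transformation. It assumes 16.16 fixed-point coordinates (the layout of pixman_fixed_t) and assumes that bilinear weights are quantised to 7 bits. The helper names and the half-pixel translation used as the "sizeable nudge" are hypothetical. They are not the values lowlevel-blt-bench actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point, the same layout as pixman_fixed_t. */
#define FIX_ONE  (1u << 16)
#define FIX_HALF (1u << 15)
/* Assumed precision of the quantised bilinear weights. */
#define WEIGHT_BITS 7

/* Fractional part (16.16) of the source sample position for destination
 * column d, given scale s and translation t (both 16.16). Unsigned
 * wrap-around keeps the low 16 bits correct even if pos "goes negative". */
static uint32_t sample_frac(uint32_t d, uint32_t s, uint32_t t)
{
    uint64_t centre = (uint64_t)d * FIX_ONE + FIX_HALF; /* pixel centre */
    uint64_t pos = centre * s / FIX_ONE + t;            /* apply transform */
    pos -= FIX_HALF;                                    /* left neighbour */
    return (uint32_t)(pos & (FIX_ONE - 1));
}

/* Returns 1 if any of the first `width` destination columns would get a
 * non-zero horizontal weight after quantisation, i.e. if an optimised
 * routine could not legitimately skip horizontal interpolation. */
static int needs_h_interpolation(uint32_t width, uint32_t s, uint32_t t)
{
    assert(width <= FIX_ONE);
    for (uint32_t d = 0; d < width; d++)
        if ((sample_frac(d, s, t) >> (16 - WEIGHT_BITS)) != 0)
            return 1;
    return 0;
}
```

Under these assumptions, `needs_h_interpolation(256, FIX_ONE + 1, 0)` returns 0. The scale nudge moves column d by only d/65536 of a pixel, which quantises to a zero weight for d < 512. By contrast, `needs_h_interpolation(256, FIX_ONE, FIX_HALF)` returns 1.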
2015-10-15 | pixman-fast-path: Add fast path for "pad" type repeats | Ben Avison | 3 | -0/+371
Similar in concept to fast_composite_tiled_repeat(), this breaks up any unscaled composite in which the source/mask area extends beyond the bitmap grid without being clipped into a series of simpler composites (either bitmap to bitmap or solid to bitmap). These simpler composites are likely to match existing fast path implementations, so this should benefit all platforms.

This produces significant speedups for some cairo-perf-trace tests. For example, timings on ARMv6 (using Siarhei's trimmed traces) are:

Before:
    [ # ]  backend  test                  min(s)  median(s)  stddev.  count
    [ # ]  image: pixman 0.29.3
    [  0]  image    t-firefox-chalkboard  35.715  35.736     0.03%    6/6

After:
    [ # ]  backend  test                  min(s)  median(s)  stddev.  count
    [ # ]  image: pixman 0.29.3
    [  0]  image    t-firefox-chalkboard  9.254   9.261      0.15%    6/6

That's a speedup of 3.86x.

Also added a simple test program to check different repeat types.
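The splitting idea can be sketched in one dimension. Under PAD repeat, destination columns whose source coordinate falls left of the bitmap all replicate source column 0. Columns that fall to the right replicate the last column, and the rest map 1:1 onto the bitmap. The sketch below computes those three spans for an unscaled composite. The struct and function names are hypothetical, and this is not the code added to pixman-fast-path.c. A 2D version would combine the x and y splits, giving up to nine regions.

```c
#include <assert.h>

/* One run of destination pixels and the source column it reads from.
 * If `replicate` is set, every pixel in the run uses src_x. */
typedef struct {
    int dst_x, width;
    int src_x;
    int replicate;
} pad_span;

/* Split the destination run [dst_x, dst_x + width) of an unscaled
 * composite, whose first pixel reads source column src_x, against a
 * source bitmap of src_width columns under PAD repeat.
 * Writes up to 3 spans to out[] and returns how many were written. */
static int split_pad_span(int dst_x, int width, int src_x, int src_width,
                          pad_span out[3])
{
    int n = 0;
    assert(src_width > 0 && width >= 0);

    /* Left padding: source columns < 0 all clamp to column 0. */
    int left = src_x < 0 ? -src_x : 0;
    if (left > width)
        left = width;
    if (left > 0)
        out[n++] = (pad_span){ dst_x, left, 0, 1 };

    /* Middle: source columns inside [0, src_width), copied 1:1. */
    int mid_src = src_x + left;
    int mid = src_width - mid_src;
    if (mid < 0)
        mid = 0;
    if (mid > width - left)
        mid = width - left;
    if (mid > 0)
        out[n++] = (pad_span){ dst_x + left, mid, mid_src, 0 };

    /* Right padding: source columns >= src_width clamp to the last one. */
    int right = width - left - mid;
    if (right > 0)
        out[n++] = (pad_span){ dst_x + left + mid, right, src_width - 1, 1 };
    return n;
}
```

For example, a 10-pixel run starting at source column -3 over a 4-pixel-wide bitmap splits into 3 replicated pixels of column 0, 4 pixels copied 1:1, then 3 replicated pixels of column 3.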
2015-10-15 | Resolve implementation-defined behaviour for division rounded to -infinity | Ben Avison | 1 | -4/+4
The previous implementations of DIV and MOD relied upon the built-in / and % operators performing round-to-zero. This is guaranteed by C99, but in C89 the rounding is implementation-defined when the divisor and/or dividend is negative, and I believe Pixman is still supposed to support C89.
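Here is a minimal sketch of division rounded towards -infinity that only ever applies / and % to non-negative operands, so its result does not depend on the C89 implementation-defined rounding. It is written in C89 style. The function names are hypothetical, and this is not necessarily how the commit rewrites Pixman's DIV and MOD macros. INT_MIN operands are excluded to avoid overflow when negating.

```c
#include <assert.h>
#include <limits.h>

/* Quotient rounded towards -infinity. Only non-negative values are
 * passed to the built-in / and %, whose rounding C89 leaves
 * implementation-defined for negative operands. */
static int floor_div(int a, int b)
{
    int na, nb, q;

    assert(b != 0 && a != INT_MIN && b != INT_MIN);
    na = a < 0 ? -a : a;
    nb = b < 0 ? -b : b;
    q = na / nb;

    if ((a < 0) != (b < 0)) {
        /* True quotient is negative: floor(-x) == -ceil(x). */
        q = -q;
        if (na % nb != 0)
            q -= 1;
    }
    return q;
}

/* Remainder with the sign of the divisor, so that
 * a == b * floor_div(a, b) + floor_mod(a, b). */
static int floor_mod(int a, int b)
{
    return a - b * floor_div(a, b);
}
```

For example, `floor_div(-7, 2)` is -4 and `floor_mod(-7, 2)` is 1, whereas C99's built-in `-7 / 2` gives -3.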
2015-10-15 | armv7: Retire old bilinear fast paths | Ben Avison | 4 | -2042/+4
The new system of bilinear fetchers outperforms the old fast paths by a significant margin in nearly all cases. Since general_composite_rect() is capable of synthesising any fast path using fetchers, it is advantageous to simply remove all the old fast path code. This also simplifies the code base considerably.

Benchmarks on Cortex-A7 (lowlevel-blt-bench -b) for the whole set of affected operations follow:

src_8888_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     35.9    0.03    65.2    1.43    100.00%     +81.6%
    L2     35.6    0.17    62.7    0.65    100.00%     +76.3%
    M      34.7    0.02    53.3    0.29    100.00%     +53.7%
    HT     26.5    0.14    33.1    0.21    100.00%     +24.7%
    VT     24.8    0.13    26.6    0.38    100.00%     +7.6%
    R      21.9    0.12    23.7    0.25    100.00%     +8.3%
    RT     12.0    0.21    9.2     0.08    100.00%     -23.1%

src_8888_0565
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     33.5    0.17    54.3    0.67    100.00%     +62.0%
    L2     33.3    0.18    52.8    1.04    100.00%     +58.6%
    M      32.9    0.00    44.9    0.52    100.00%     +36.4%
    HT     24.3    0.02    26.7    0.30    100.00%     +9.8%
    VT     23.1    0.28    20.9    0.24    100.00%     -9.6%
    R      19.6    0.24    18.9    0.20    100.00%     -3.6%
    RT     10.6    0.19    7.0     0.11    100.00%     -34.1%

src_0565_x888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     25.3    0.14    50.6    0.77    100.00%     +99.7%
    L2     25.4    0.01    49.5    0.32    100.00%     +95.0%
    M      25.0    0.00    43.5    0.39    100.00%     +73.9%
    HT     21.7    0.22    29.7    0.32    100.00%     +36.8%
    VT     19.7    0.20    24.5    0.29    100.00%     +24.4%
    R      19.2    0.21    21.9    0.13    100.00%     +13.9%
    RT     11.9    0.20    9.2     0.16    100.00%     -22.9%

src_0565_0565
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     21.7    0.11    43.8    0.41    100.00%     +102.5%
    L2     21.5    0.13    43.3    0.39    100.00%     +101.1%
    M      21.3    0.15    38.4    0.34    100.00%     +80.3%
    HT     18.6    0.16    24.7    0.06    100.00%     +32.6%
    VT     17.4    0.17    20.4    0.24    100.00%     +17.3%
    R      16.9    0.18    18.5    0.33    100.00%     +9.9%
    RT     10.8    0.17    7.2     0.16    100.00%     -33.3%

over_8888_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     26.6    0.14    42.9    0.32    100.00%     +61.5%
    L2     26.4    0.12    41.7    0.32    100.00%     +57.9%
    M      24.7    0.20    31.8    0.27    100.00%     +28.8%
    HT     18.9    0.20    23.8    0.26    100.00%     +25.7%
    VT     16.5    0.18    17.9    0.21    100.00%     +8.7%
    R      15.0    0.17    16.5    0.19    100.00%     +9.8%
    RT     7.7     0.12    7.1     0.10    100.00%     -7.9%

add_8888_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     28.3    0.02    62.2    0.79    100.00%     +119.6%
    L2     28.1    0.02    58.9    0.42    100.00%     +109.3%
    M      26.2    0.01    43.4    0.38    100.00%     +65.6%
    HT     20.2    0.01    30.3    0.18    100.00%     +50.2%
    VT     18.3    0.01    21.1    0.12    100.00%     +15.1%
    R      16.6    0.01    19.4    0.09    100.00%     +16.4%
    RT     8.9     0.06    7.9     0.14    100.00%     -10.6%

src_8888_8_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     21.9    0.10    34.9    0.26    100.00%     +59.8%
    L2     21.8    0.09    34.8    0.35    100.00%     +59.8%
    M      21.0    0.18    32.2    0.27    100.00%     +53.1%
    HT     17.2    0.01    21.7    0.22    100.00%     +26.3%
    VT     15.1    0.16    17.4    0.19    100.00%     +14.6%
    R      14.1    0.16    16.6    0.18    100.00%     +17.5%
    RT     7.9     0.12    6.9     0.09    100.00%     -13.3%

src_8888_8_0565
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     18.8    0.09    31.3    0.28    100.00%     +66.4%
    L2     18.7    0.09    31.1    0.24    100.00%     +66.2%
    M      18.3    0.14    29.1    0.15    100.00%     +58.4%
    HT     15.3    0.15    18.3    0.21    100.00%     +19.3%
    VT     13.9    0.14    14.9    0.13    100.00%     +6.9%
    R      12.8    0.13    13.9    0.06    100.00%     +8.5%
    RT     7.3     0.11    5.6     0.09    100.00%     -23.2%

src_0565_8_x888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     19.7    0.10    30.3    0.20    100.00%     +53.9%
    L2     19.6    0.10    30.3    0.17    100.00%     +54.4%
    M      19.0    0.15    28.2    0.26    100.00%     +48.9%
    HT     16.1    0.17    20.1    0.24    100.00%     +24.5%
    VT     14.5    0.15    16.4    0.22    100.00%     +13.6%
    R      13.9    0.16    15.7    0.21    100.00%     +13.1%
    RT     8.0     0.13    6.8     0.12    100.00%     -15.3%

src_0565_8_0565
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     17.2    0.09    27.5    0.23    100.00%     +59.8%
    L2     17.1    0.10    27.3    0.18    100.00%     +59.7%
    M      16.6    0.15    25.8    0.14    100.00%     +55.2%
    HT     14.2    0.21    17.1    0.06    100.00%     +20.4%
    VT     13.0    0.24    14.4    0.18    100.00%     +10.7%
    R      12.4    0.24    13.4    0.16    100.00%     +7.9%
    RT     7.3     0.21    5.5     0.04    100.00%     -23.8%

over_8888_8_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     25.5    0.10    30.3    0.37    100.00%     +18.6%
    L2     25.3    0.02    29.9    0.17    100.00%     +18.0%
    M      21.5    0.00    22.3    0.21    100.00%     +4.0%
    HT     16.6    0.01    17.2    0.17    100.00%     +3.4%
    VT     14.4    0.01    13.5    0.14    100.00%     -6.6%
    R      12.5    0.01    12.5    0.13    71.35%      +0.3% (insignificant)
    RT     5.9     0.05    5.4     0.08    100.00%     -9.3%

add_8888_8_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     21.0    0.08    36.9    0.53    100.00%     +75.6%
    L2     20.9    0.01    36.1    0.28    100.00%     +72.6%
    M      18.9    0.00    25.4    0.21    100.00%     +34.8%
    HT     14.6    0.01    19.6    0.23    100.00%     +34.9%
    VT     12.8    0.01    14.9    0.18    100.00%     +16.3%
    R      11.7    0.01    13.8    0.16    100.00%     +17.4%
    RT     6.0     0.04    5.7     0.09    100.00%     -4.4%
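The claim that general_composite_rect() can synthesise any fast path rests on splitting each scanline into three stages: fetch, combine and write back. A new fetcher then benefits every operator that uses that source format. Below is a heavily simplified sketch of that structure. The types and function names are hypothetical stand-ins, not pixman's iterator API, and the combiner shown is just a saturating ADD on a8r8g8b8 pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-scanline stage signatures. */
typedef void (*fetch_fn)(const void *row, uint32_t *buf, int width);
typedef void (*combine_fn)(uint32_t *dst, const uint32_t *src, int width);
typedef void (*store_fn)(void *row, const uint32_t *buf, int width);

/* Fetcher: a8r8g8b8 rows need no format conversion. */
static void fetch_8888(const void *row, uint32_t *buf, int width)
{
    const uint32_t *s = row;
    for (int i = 0; i < width; i++)
        buf[i] = s[i];
}

/* Write-back: likewise a plain copy for a8r8g8b8 destinations. */
static void store_8888(void *row, const uint32_t *buf, int width)
{
    uint32_t *d = row;
    for (int i = 0; i < width; i++)
        d[i] = buf[i];
}

/* Combiner: per-channel saturating ADD. */
static void combine_add(uint32_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t r = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = ((dst[i] >> shift) & 0xff) + ((src[i] >> shift) & 0xff);
            r |= (c > 0xff ? 0xff : c) << shift;
        }
        dst[i] = r;
    }
}

/* Generic compositor: for each row, fetch source and destination,
 * combine them, then write the result back. Strides are in bytes. */
static void composite_rect(fetch_fn fetch_src, fetch_fn fetch_dst,
                           combine_fn combine, store_fn store,
                           const uint8_t *src, int src_stride,
                           uint8_t *dst, int dst_stride,
                           int width, int height)
{
    uint32_t src_buf[256], dst_buf[256];
    assert(width <= 256);
    for (int y = 0; y < height; y++) {
        fetch_src(src + y * src_stride, src_buf, width);
        fetch_dst(dst + y * dst_stride, dst_buf, width);
        combine(dst_buf, src_buf, width);
        store(dst + y * dst_stride, dst_buf, width);
    }
}
```

Swapping in a different fetcher, for example one that performs bilinear scaling, changes how the source is read without touching the combiner or the write-back stage.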
2015-10-15 | arm: Add bilinear scaled fetchers for repeat type REFLECT | Ben Avison | 1 | -7/+68
2015-10-15 | arm: Add bilinear scaled fetchers for repeat type NORMAL | Ben Avison | 1 | -7/+141
2015-10-15 | arm: Add bilinear scaled fetchers for repeat type PAD | Ben Avison | 1 | -3/+131
2015-10-15 | arm: Add bilinear scaled fetchers for repeat type NONE | Ben Avison | 1 | -26/+365
2015-10-15 | armv7: Improved bilinear scaled fetchers for small scale factors | Ben Avison | 2 | -4/+364
Adds a more sophisticated algorithm suitable for horizontal scale factors less than 1 (i.e. enlargements), where blocks of contiguous source pixels are loaded and format-converted before being picked. This produces a significant speed boost where the pixel conversion overhead outweighs the branch misprediction penalties; in practice this appears to hold for the r5g6b5 fetcher only.
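The difference between picking pixels before converting them and converting a contiguous block before picking can be sketched in plain C for r5g6b5 sources. To keep it short, this sketch picks single nearest pixels rather than bilinear pairs, assumes a small fixed block buffer, and uses hypothetical names. It is not the NEON code. Both functions should produce identical output. With an increment below 1.0 each source pixel is used several times, and the second version converts it only once.

```c
#include <assert.h>
#include <stdint.h>

/* Expand r5g6b5 to x8r8g8b8 (alpha forced opaque) by bit replication. */
static uint32_t convert_0565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Strategy 1: pick a source pixel for each output, then convert it.
 * x and ux are 16.16 fixed point. */
static void pick_then_convert(const uint16_t *src, uint32_t *out, int width,
                              uint32_t x, uint32_t ux)
{
    for (int i = 0; i < width; i++, x += ux)
        out[i] = convert_0565(src[x >> 16]);
}

/* Strategy 2: convert the contiguous source block spanned by the scanline
 * once, then pick from the converted copy. */
static void convert_then_pick(const uint16_t *src, uint32_t *out, int width,
                              uint32_t x, uint32_t ux)
{
    uint32_t block[64];
    uint32_t first, last;
    assert(width > 0);
    first = x >> 16;
    last = (x + (uint32_t)(width - 1) * ux) >> 16;
    assert(last - first < 64);
    for (uint32_t i = first; i <= last; i++)
        block[i - first] = convert_0565(src[i]);
    for (int i = 0; i < width; i++, x += ux)
        out[i] = block[(x >> 16) - first];
}
```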
2015-10-15 | armv7: Add bilinear scaled a8 fetcher | Ben Avison | 2 | -0/+22
2015-10-15 | armv7: Add bilinear scaled r5g6b5 fetcher | Ben Avison | 2 | -0/+32
2015-10-15 | armv7: Add bilinear scaled x8r8g8b8 fetcher | Ben Avison | 2 | -0/+24
2015-10-15 | armv7: Add bilinear scaled a8r8g8b8 fetcher | Ben Avison | 2 | -0/+6
2015-10-15 | armv7: Support for bilinear scaled fetchers | Ben Avison | 5 | -2/+494
Introduces the infrastructure that will be used by ARMv7 bilinear scaled fetcher routines, but doesn't actually utilise it yet. For simplicity, only includes the macros where source pixels are picked before being format-converted.
2015-10-15 | armv6: Add fetcher for a8 bilinear-interpolation scaled images | Ben Avison | 3 | -0/+36
This is constrained to support X increments in the positive X direction only. It also doesn't attempt to support any form of image repeat. Here are some affine-bench results for a variety of horizontal and vertical scaling factors.

Before:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  6.2    6.2    6.2    6.1    6.0
                 0.75 6.2    6.1    6.1    6.0    5.9
                 1.0  6.2    6.1    -      5.9    5.8
                 1.5  6.1    6.0    5.9    5.8    5.6
                 2.0  6.1    6.0    5.9    5.7    5.5

After:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  22.2   21.2   19.7   21.0   20.4
                 0.75 19.4   18.3   16.7   18.2   17.4
                 1.0  24.7   22.3   -      22.1   20.4
                 1.5  14.2   13.0   11.5   12.9   12.1
                 2.0  12.0   10.9   9.5    10.8   10.0

Improvement:
                      x increment
                      0.5      0.75     1.0      1.5      2.0
    y increment  0.5  +256.4%  +242.8%  +219.6%  +246.6%  +241.3%
                 0.75 +212.9%  +197.8%  +173.7%  +203.4%  +195.1%
                 1.0  +300.2%  +265.9%  -        +273.4%  +251.2%
                 1.5  +131.8%  +115.6%  +93.1%   +123.2%  +114.0%
                 2.0  +97.7%   +82.9%   +62.8%   +91.0%   +82.9%
2015-10-15 | armv6: Add fetcher for r5g6b5 bilinear-interpolation scaled images | Ben Avison | 3 | -0/+48
This is constrained to support X increments in the positive X direction only. It also doesn't attempt to support any form of image repeat. Here are some affine-bench results for a variety of horizontal and vertical scaling factors.

Before:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  3.0    2.9    2.9    2.9    2.8
                 0.75 2.9    2.9    2.9    2.8    2.8
                 1.0  2.9    2.9    -      2.8    2.8
                 1.5  2.9    2.9    2.8    2.8    2.7
                 2.0  2.9    2.8    2.8    2.8    2.7

After:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  20.2   18.5   19.1   17.6   16.3
                 0.75 17.1   15.4   16.0   14.5   13.2
                 1.0  20.1   17.1   -      15.6   13.6
                 1.5  11.9   10.3   10.8   9.5    8.4
                 2.0  9.9    8.4    8.9    7.7    6.8

Improvement:
                      x increment
                      0.5      0.75     1.0      1.5      2.0
    y increment  0.5  +582.2%  +530.7%  +554.4%  +514.7%  +477.1%
                 0.75 +481.5%  +427.7%  +451.2%  +410.3%  +371.4%
                 1.0  +583.9%  +486.9%  -        +453.3%  +392.7%
                 1.5  +308.1%  +258.7%  +281.0%  +240.5%  +208.1%
                 2.0  +241.4%  +196.9%  +217.8%  +179.9%  +152.4%
2015-10-15 | armv6: Add fetcher for x8r8g8b8 bilinear-interpolation scaled images | Ben Avison | 3 | -0/+29
This is constrained to support X increments in the positive X direction only. It also doesn't attempt to support any form of image repeat. Here are some affine-bench results for a variety of horizontal and vertical scaling factors.

Before:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  3.9    3.8    3.7    3.6    3.4
                 0.75 3.8    3.8    3.7    3.5    3.3
                 1.0  3.8    3.7    -      3.5    3.3
                 1.5  3.7    3.6    3.5    3.3    3.1
                 2.0  3.6    3.5    3.4    3.2    3.0

After:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  20.8   19.3   18.8   19.6   18.4
                 0.75 17.8   16.2   15.7   16.6   15.2
                 1.0  21.3   18.4   -      18.9   16.9
                 1.5  12.5   11.1   10.5   11.4   10.2
                 2.0  10.5   9.1    8.7    9.4    8.3

Improvement:
                      x increment
                      0.5      0.75     1.0      1.5      2.0
    y increment  0.5  +436.3%  +406.2%  +402.8%  +448.2%  +437.9%
                 0.75 +364.2%  +330.4%  +324.7%  +372.3%  +354.5%
                 1.0  +461.5%  +392.4%  -        +446.9%  +415.9%
                 1.5  +236.3%  +204.8%  +198.8%  +242.2%  +224.1%
                 2.0  +187.0%  +156.5%  +154.3%  +193.3%  +177.1%
2015-10-15 | armv6: Add fetcher for a8r8g8b8 bilinear-interpolation scaled images | Ben Avison | 4 | -0/+1008
This is constrained to support X increments in the positive X direction only. It also doesn't attempt to support any form of image repeat. Here are some affine-bench results for a variety of horizontal and vertical scaling factors.

Before:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  7.1    6.9    6.8    6.6    6.3
                 0.75 6.4    6.2    6.1    5.8    5.5
                 1.0  5.9    5.7    -      5.2    4.9
                 1.5  5.0    4.8    4.6    4.3    4.0
                 2.0  4.4    4.2    4.0    3.7    3.4

After:
                      x increment
                      0.5    0.75   1.0    1.5    2.0
    y increment  0.5  21.0   19.6   19.2   20.2   18.9
                 0.75 18.0   16.6   16.1   17.1   15.9
                 1.0  21.8   18.9   -      19.9   17.7
                 1.5  12.8   11.3   10.9   11.8   10.7
                 2.0  10.7   9.3    8.9    9.8    8.8

Improvement:
                      x increment
                      0.5      0.75     1.0      1.5      2.0
    y increment  0.5  +196.7%  +183.6%  +181.8%  +206.6%  +198.4%
                 0.75 +182.2%  +166.2%  +164.0%  +194.8%  +185.8%
                 1.0  +271.7%  +234.4%  -        +282.7%  +257.9%
                 1.5  +154.6%  +135.3%  +134.3%  +173.3%  +164.8%
                 2.0  +144.1%  +124.2%  +123.3%  +165.6%  +155.5%
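Here is a scalar sketch of a bilinear scanline fetcher with the same constraints as these ARMv6 routines: X advances in the positive direction only, and there is no repeat handling. The caller must therefore guarantee that every sample and its right-hand neighbour lie inside the two source rows. The 7-bit weight precision and the function names are assumptions for illustration, and the assembly versions are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_BITS 7                 /* assumed weight precision */
#define WEIGHT_ONE  (1 << WEIGHT_BITS)

/* Interpolate one 8-bit channel from its four neighbours, rounding. */
static uint32_t lerp2d(uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                       uint32_t wx, uint32_t wy)
{
    uint32_t top = tl * (WEIGHT_ONE - wx) + tr * wx;
    uint32_t bot = bl * (WEIGHT_ONE - wx) + br * wx;
    uint32_t v = top * (WEIGHT_ONE - wy) + bot * wy;
    return (v + (1u << (2 * WEIGHT_BITS - 1))) >> (2 * WEIGHT_BITS);
}

/* Fetch `width` a8r8g8b8 pixels starting at 16.16 position x, stepping by
 * ux > 0, interpolating between rows `top` and `bottom` with vertical
 * weight wy (0..WEIGHT_ONE). No repeat: the caller must ensure that
 * ((x + (width - 1) * ux) >> 16) + 1 is still a valid column. */
static void fetch_bilinear_8888(const uint32_t *top, const uint32_t *bottom,
                                uint32_t *out, int width,
                                uint32_t x, uint32_t ux, uint32_t wy)
{
    assert(ux > 0 && wy <= WEIGHT_ONE);
    for (int i = 0; i < width; i++, x += ux) {
        uint32_t col = x >> 16;
        uint32_t wx = (x & 0xffff) >> (16 - WEIGHT_BITS);
        uint32_t result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = lerp2d((top[col] >> shift) & 0xff,
                                (top[col + 1] >> shift) & 0xff,
                                (bottom[col] >> shift) & 0xff,
                                (bottom[col + 1] >> shift) & 0xff, wx, wy);
            result |= c << shift;
        }
        out[i] = result;
    }
}
```

Because ux is known to be positive, consecutive samples move monotonically through the source rows. This is the property that lets an optimised version stream through memory without handling reversed or wrapped accesses.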
2015-10-15 | armv7: Add nearest-neighbour scaled a8 fetcher | Ben Avison | 3 | -7/+111
2015-10-15 | armv7: Add nearest-neighbour scaled r5g6b5 fetcher | Ben Avison | 1 | -0/+2
2015-10-15 | armv7: Add nearest-neighbour scaled x8r8g8b8 fetcher | Ben Avison | 2 | -0/+17
2015-10-15 | armv7: Add nearest-neighbour scaled a8r8g8b8 fetcher | Ben Avison | 2 | -0/+35
2015-10-15 | armv6: Add four more nearest-scaled-cover fast paths | Ben Avison | 1 | -0/+17
These complete the set of fast paths for which pixman-fast-path.c currently provides versions that get selected in preference to the armv6-optimised scanline fetcher/combiner/writeback routines. Because generation of these fast paths is macroised, the patch required to add them is fairly simple.

lowlevel-blt-bench -n over_8888_8888:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     13.8    0.0     26.5    0.2     100.0%      +91.7%
    L2     9.4     0.2     22.9    0.4     100.0%      +142.6%
    M      8.6     0.0     23.8    0.0     100.0%      +176.1%
    HT     7.4     0.0     14.1    0.1     100.0%      +91.2%
    VT     7.3     0.0     13.4    0.1     100.0%      +84.1%
    R      7.0     0.0     13.0    0.1     100.0%      +85.9%
    RT     4.5     0.1     6.2     0.1     100.0%      +36.6%

affine-bench * 0 0 1 over a8r8g8b8 a8r8g8b8:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    9.4     0.0     28.0    0.0     100.0%      +197.4%
    0.75   9.0     0.0     26.1    0.0     100.0%      +190.2%
    1.0    8.6     0.0     24.4    0.0     100.0%      +184.6%
    1.5    7.9     0.0     21.7    0.0     100.0%      +173.4%
    2.0    7.3     0.0     19.6    0.0     100.0%      +166.6%

lowlevel-blt-bench -n src_x888_8888:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     108.6   2.0     66.3    0.9     100.0%      -39.0%
    L2     32.4    1.5     44.3    2.1     100.0%      +36.8%
    M      27.5    0.1     62.0    0.1     100.0%      +125.6%
    HT     20.3    0.1     28.7    0.2     100.0%      +41.2%
    VT     19.9    0.1     26.7    0.1     100.0%      +34.4%
    R      18.6    0.1     25.3    0.2     100.0%      +36.3%
    RT     8.7     0.1     9.8     0.2     100.0%      +12.6%

affine-bench * 0 0 1 src x8r8g8b8 a8r8g8b8:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    45.2    0.0     97.2    0.1     100.0%      +115.1%
    0.75   35.9    0.1     76.7    0.1     100.0%      +113.9%
    1.0    29.6    0.1     61.1    0.1     100.0%      +106.4%
    1.5    21.4    0.0     52.7    0.1     100.0%      +145.9%
    2.0    16.7    0.0     43.0    0.1     100.0%      +156.9%

lowlevel-blt-bench -n src_8888_0565:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     57.2    0.7     43.1    0.4     100.0%      -24.7%
    L2     23.0    1.0     32.8    1.0     100.0%      +42.5%
    M      24.8    0.0     42.2    0.0     100.0%      +70.0%
    HT     18.0    0.1     22.1    0.1     100.0%      +22.5%
    VT     17.1    0.1     21.0    0.1     100.0%      +22.5%
    R      16.5    0.1     20.0    0.1     100.0%      +21.4%
    RT     8.3     0.2     8.4     0.1     95.0%       +1.0% (insignificant)

affine-bench * 0 0 1 src a8r8g8b8 r5g6b5:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    34.9    0.0     55.3    0.0     100.0%      +58.7%
    0.75   29.3    0.0     49.1    0.0     100.0%      +67.4%
    1.0    24.8    0.0     42.6    0.1     100.0%      +71.6%
    1.5    19.0    0.0     38.2    0.1     100.0%      +100.7%
    2.0    15.4    0.0     31.8    0.0     100.0%      +107.1%

lowlevel-blt-bench -n over_8888_0565:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     9.8     0.0     15.3    0.1     100.0%      +56.6%
    L2     7.4     0.0     14.3    0.2     100.0%      +91.7%
    M      7.5     0.0     15.4    0.0     100.0%      +106.0%
    HT     6.5     0.0     10.1    0.0     100.0%      +54.5%
    VT     6.4     0.0     9.9     0.0     100.0%      +54.6%
    R      6.2     0.0     9.5     0.0     100.0%      +52.1%
    RT     4.2     0.0     4.6     0.1     100.0%      +9.8%

affine-bench * 0 0 1 over a8r8g8b8 r5g6b5:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    8.0     0.0     17.3    0.0     100.0%      +116.1%
    0.75   7.8     0.0     16.5    0.0     100.0%      +112.9%
    1.0    7.5     0.0     15.7    0.0     100.0%      +110.5%
    1.5    7.0     0.0     14.8    0.0     100.0%      +112.8%
    2.0    6.5     0.0     13.7    0.0     100.0%      +111.4%
2015-10-15 | armv6: Add nearest-scaled-cover src_0565_0565 fast path | Ben Avison | 3 | -9/+75
This is adapted from the nearest-scaled-cover scanline fetcher, modified to pack output data in 16-bit units.

lowlevel-blt-bench -n src_0565_0565:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     119.6   4.1     72.5    1.1     100.0%      -39.4%
    L2     45.2    1.4     55.4    2.0     100.0%      +22.5%
    M      47.1    0.1     71.3    0.1     100.0%      +51.4%
    HT     26.4    0.2     31.8    0.3     100.0%      +20.3%
    VT     25.0    0.2     30.0    0.3     100.0%      +20.3%
    R      22.6    0.2     27.6    0.2     100.0%      +22.0%
    RT     9.7     0.2     10.3    0.2     100.0%      +5.6%

affine-bench * 0 0 1 src r5g6b5 r5g6b5:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    59.6    0.1     129.6   0.1     100.0%      +117.2%
    0.75   52.0    0.1     106.3   0.1     100.0%      +104.6%
    1.0    47.2    0.1     71.7    0.0     100.0%      +52.0%
    1.5    39.1    0.1     68.1    0.1     100.0%      +74.2%
    2.0    37.7    0.1     68.7    0.1     100.0%      +82.2%
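For orientation, here is a plain C sketch of a nearest-neighbour scaled scanline copy for 16-bit pixels. Each output pixel takes the source pixel at the integer part of a 16.16 position that advances by a fixed increment, and the result is stored directly as 16-bit units with no detour through a 32-bit format. The "cover" condition is assumed to hold: every sampled column lies inside the source, so no repeat or clipping logic is needed. How the start position is derived from the transform is left to the caller. The function name is hypothetical, and this is not the ARMv6 assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest-neighbour scaled copy of one r5g6b5 scanline. x and ux are
 * 16.16 fixed point; the caller guarantees that every sampled column
 * (x + i * ux) >> 16 lies inside src (the "cover" case). */
static void scaled_nearest_scanline_0565(uint16_t *dst, const uint16_t *src,
                                         int width, uint32_t x, uint32_t ux)
{
    assert(width >= 0);
    while (width-- > 0) {
        *dst++ = src[x >> 16];   /* pixels stay packed as 16-bit units */
        x += ux;
    }
}
```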
2015-10-15 | armv6: Add nearest-scaled-cover src_8888_8888 fast path | Ben Avison | 2 | -0/+104
Without this patch, any such operations are matched against the fast path implementation in pixman-fast-path.c before general_composite_rect() is reached, so the armv6-optimised assembly fetcher routines never get used.

This patch adds a C wrapper around the same assembly routine used for the nearest-scaled-cover fetcher, adapted to plot a 2D area rather than a single scanline. The C is macroised so that later patches can use the same approach to build more complex fast paths from combinations of armv6 fetcher/combiner/writeback routines, in a similar manner to pixman_composite_rect().

lowlevel-blt-bench -n src_8888_8888:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     117.2   1.6     79.2    1.1     100.0%      -32.4%
    L2     44.1    3.1     49.9    2.4     100.0%      +13.2%
    M      40.0    0.1     72.5    0.1     100.0%      +81.4%
    HT     20.1    0.1     29.5    0.3     100.0%      +46.5%
    VT     19.4    0.1     27.7    0.2     100.0%      +42.7%
    R      18.2    0.1     26.2    0.2     100.0%      +44.1%
    RT     8.7     0.2     10.0    0.2     100.0%      +15.8%

affine-bench * 0 0 1 src a8r8g8b8 a8r8g8b8:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    46.6    0.1     110.5   0.1     100.0%      +137.2%
    0.75   39.1    0.1     88.5    0.1     100.0%      +126.1%
    1.0    36.3    0.2     71.7    0.1     100.0%      +97.7%
    1.5    26.7    0.1     55.3    0.1     100.0%      +106.8%
    2.0    19.9    0.0     43.5    0.0     100.0%      +119.2%
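The idea of a macroised C wrapper that turns a single-scanline routine into a 2D plot can be sketched as follows. The macro name, the scanline signature and the generated function name are all hypothetical. They only illustrate how one macro can stamp out a 2D fast path per scanline routine, and they are not pixman's actual macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scanline routine: nearest-scaled copy of 32-bit pixels,
 * with x and ux in 16.16 fixed point. */
static void scanline_8888(uint32_t *dst, const uint32_t *src, int width,
                          uint32_t x, uint32_t ux)
{
    for (int i = 0; i < width; i++, x += ux)
        dst[i] = src[x >> 16];
}

/* Generate a 2D function NAME that calls SCANLINE once per destination
 * row, stepping the 16.16 source y coordinate by uy between rows.
 * Strides are in pixels. */
#define MAKE_SCALED_2D(NAME, SCANLINE, TYPE)                                \
static void NAME(TYPE *dst, int dst_stride, const TYPE *src,               \
                 int src_stride, int width, int height,                     \
                 uint32_t x, uint32_t ux, uint32_t y, uint32_t uy)           \
{                                                                           \
    assert(width >= 0 && height >= 0);                                      \
    for (int row = 0; row < height; row++, y += uy)                         \
        SCANLINE(dst + row * dst_stride,                                    \
                 src + (y >> 16) * src_stride, width, x, ux);               \
}

MAKE_SCALED_2D(scaled_2d_8888, scanline_8888, uint32_t)
```

Each further fast path then costs one macro invocation naming a different scanline routine, which fits the commit's note that later patches reuse the same approach.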
2015-10-15 | armv6: Add fetcher for a8 nearest-neighbour transformed images | Ben Avison | 2 | -0/+10
This is related to the a8r8g8b8 nearest-scaled-cover fetcher. Below are benchmarks for src_8_8888, which uses it.

lowlevel-blt-bench -n:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     15.1    0.1     55.5    0.3     100.0%      +267.2%
    L2     13.7    0.1     45.3    0.8     100.0%      +230.0%
    M      14.5    0.0     53.9    0.1     100.0%      +272.5%
    HT     8.3     0.0     21.2    0.2     100.0%      +154.6%
    VT     8.3     0.0     20.1    0.3     100.0%      +141.7%
    R      8.0     0.0     19.2    0.3     100.0%      +140.5%
    RT     3.6     0.0     6.8     0.1     100.0%      +88.4%

affine-bench:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    17.2    0.0     76.5    0.1     100.0%      +344.4%
    0.75   16.7    0.0     67.1    0.1     100.0%      +300.8%
    1.0    16.4    0.0     54.3    0.1     100.0%      +232.2%
    1.5    15.7    0.0     52.4    0.1     100.0%      +234.6%
    2.0    14.8    0.0     50.8    0.1     100.0%      +243.9%
2015-10-15 | armv6: Add fetcher for x8r8g8b8 nearest-neighbour transformed images | Ben Avison | 2 | -0/+10
This is related to the a8r8g8b8 nearest-scaled-cover fetcher. Below are benchmarks for add_x888_8888, which uses it.

lowlevel-blt-bench -n:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     12.0    0.0     45.0    0.5     100.0%      +275.3%
    L2     9.2     0.1     30.4    1.2     100.0%      +231.6%
    M      8.6     0.0     27.8    0.1     100.0%      +224.0%
    HT     6.0     0.0     15.4    0.1     100.0%      +158.5%
    VT     5.9     0.0     14.5    0.1     100.0%      +146.2%
    R      5.7     0.0     14.1    0.1     100.0%      +145.8%
    RT     2.9     0.0     5.6     0.1     100.0%      +91.4%

affine-bench:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    12.1    0.0     32.5    0.1     100.0%      +169.6%
    0.75   11.3    0.0     30.0    0.0     100.0%      +165.1%
    1.0    10.7    0.0     27.1    0.0     100.0%      +153.7%
    1.5    9.6     0.0     24.1    0.0     100.0%      +151.6%
    2.0    8.8     0.0     21.5    0.0     100.0%      +145.1%
2015-10-15 | armv6: Add fetcher for r5g6b5 nearest-neighbour transformed images | Ben Avison | 2 | -0/+22
This is related to the a8r8g8b8 nearest-scaled-cover fetcher. Below are benchmarks for src_0565_8888, which uses it.

lowlevel-blt-bench -n:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     9.0     0.0     34.4    0.3     100.0%      +284.7%
    L2     8.1     0.1     29.0    0.6     100.0%      +258.7%
    M      8.4     0.0     33.2    0.1     100.0%      +297.6%
    HT     5.8     0.0     16.5    0.3     100.0%      +183.6%
    VT     5.8     0.0     16.0    0.3     100.0%      +175.6%
    R      5.6     0.0     15.6    0.1     100.0%      +175.5%
    RT     3.0     0.0     6.0     0.2     100.0%      +98.7%

affine-bench:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    11.2    0.0     52.0    0.1     100.0%      +363.2%
    0.75   10.9    0.0     41.3    0.1     100.0%      +279.3%
    1.0    10.6    0.0     33.4    0.1     100.0%      +216.7%
    1.5    10.0    0.0     32.3    0.1     100.0%      +221.8%
    2.0    9.4     0.0     31.7    0.0     100.0%      +236.0%
2015-10-15 | armv6: Add fetcher for a8r8g8b8 nearest-neighbour transformed images | Ben Avison | 4 | -0/+479
This is constrained to support X increments in the positive X direction only. That covers scaled images (except those reflected in the Y axis) plus parallelogram transformations that preserve the direction of the X axis. It also doesn't attempt to support any form of image repeat.

With this optimisation, some operations constructed from fetcher and combiner calls using general_composite_rect() now outperform the versions constructed from FAST_NEAREST macros in pixman-fast-path.c. Unfortunately, the FAST_NEAREST ones have higher priority in fast path lookup. Here are some benchmarks for the in_reverse_8888_8888 operation, which is not affected by this:

lowlevel-blt-bench -n:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     10.2    0.0     27.1    0.2     100.0%      +164.8%
    L2     8.2     0.1     23.0    0.4     100.0%      +179.2%
    M      8.3     0.0     24.8    0.0     100.0%      +200.3%
    HT     5.5     0.0     12.7    0.0     100.0%      +129.9%
    VT     5.4     0.0     12.1    0.0     100.0%      +123.2%
    R      5.4     0.0     11.9    0.1     100.0%      +122.7%
    RT     2.8     0.0     5.4     0.1     100.0%      +91.9%

affine-bench for 5 different scaling factors:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    0.5    11.1    0.0     28.3    0.0     100.0%      +155.1%
    0.75   10.5    0.0     26.4    0.0     100.0%      +152.2%
    1.0    9.9     0.0     24.6    0.0     100.0%      +147.5%
    1.5    9.0     0.0     21.8    0.0     100.0%      +141.4%
    2.0    8.3     0.0     19.7    0.0     100.0%      +138.4%
2015-10-15 | armv7: Re-use existing fast paths in more cases | Ben Avison | 1 | -0/+9
There is a group of combiner types (SRC, OVER_REVERSE, IN, OUT and ADD) where the source alpha affects only the destination alpha component. This means that any fast path with a8r8g8b8 source and destination can also be applied to an equivalent operation with x8r8g8b8 source and destination just by updating the fast path table, and likewise with a8b8g8r8 and x8b8g8r8. The following operations are affected:
    add_x888_8_x888 (and bilinear scaled version of same)
    add_x888_8888_x888
    add_x888_n_x888
    add_x888_x888 (and bilinear scaled version of same)
2015-10-15 | armv7: Re-use existing fast paths in more cases | Ben Avison | 1 | -0/+12
There is a group of combiner types (SRC, OVER, IN_REVERSE, OUT_REVERSE and ADD) where the existing destination alpha component is only used (if at all) to determine the resulting destination alpha component. This means that any such fast path with an a8r8g8b8 destination can also be applied to an x8r8g8b8 destination just by updating the fast path table, and likewise with a8b8g8r8 and x8b8g8r8. The following operations are affected:
    over_8888_8888_x888
    add_n_8_x888
    add_8888_8_x888
    add_8888_8888_x888
    add_8888_n_x888
    add_8888_x888
    out_reverse_8_x888
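The reasoning in these two commits can be checked with a small per-pixel model. For ADD, the source alpha feeds only the result's alpha byte, and for OVER, the destination alpha feeds only the result's alpha byte. So an a8r8g8b8 routine produces the correct colour channels even when the unused 'x' byte of an x8r8g8b8 operand holds arbitrary data. The sketch uses a common round-to-nearest divide-by-255 formulation. All helper names are hypothetical, and none of this is the ARMv7 code.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded to nearest, for 8-bit a and b. */
static uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* ADD: per-channel saturating s + d. */
static uint32_t add_8888(uint32_t s, uint32_t d)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        r |= (c > 0xff ? 0xff : c) << shift;
    }
    return r;
}

/* OVER on premultiplied pixels: s + d * (1 - alpha(s)), per channel.
 * Only the source alpha is consulted; the destination alpha byte only
 * influences the result's own alpha byte. */
static uint32_t over_8888(uint32_t s, uint32_t d)
{
    uint32_t ia = 0xff - (s >> 24), r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + mul_un8((d >> shift) & 0xff, ia);
        r |= (c > 0xff ? 0xff : c) << shift;
    }
    return r;
}
```

Changing the top byte of the source for ADD, or of the destination for OVER, alters only the top byte of the result. That byte is ignored when the operand is x8r8g8b8.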
2015-10-15 | armv7: Add in_n_8888 fast path | Ben Avison | 2 | -0/+68
This is tuned for Cortex-A7 (Raspberry Pi 2). lowlevel-blt-bench results, compared to the ARMv6 fast path:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     104.6   0.5     119.4   0.1     100.0%      +14.1%
    L2     106.8   0.6     121.4   0.1     100.0%      +13.6%
    M      100.3   1.3     116.4   0.0     100.0%      +16.0%
    HT     64.5    1.0     70.8    0.1     100.0%      +9.8%
    VT     56.0    0.8     62.2    0.1     100.0%      +11.1%
    R      54.1    0.9     55.2    0.0     100.0%      +1.9%
    RT     24.6    0.5     26.6    0.0     100.0%      +8.3%
2015-10-15 | armv6: Add in_n_8888 fast path | Ben Avison | 2 | -0/+81
lowlevel-blt-bench results:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     18.8    0.1     63.9    0.9     100.0%      +239.0%
    L2     16.0    0.4     58.5    1.3     100.0%      +265.8%
    M      13.1    0.0     56.8    0.1     100.0%      +332.6%
    HT     11.6    0.0     31.3    0.3     100.0%      +169.6%
    VT     11.4    0.0     27.2    0.2     100.0%      +139.2%
    R      11.0    0.1     28.2    0.2     100.0%      +156.1%
    RT     6.8     0.1     12.9    0.2     100.0%      +89.0%
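For reference, here is what an in_n_8888 fast path computes. With a solid source n and an a8r8g8b8 destination, the IN operator multiplies every channel of n by the destination alpha, so each output pixel depends on only one byte of the destination. This is a scalar C model with hypothetical names, not either of the optimised implementations.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded to nearest, for 8-bit a and b. */
static uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* IN with a solid source: dst = src * alpha(dst), for every pixel. */
static void in_n_8888(uint32_t src, uint32_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t da = dst[i] >> 24, r = 0;
        for (int shift = 0; shift < 32; shift += 8)
            r |= mul_un8((src >> shift) & 0xff, da) << shift;
        dst[i] = r;
    }
}
```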
2015-10-15 | armv6: Add over_8888_n_0565 fast path | Ben Avison | 2 | -20/+53
lowlevel-blt-bench results:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     5.7     0.0     20.6    0.1     100.0%      +263.8%
    L2     4.9     0.0     17.4    0.3     100.0%      +254.0%
    M      4.8     0.0     19.9    0.0     100.0%      +312.9%
    HT     4.5     0.0     12.4    0.1     100.0%      +175.4%
    VT     4.5     0.0     12.0    0.0     100.0%      +168.9%
    R      4.3     0.0     11.4    0.1     100.0%      +163.3%
    RT     2.9     0.0     6.0     0.1     100.0%      +106.9%
2015-10-15 | armv6: Add over_8888_8_0565 fast path | Ben Avison | 2 | -0/+186
lowlevel-blt-bench results:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     5.2     0.0     20.0    0.2     100.0%      +281.7%
    L2     4.5     0.0     16.2    0.2     100.0%      +256.9%
    M      4.5     0.0     18.8    0.0     100.0%      +321.1%
    HT     3.9     0.0     10.9    0.0     100.0%      +177.6%
    VT     3.9     0.0     10.6    0.0     100.0%      +171.5%
    R      3.8     0.0     10.0    0.0     100.0%      +165.1%
    RT     2.3     0.0     4.9     0.1     100.0%      +107.7%
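Here is a scalar model of the over_8888_8_0565 operation. The premultiplied a8r8g8b8 source is first multiplied by the a8 mask, then composited OVER an r5g6b5 destination. The destination is widened to 8 bits per channel for the blend and narrowed again afterwards. over_8888_n_0565 is the same operation with a single mask value for every pixel. The names are hypothetical and the rounding choices are this sketch's own, so its results may differ slightly from the ARMv6 code.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded to nearest, for 8-bit a and b. */
static uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* Widen r5g6b5 to three 8-bit channels (r, g, b) by bit replication. */
static void unpack_0565(uint16_t p, uint32_t c[3])
{
    uint32_t r = p >> 11, g = (p >> 5) & 0x3f, b = p & 0x1f;
    c[0] = (r << 3) | (r >> 2);
    c[1] = (g << 2) | (g >> 4);
    c[2] = (b << 3) | (b >> 2);
}

/* Narrow three 8-bit channels back to r5g6b5 by truncation. */
static uint16_t pack_0565(const uint32_t c[3])
{
    return (uint16_t)(((c[0] >> 3) << 11) | ((c[1] >> 2) << 5) | (c[2] >> 3));
}

/* dst = (src IN mask) OVER dst, with premultiplied src. */
static void over_8888_8_0565(const uint32_t *src, const uint8_t *mask,
                             uint16_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t m = mask[i];
        uint32_t sa = mul_un8(src[i] >> 24, m);   /* masked source alpha */
        uint32_t d[3];
        unpack_0565(dst[i], d);
        for (int ch = 0; ch < 3; ch++) {
            uint32_t s = mul_un8((src[i] >> (16 - 8 * ch)) & 0xff, m);
            uint32_t v = s + mul_un8(d[ch], 0xff - sa);
            d[ch] = v > 0xff ? 0xff : v;
        }
        dst[i] = pack_0565(d);
    }
}
```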
2015-10-15 | armv7: Add in_8888_8 fast path | Ben Avison | 2 | -0/+40
This is tuned for the Cortex-A7 (Raspberry Pi 2). lowlevel-blt-bench results, compared to the ARMv6 fast path:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     146.0   0.7     231.4   1.2     100.0%      +58.5%
    L2     143.1   0.9     222.1   1.7     100.0%      +55.3%
    M      110.9   0.0     129.0   0.5     100.0%      +16.3%
    HT     57.3    0.6     73.0    0.3     100.0%      +27.4%
    VT     46.6    0.5     61.6    0.4     100.0%      +32.3%
    R      42.3    0.2     51.7    0.2     100.0%      +22.2%
    RT     19.1    0.1     21.0    0.1     100.0%      +9.9%
2015-10-15 | armv6: Add in_8888_8 fast path | Ben Avison | 2 | -0/+116
This is used instead of the equivalent C fast path. lowlevel-blt-bench results, compared to no fast path at all:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     12.4    0.1     117.5   2.3     100.0%      +851.2%
    L2     9.5     0.1     46.9    2.4     100.0%      +393.8%
    M      9.6     0.0     61.9    0.9     100.0%      +544.0%
    HT     7.9     0.0     26.6    0.5     100.0%      +238.6%
    VT     7.7     0.0     24.2    0.4     100.0%      +212.5%
    R      7.4     0.0     22.4    0.4     100.0%      +204.5%
    RT     4.1     0.0     8.7     0.2     100.0%      +109.4%
2015-10-15 | pixman-fast-path: Add in_8888_8 fast path | Ben Avison | 1 | -0/+40
This is a C fast path, useful for reference or for platforms that don't have their own fast path for this operation. lowlevel-blt-bench results on ARMv6:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     12.4    0.1     24.4    0.3     100.0%      +97.8%
    L2     9.5     0.1     14.1    0.2     100.0%      +48.1%
    M      9.6     0.0     14.7    0.0     100.0%      +53.1%
    HT     7.9     0.0     12.0    0.1     100.0%      +52.3%
    VT     7.7     0.0     11.6    0.1     100.0%      +49.8%
    R      7.4     0.0     10.8    0.1     100.0%      +47.2%
    RT     4.1     0.0     6.1     0.1     100.0%      +48.2%
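The in_8888_8 operation itself is simple. Each a8 destination value is scaled by the alpha of the corresponding a8r8g8b8 source pixel, and the source colour channels are never read. Below is a scalar sketch with hypothetical names; it is not the code added to pixman-fast-path.c.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded to nearest, for 8-bit a and b. */
static uint8_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return (uint8_t)(((t >> 8) + t) >> 8);
}

/* dst(a8) = dst * alpha(src) for one scanline. */
static void in_8888_8(const uint32_t *src, uint8_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = mul_un8(dst[i], src[i] >> 24);
}
```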
2015-10-15 | armv6: Add over_n_0565 fast path | Ben Avison | 2 | -0/+118
This is used instead of the equivalent C fast path. lowlevel-blt-bench results, compared to no fast path at all:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     8.2     0.0     38.7    0.5     100.0%      +372.7%
    L2     7.9     0.1     37.6    0.5     100.0%      +376.8%
    M      7.3     0.0     38.5    0.1     100.0%      +425.6%
    HT     6.9     0.0     26.1    0.3     100.0%      +279.9%
    VT     6.8     0.0     24.5    0.3     100.0%      +258.0%
    R      6.6     0.1     23.6    0.2     100.0%      +255.1%
    RT     4.5     0.1     10.9    0.2     100.0%      +143.1%
2015-10-15 | pixman-fast-path: Add over_n_0565 fast path | Ben Avison | 1 | -0/+35
This is a C fast path, useful for reference or for platforms that don't have their own fast path for this operation. lowlevel-blt-bench results on ARMv6:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     8.2     0.0     11.3    0.1     100.0%      +38.6%
    L2     7.9     0.1     10.5    0.0     100.0%      +33.3%
    M      7.3     0.0     10.0    0.0     100.0%      +36.7%
    HT     6.9     0.0     9.2     0.0     100.0%      +33.3%
    VT     6.8     0.0     9.0     0.0     100.0%      +32.1%
    R      6.6     0.1     8.8     0.0     100.0%      +31.8%
    RT     4.5     0.1     6.3     0.1     100.0%      +39.7%
2015-10-15 | pixman-fast-path: Add over_n_8888 fast path | Ben Avison | 1 | -0/+35
This is a C fast path, useful for reference or for platforms that don't have their own fast path for this operation. lowlevel-blt-bench results on ARMv6:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     11.9    0.1     20.4    0.2     100.0%      +71.1%
    L2     10.6    0.2     16.5    0.4     100.0%      +55.8%
    M      9.4     0.0     13.5    0.0     100.0%      +44.3%
    HT     8.4     0.0     12.2    0.1     100.0%      +43.9%
    VT     8.3     0.0     11.9    0.1     100.0%      +42.7%
    R      8.1     0.0     11.5    0.1     100.0%      +41.3%
    RT     5.4     0.1     7.6     0.1     100.0%      +40.3%
2015-10-15 | armv7: Add optimised scanline writeback for r5g6b5 | Ben Avison | 2 | -0/+20
lowlevel-blt-bench results for an example operation, src_1555_0565:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     85.8    2.12    114.0   1.65    100.00%     +32.9%
    L2     83.7    0.96    106.0   1.01    100.00%     +26.7%
    M      76.4    0.66    94.8    0.98    100.00%     +24.0%
    HT     39.8    0.37    38.9    0.29    100.00%     -2.3%
    VT     37.0    0.36    34.1    0.24    100.00%     -7.7%
    R      33.9    0.37    30.3    0.24    100.00%     -10.5%
    RT     14.7    0.20    11.5    0.11    100.00%     -21.7%
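Write-back is the last stage of the fetch/combine/write-back chain: it narrows the combined a8r8g8b8 scanline to the destination format. Below is a minimal truncating version for r5g6b5. The name is hypothetical, and the commit does not say whether the ARMv7 routine truncates or rounds, so treat the truncation as an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow a8r8g8b8 pixels to r5g6b5 by keeping the top bits of each
 * colour channel; alpha is discarded. */
static void writeback_0565(uint16_t *dst, const uint32_t *buf, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t p = buf[i];
        dst[i] = (uint16_t)(((p >> 8) & 0xf800) |   /* red   23..19 -> 15..11 */
                            ((p >> 5) & 0x07e0) |   /* green 15..10 -> 10..5  */
                            ((p >> 3) & 0x001f));   /* blue   7..3  ->  4..0  */
    }
}
```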
2015-10-15 | armv7: Add optimised untransformed scanline fetchers r5g6b5 & a1r5g5b5 | Ben Avison | 2 | -0/+33
lowlevel-blt-bench results on Cortex-A7 for a couple of sample operations that use these fetchers are below.

add_0565_8888:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     75.4    0.38    147.5   0.90    100.00%     +95.7%
    L2     72.3    0.36    129.3   0.57    100.00%     +79.0%
    M      64.4    0.05    94.6    0.90    100.00%     +46.8%
    HT     35.8    0.03    42.3    0.26    100.00%     +18.1%
    VT     29.9    0.04    34.3    0.31    100.00%     +14.5%
    R      26.1    0.02    28.6    0.11    100.00%     +9.4%
    RT     12.2    0.06    13.1    0.15    100.00%     +7.9%

add_1555_8888:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     73.3    0.38    160.7   0.89    100.00%     +119.2%
    L2     69.8    0.08    139.1   0.74    100.00%     +99.4%
    M      62.2    0.03    100.4   0.76    100.00%     +61.4%
    HT     35.1    0.03    42.9    0.42    100.00%     +22.1%
    VT     29.5    0.03    34.7    0.33    100.00%     +17.8%
    R      25.8    0.02    28.7    0.27    100.00%     +11.4%
    RT     12.1    0.02    13.2    0.15    100.00%     +8.5%

---
For the record, I tried writing an a8 fetcher, but benchmarking indicated that it couldn't improve upon the ARMv6 a8 fetcher results. I also tried adding prefetch to the above fetchers. Since they are the first iterator in a chain and won't benefit from write-allocate caches, you might expect this to help, but benchmarking indicated otherwise.
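Here is what an untransformed a1r5g5b5 fetcher has to compute per pixel, in scalar C. The single alpha bit becomes 0x00 or 0xff, and each 5-bit colour field is widened to 8 bits by replicating its top bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 5-bit field to 8 bits by replicating its top bits. */
static uint32_t expand5(uint32_t v)
{
    return (v << 3) | (v >> 2);
}

/* Fetch one scanline of a1r5g5b5 pixels as a8r8g8b8. */
static void fetch_1555(uint32_t *buf, const uint16_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        uint32_t p = src[i];
        uint32_t a = (p & 0x8000) ? 0xff : 0x00;
        buf[i] = (a << 24) |
                 (expand5((p >> 10) & 0x1f) << 16) |
                 (expand5((p >> 5) & 0x1f) << 8) |
                 expand5(p & 0x1f);
    }
}
```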
2015-10-15 | armv7: Add src_1555_8888 fast path | Ben Avison | 2 | -0/+59
This is tuned for Cortex-A7 (Raspberry Pi 2). lowlevel-blt-bench results, compared to the ARMv6 fast path:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     88.6    0.2     221.3   0.5     100.0%      +149.7%
    L2     88.1    0.4     219.2   0.8     100.0%      +148.9%
    M      87.9    0.1     178.2   0.1     100.0%      +102.6%
    HT     59.7    0.4     72.0    0.2     100.0%      +20.7%
    VT     53.2    0.4     69.8    0.2     100.0%      +31.3%
    R      48.5    0.3     53.6    0.1     100.0%      +10.6%
    RT     21.2    0.1     23.0    0.1     100.0%      +8.5%
2015-10-15 | armv6: Add src_1555_8888 fast path | Ben Avison | 2 | -0/+19
lowlevel-blt-bench results, compared to using the armv6 1555 fetcher:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     57.0    1.1     70.1    0.6     100.0%      +23.1%
    L2     41.4    1.0     44.1    1.4     100.0%      +6.3%
    M      49.8    0.1     59.0    0.2     100.0%      +18.5%
    HT     21.4    0.3     32.3    0.3     100.0%      +50.9%
    VT     21.0    0.3     30.2    0.3     100.0%      +43.8%
    R      19.7    0.2     27.0    0.2     100.0%      +37.4%
    RT     7.0     0.2     10.9    0.3     100.0%      +56.6%
2015-10-15 | armv6: Add optimised scanline fetcher for x8r8g8b8 | Ben Avison | 2 | -0/+13
This supports x8r8g8b8 source images.

lowlevel-blt-bench results for src_x888_8888 with PIXMAN_DISABLE=wholeops on a Raspberry Pi 1:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     55.5    0.98    147.5   5.82    100.00%     +165.8%
    L2     25.2    0.84    46.7    2.83    100.00%     +85.5%
    M      27.8    0.15    57.5    0.06    100.00%     +106.7%
    HT     14.5    0.10    24.2    0.19    100.00%     +66.8%
    VT     14.2    0.11    23.2    0.20    100.00%     +63.0%
    R      13.5    0.07    22.0    0.24    100.00%     +63.3%
    RT     5.5     0.05    7.8     0.24    100.00%     +41.8%

lowlevel-blt-bench results for src_x888_8888 with PIXMAN_DISABLE=wholeops on a Raspberry Pi 2 (ARMv7):
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     135.8   2.43    236.4   6.68    100.00%     +74.0%
    L2     122.8   1.09    201.4   2.01    100.00%     +64.1%
    M      94.1    1.15    145.2   0.59    100.00%     +54.3%
    HT     41.1    0.53    52.4    0.38    100.00%     +27.5%
    VT     36.5    0.53    51.7    0.38    100.00%     +41.7%
    R      30.3    0.42    40.9    0.29    100.00%     +34.7%
    RT     13.7    0.24    17.5    0.25    100.00%     +28.2%

The before case was using the fetcher iterator defined in pixman-access.c.

Note that it does not appear to be worthwhile to create an additional ARMv7 version of this fetcher. If we construct one using the src_x888_8888 macros, the results on a Raspberry Pi 2 are as follows:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     236.4   6.68    259.0   3.58    100.00%     +9.6%
    L2     201.4   2.01    209.8   2.17    100.00%     +4.2%
    M      145.2   0.59    139.4   1.06    100.00%     -4.0%
    HT     52.4    0.38    51.4    0.56    100.00%     -1.9%
    VT     51.7    0.38    47.8    0.86    100.00%     -7.6%
    R      40.9    0.29    35.3    0.40    100.00%     -13.5%
    RT     17.5    0.25    16.5    0.26    100.00%     -6.2%
2015-10-15 | armv6: Add optimised scanline fetcher for a1r5g5b5 | Ben Avison | 2 | -0/+73
This supports a1r5g5b5 source images. lowlevel-blt-bench results for src_1555_8888, which does not yet have a dedicated fast path:
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     24.5    0.2     57.0    1.1     100.0%      +132.2%
    L2     19.3    0.4     41.4    1.0     100.0%      +114.3%
    M      20.4    0.0     49.8    0.1     100.0%      +144.7%
    HT     12.8    0.1     21.4    0.3     100.0%      +67.0%
    VT     12.7    0.1     21.0    0.3     100.0%      +65.4%
    R      12.1    0.1     19.7    0.2     100.0%      +63.1%
    RT     5.6     0.1     7.0     0.2     100.0%      +24.8%
2015-10-15 | armv6: Add optimised scanline fetchers and writeback for r5g6b5 and a8 | Ben Avison | 3 | -0/+178
This supports r5g6b5 source and destination images, and a8 source images. Below are lowlevel-blt-bench results for example operations that use these because they lack a dedicated fast path at the time of writing.

in_reverse_8_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     30.0    0.3     37.0    0.3     100.0%      +23.2%
    L2     23.3    0.3     29.4    0.4     100.0%      +26.1%
    M      24.0    0.0     31.3    0.1     100.0%      +30.5%
    HT     12.8    0.1     16.1    0.1     100.0%      +25.8%
    VT     11.9    0.1     14.8    0.1     100.0%      +24.6%
    R      11.7    0.1     14.6    0.1     100.0%      +24.5%
    RT     5.1     0.1     6.2     0.1     100.0%      +20.2%

in_0565_8888
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     22.0    0.1     28.3    0.2     100.0%      +28.4%
    L2     16.6    0.2     23.6    0.3     100.0%      +42.2%
    M      16.5    0.0     24.7    0.1     100.0%      +49.5%
    HT     11.0    0.1     13.7    0.1     100.0%      +24.4%
    VT     10.7    0.0     13.1    0.1     100.0%      +22.0%
    R      10.3    0.0     12.6    0.1     100.0%      +22.5%
    RT     5.3     0.1     5.7     0.1     100.0%      +9.0%

in_reverse_8888_0565
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     16.6    0.1     20.9    0.1     100.0%      +25.5%
    L2     13.1    0.1     17.7    0.3     100.0%      +35.3%
    M      13.2    0.0     19.2    0.0     100.0%      +45.3%
    HT     9.6     0.0     11.7    0.1     100.0%      +21.8%
    VT     9.3     0.0     11.4    0.1     100.0%      +22.4%
    R      9.0     0.0     10.9    0.1     100.0%      +21.1%
    RT     4.7     0.1     5.2     0.1     100.0%      +8.7%
2015-10-15 | armv7: Add OVER_REVERSE combiner | Ben Avison | 2 | -0/+147
In common with the ARMv6 version of this combiner, this code features a shortcut for the case where the destination is opaque. Without it, the NEON version performs significantly worse than the ARMv6 version. It must be noted, though, that repeatedly applying the OVER_REVERSE operator has the effect of making the destination opaque, so lowlevel-blt-bench is perhaps not representative of real-world usage in this case.

lowlevel-blt-bench results for over_reverse_0565_8888 (compared to ARMv6 version):
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     73.4    0.21    77.9    0.40    100.00%     +6.2%
    L2     72.8    0.18    76.0    0.40    100.00%     +4.4%
    M      66.3    0.02    70.1    0.67    100.00%     +5.8%
    HT     34.0    0.19    31.0    0.38    100.00%     -9.0%
    VT     30.2    0.16    27.4    0.35    100.00%     -9.1%
    R      28.5    0.16    23.4    0.32    100.00%     -17.9%
    RT     12.4    0.10    10.5    0.17    100.00%     -15.2%

lowlevel-blt-bench results for over_reverse_0565_8_8888 (compared to ARMv6 version):
           Before          After
           Mean    StdDev  Mean    StdDev  Confidence  Change
    L1     60.0    0.20    65.4    0.29    100.00%     +9.0%
    L2     59.1    0.18    63.4    0.38    100.00%     +7.2%
    M      50.3    0.24    55.8    0.09    100.00%     +10.9%
    HT     24.1    0.15    22.4    0.12    100.00%     -7.1%
    VT     20.8    0.12    19.6    0.13    100.00%     -5.6%
    R      19.6    0.13    17.2    0.01    100.00%     -12.4%
    RT     8.2     0.06    7.5     0.05    100.00%     -8.2%

It's notable that the comparative performance depends heavily upon the rectangle size. That is not surprising: one of the main features of NEON is the ability to work on larger blocks of data at once, which mainly benefits large data sets, while the larger granularity works against it for smaller data sets. Comments welcome on whether it would be desirable to select between ARMv6 and ARMv7 implementations at runtime based upon the rectangle size.
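The opaque-destination shortcut can be shown in scalar form. OVER_REVERSE computes dst + src * (1 - alpha(dst)), so when the destination alpha is 0xff the source contributes nothing and the pixel can be skipped. The sketch below uses hypothetical names and premultiplied pixels. It is not the NEON or ARMv6 code.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded to nearest, for 8-bit a and b. */
static uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* dst = dst + src * (1 - alpha(dst)), per channel, premultiplied.
 * Returns how many pixels took the opaque-destination shortcut. */
static int combine_over_reverse(uint32_t *dst, const uint32_t *src, int width)
{
    int skipped = 0;
    for (int i = 0; i < width; i++) {
        uint32_t d = dst[i], ia = 0xff - (d >> 24), r = 0;
        if (ia == 0) {          /* opaque destination: result is dst */
            skipped++;
            continue;
        }
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = ((d >> shift) & 0xff) +
                         mul_un8((src[i] >> shift) & 0xff, ia);
            r |= (c > 0xff ? 0xff : c) << shift;
        }
        dst[i] = r;
    }
    return skipped;
}
```

With an opaque source, one pass leaves every destination pixel opaque, so a second pass over the same destination takes the shortcut for every pixel. This mirrors the commit's caveat that repeated benchmark runs mostly exercise the shortcut.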