~sandmann/pixman

Age	Commit message	Author	Files	Lines
2011-09-10	BILINEAR->NEAREST filter optimization for simple rotation and translation [bilinear-reduction]	Siarhei Siamashka	1	-1/+38
	Simple rotation and translation are additional cases in which the BILINEAR filter can safely be reduced to NEAREST.
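The condition behind this reduction can be worked out from where destination pixel centers land in the source. With an affine matrix whose first row is (t00, t01, t02) and destination centers at (n + 0.5, m + 0.5), the transformed x coordinate is t00 * n + t01 * m + t02 + (t00 + t01) / 2. If t00, t01 and t02 are all integers and t00 + t01 is odd, that value is always an integer plus 0.5, which is a source pixel center, so the bilinear weights collapse onto a single pixel; the same reasoning applies to y. The following is a minimal sketch of that check, assuming 16.16 fixed-point coefficients. The names `fixed_16_16_t`, `affine_t` and `bilinear_reduces_to_nearest()` are hypothetical and are not taken from pixman.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;      /* hypothetical 16.16 fixed-point type */

typedef struct
{
    fixed_16_16_t t00, t01, t02;    /* x' = t00 * x + t01 * y + t02 */
    fixed_16_16_t t10, t11, t12;    /* y' = t10 * x + t11 * y + t12 */
} affine_t;                         /* hypothetical affine matrix */

/* True if the fractional 16 bits are all zero. */
static bool
is_integer (fixed_16_16_t v)
{
    return ((uint32_t) v & 0xffffu) == 0;
}

/* For two integer-valued fixed-point numbers, true if their sum is odd.
 * Bit 16 is the lowest integer bit; unsigned arithmetic avoids overflow. */
static bool
integer_sum_is_odd (fixed_16_16_t a, fixed_16_16_t b)
{
    return (((uint32_t) a + (uint32_t) b) & 0x10000u) != 0;
}

/* True if every destination pixel center maps onto a source pixel center. */
bool
bilinear_reduces_to_nearest (const affine_t *t)
{
    assert (t != NULL);

    return is_integer (t->t00) && is_integer (t->t01) && is_integer (t->t02) &&
           is_integer (t->t10) && is_integer (t->t11) && is_integer (t->t12) &&
           integer_sum_is_odd (t->t00, t->t01) &&
           integer_sum_is_odd (t->t10, t->t11);
}
```

Under these assumptions the identity matrix and a 90 degree rotation with an integer translation both pass, while a scale by 2 or a half-pixel translation does not.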
2011-09-10	Strength-reduce BILINEAR filter to NEAREST filter for identity transforms	Søren Sandmann Pedersen	5	-38/+62
	An image with a bilinear filter and an identity transform is equivalent to one with a nearest filter, so there is no reason the standard fast paths shouldn't be usable.
	But because a BILINEAR filter samples a 2x2 pixel block in the source image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the source area is the entire image, because some compositing operations might then read pixels outside the image.
	This patch fixes the problem by splitting the FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags, FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR, that indicate that the clip covers the samples taking into account NEAREST/BILINEAR filters respectively.
	All the existing compositing operations that require FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which filter they depend on.
	In compute_image_info() both COVER_CLIP_NEAREST and COVER_CLIP_BILINEAR can be set depending on how much room there is around the clip rectangle.
	Finally, images with an identity transform and a bilinear filter get FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
	Performance measurements with render_bench against Xephyr (ROUND 1):
	Test	Before	After
	Xrender doing non-scaled Over blends	5.720 sec.	4.947 sec.
	Xrender (offscreen) doing non-scaled Over blends	5.149 sec.	4.487 sec.
	Imlib2 doing non-scaled Over blends	6.237 sec.	6.235 sec.
2011-09-10	test: Occasionally use a BILINEAR filter in blitters-test	Søren Sandmann Pedersen	1	-1/+4
	To test that reductions of BILINEAR->NEAREST for identity transformations happen correctly, occasionally use a bilinear filter in blitters-test.
2011-09-10	test: better coverage for BILINEAR->NEAREST filter optimization	Siarhei Siamashka	1	-6/+30
	The upcoming optimization, which will replace the BILINEAR filter with NEAREST where appropriate, has to analyze the transformation matrix without making any mistakes. The changes to affine-test include:
	1. Higher chance of using the same scale factor for the x and y axes. This can help to stress some special cases (for example, the case when both x and y scale factors are integers). The same applies to x/y translation.
	2. A small chance of "corrupting" the transformation matrix by flipping random bits. This should help to identify cases where some of the fast paths or other code logic is wrongly activated due to insufficient checks.
2011-09-10	Eliminate compute_sample_extents() function	Søren Sandmann Pedersen	1	-58/+42
	In analyze_extents(), instead of calling compute_sample_extents(), call compute_transformed_extents() and inline the remaining part of compute_sample_extents(). The upcoming bilinear->nearest optimization will do something different with these two pieces of code.
2011-09-10	Split computation of sample area into own function	Søren Sandmann Pedersen	1	-62/+76
	compute_sample_extents() has two parts: one that computes the transformed extents, and one that checks whether the computed extents fit within the 16.16 coordinate space. Split the first part into its own function, compute_transformed_extents().
2011-09-10	Remove x and y coordinates from analyze_extents() and compute_sample_extents()	Søren Sandmann Pedersen	1	-26/+37
	These coordinates were only ever used for subtracting from the extents box to put it into the coordinate space of the image, so we might as well do this coordinate translation only once before entering the functions.
2011-09-09	Post-release version bump to 0.23.5	Søren Sandmann Pedersen	1	-1/+1

2011-09-09	Pre-release version bump to 0.23.4	Søren Sandmann Pedersen	1	-1/+1

2011-09-09	bits: optimise fetching width==1 repeats	Chris Wilson	1	-14/+44
	Profiling ign.com, 20% of the entire render time was absorbed in this single operation:
	<< /content //COLOR_ALPHA /width 480 /height 800 >> surface context
	<< /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern
	//EXTEND_REPEAT set-extend
	set-source
	n 0 0 480 677 rectangle
	fill+
	pop
	which is a simple composition of a single pixel wide image. Sadly this is a workaround for lack of independent repeat-x/y handling in cairo and pixman. Worse still is that the worst-case behaviour of the general repeat path is for width 1 images...
	Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-07	ARM: NEON better instruction scheduling of over_n_8888	Taekyun Kim	1	-5/+48
	New head, tail, and tail/head blocks are added and instructions are reordered to eliminate pipeline stalls.
	Performance numbers before/after:
	- cortex a8 -
	before : L1: 375.39 L2: 391.93 M:114.39 ( 40.99%) HT: 99.37 VT: 98.20 R: 90.24 RT: 32.87 ( 240Kops/s)
	after  : L1: 481.90 L2: 483.46 M:114.29 ( 40.69%) HT:106.91 VT: 93.38 R: 90.74 RT: 29.51 ( 236Kops/s)
	- cortex a9 -
	before : L1: 324.50 L2: 332.79 M:155.55 ( 47.51%) HT:111.93 VT: 93.58 R: 71.92 RT: 28.21 ( 233Kops/s)
	after  : L1: 355.87 L2: 364.49 M:156.90 ( 47.59%) HT:111.52 VT: 91.76 R: 72.16 RT: 28.22 ( 234Kops/s)
2011-09-07	ARM: NEON better instruction scheduling of over_n_8_8888	Taekyun Kim	1	-26/+60
	The tail/head block is expanded and reordered to eliminate stalls.
	Performance numbers before/after:
	- cortex a8 -
	before : L1: 201.35 L2: 190.48 M:101.94 ( 54.85%) HT: 78.41 VT: 63.83 R: 58.25 RT: 21.74 ( 191Kops/s)
	after  : L1: 257.65 L2: 255.49 M:102.04 ( 55.33%) HT: 79.19 VT: 65.46 R: 59.23 RT: 21.12 ( 189Kops/s)
	- cortex a9 -
	before : L1: 157.35 L2: 159.81 M:133.00 ( 60.94%) HT: 82.44 VT: 63.64 R: 51.66 RT: 19.15 ( 179Kops/s)
	after  : L1: 216.83 L2: 219.40 M:135.83 ( 61.80%) HT: 85.60 VT: 64.80 R: 52.23 RT: 19.16 ( 179Kops/s)
2011-08-29	Workaround bug in llvm-gcc	Andrea Canciani	1	-0/+4
	llvm-gcc (shipped in Apple Xcode 4.1.1 as the default compiler or in the 2.9 release of LLVM) performs an invalid optimization which unifies the empty_region and the bad_region structures because they have the same content. A bug report has been filed against Apple Developer Tools for this issue. This commit works around this bug by making one of the two structures volatile, so that it cannot be merged. Fixes region-contains-test.
2011-08-29	win32: Build benchmarks	Andrea Canciani	2	-5/+6
	Add the makefile rules needed to compile lowlevel-blt-bench on win32 and fix the compilation errors.
2011-08-19	Move bilinear interpolation to pixman-inlines.h	Søren Sandmann Pedersen	2	-91/+91

2011-08-19	Use repeat() function from pixman-inlines.h in pixman-bits-image.c	Søren Sandmann Pedersen	1	-42/+15
	The repeat() functionality was duplicated between pixman-bits-image.c and pixman-inlines.h
2011-08-19	Rename pixman-fast-path.h to pixman-inlines.h	Søren Sandmann Pedersen	8	-7/+7
	It is not really specific to pixman-fast-path.c.
2011-08-15	In pixman_image_create_bits() allow images larger than 2GB	Søren Sandmann Pedersen	3	-11/+18
	There is no reason for pixman_image_create_bits() to check that the image size fits in int32_t. The correct check is against size_t since that is what the argument to calloc() is. This patch fixes this by adding a new _pixman_multiply_overflows_size() and using it in create_bits(). Also prepend an underscore to the names of other similar functions since they are internal to pixman. V2: Use int, not ssize_t for the arguments in create_bits() since width/height are still limited to 32 bits, as pointed out by Chris Wilson.
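A size_t overflow check of this kind can be sketched as below. This is an illustration only: `multiply_overflows_size()` and `alloc_rows()` are hypothetical names, and the code is not the body of pixman's `_pixman_multiply_overflows_size()`. The idea is to test the product against SIZE_MAX by division before multiplying, so the check itself can never overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if a * b cannot be represented in a size_t. */
bool
multiply_overflows_size (size_t a, size_t b)
{
    return a != 0 && b > SIZE_MAX / a;
}

/* Hypothetical allocator: a zeroed buffer of height rows of stride_bytes
 * each, or NULL if the byte count does not fit in a size_t. The arguments
 * stay plain ints because width and height are limited to 32 bits. */
void *
alloc_rows (int height, int stride_bytes)
{
    size_t total;

    assert (height >= 0 && stride_bytes >= 0);

    if (multiply_overflows_size ((size_t) height, (size_t) stride_bytes))
        return NULL;

    total = (size_t) height * (size_t) stride_bytes;
    return calloc (total, 1);
}
```

On a platform with a 64-bit size_t the product of two ints always fits, so the check only rejects requests where size_t is 32 bits wide; that is the point of checking against size_t rather than int32_t.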
2011-08-11	Don't include stdint.h in lowlevel-blt-bench.c	Søren Sandmann Pedersen	1	-1/+0
	Some systems don't have the file, and the types are already defined in pixman.h. https://bugs.freedesktop.org//show_bug.cgi?id=37422
2011-08-11	Use find_box_for_y() in pixman_region_contains_point() too	Søren Sandmann Pedersen	1	-6/+6
	The same binary search from the previous commit can be used in this function too. V2: Remove check from loop that is not needed anymore, pointed out by Andrea Canciani.
2011-08-11	Speed up pixman_region{,32}_contains_rectangle()	Søren Sandmann Pedersen	1	-6/+42
	When someone selects some text in Firefox under a non-composited X server and initiates a drag, a shaped window is created with a complex shape corresponding to the outline of the text. Then, on every mouse movement, pixman_region_contains_rectangle() is called many times on that complicated region. And pixman_region_contains_rectangle() is doing a linear scan through the rectangles in the region, although the scan does exit when it finds the first box that can't possibly intersect the passed-in rectangle. This patch changes the loop so that it uses a binary search to skip boxes that don't overlap the current y position. The performance improvement for the text dragging case is easily noticeable. V2: Use the binary search for the "getting up to speed or skipping remainder of band" as well.
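The binary search described above can be sketched as a lower-bound search over the region's boxes. This is a minimal sketch, assuming the boxes are stored in band order so that y2 never decreases along the array; `box_t` and `first_box_reaching_y()` are hypothetical names, and the code is not the body of pixman's find_box_for_y().

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x1, y1, x2, y2; } box_t;   /* hypothetical box type */

/* Return the first box in [begin, end) whose y2 is greater than y, or end
 * if no such box exists. Every box before the returned one ends at or above
 * y, so it cannot overlap anything at that y position and can be skipped. */
const box_t *
first_box_reaching_y (const box_t *begin, const box_t *end, int y)
{
    assert (begin <= end);

    while (begin < end)
    {
        const box_t *mid = begin + (end - begin) / 2;

        if (mid->y2 > y)
            end = mid;          /* mid may be the answer; keep it in range */
        else
            begin = mid + 1;    /* mid ends at or above y; discard it */
    }

    return begin;
}
```

Each step halves the remaining range, so skipping to the relevant band costs a logarithmic number of comparisons instead of a scan over every earlier box.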
2011-08-11	New test of pixman_region_contains_{rectangle,point}	Søren Sandmann Pedersen	4	-2/+184
	This test generates random regions and checks whether random boxes and points are contained within them. The results are combined and a CRC32 value is computed and compared to a known-correct one.
2011-08-11	Fix lcg_rand_u32() to return 32 random bits.	Søren Sandmann Pedersen	1	-4/+8
	The lcg_rand() function only returns 15 random bits, so lcg_rand_u32() would always have 0 in bit 31 and bit 15. Fix that by calling lcg_rand() three times, to generate 15, 15, and 2 random bits respectively. V2: Use the 10/11 most significant bits from the 3 lcg results and mix them with the low ones from the adjacent one, as suggested by Andrea Canciani.
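The first approach described above (three calls contributing 15, 15, and 2 bits) can be sketched as follows. This illustrates that approach only, not the V2 bit-mixing scheme that was ultimately suggested. The LCG constants and `lcg_srand()` are assumptions chosen so the example is self-contained, and `observed_bits()` is a hypothetical helper for checking the result.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lcg_seed;   /* generator state */

/* Assumed seeding helper. */
void
lcg_srand (uint32_t seed)
{
    lcg_seed = seed;
}

/* One LCG step (assumed constants) exposing only 15 bits of the state. */
uint32_t
lcg_rand (void)
{
    lcg_seed = lcg_seed * 1103515245u + 12345u;
    return (lcg_seed >> 16) & 0x7fffu;
}

/* Assemble 32 bits from three 15-bit results: 15 + 15 + 2. */
uint32_t
lcg_rand_u32 (void)
{
    uint32_t lo  = lcg_rand ();           /* bits  0..14 */
    uint32_t mid = lcg_rand ();           /* bits 15..29 */
    uint32_t hi  = lcg_rand () & 0x3u;    /* bits 30..31 */

    return lo | (mid << 15) | (hi << 30);
}

/* Hypothetical check helper: OR together n results, so any bit position
 * that is stuck at 0 stays 0 in the return value. */
uint32_t
observed_bits (int n)
{
    uint32_t seen = 0;

    assert (n >= 0);

    while (n-- > 0)
        seen |= lcg_rand_u32 ();

    return seen;
}
```

Combining only two 15-bit results as 16-bit halves is what leaves bits 15 and 31 permanently zero; with the three-call layout every bit position is fed by some generator output.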
2011-08-04	ARM NEON: Standard fast path out_reverse_8_8888	Taekyun Kim	2	-0/+54
	This fast path is frequently used by cairo to do polygon rendering. Existing NEON code generation framework is used.
2011-07-29	radial: Fix typos and trailing whitespace	Andrea Canciani	1	-8/+7
	Correct a typo reported by James Cloos and some reported by automatic spellchecking. Remove trailing whitespace.
2011-07-27	ARM: workaround binutils bug #12931 (code sections alignment)	Siarhei Siamashka	3	-0/+3
	More details in binutils bugtracker: http://sourceware.org/bugzilla/show_bug.cgi?id=12931 The problem was encountered in the wild by Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=672787
2011-07-22	C fast path for scaled src_x888_8888 with nearest filter	Siarhei Siamashka	3	-0/+12
	The necessity is justified by a message in the pixman mailing list: http://lists.freedesktop.org/archives/pixman/2011-July/001330.html NONE repeat is not supported, but could be added by tweaking the interpretation and making use of 'fully_transparent_src' scanline function argument.
2011-07-15	radial: Improve documentation and naming	Andrea Canciani	1	-10/+21
	Add a comment to explain why the tests guarantee that the code always computes the greatest valid root. Rename "det" as "discr" to make it match the mathematical name "discriminant". Based on a patch by Jeff Muizelaar <jmuizelaar@mozilla.com>.
2011-07-04	Makefile.am: Add pixman@lists.freedesktop.org to RELEASE_ANNOUNCE_LIST	Søren Sandmann Pedersen	1	-1/+1

2011-07-04	Post-release version bump to 0.23.3	Søren Sandmann Pedersen	1	-1/+1

2011-07-04	Pre-release version bump to 0.23.2	Søren Sandmann Pedersen	1	-1/+1

2011-06-28	Bilinear REPEAT_NORMAL source line extension for too short src_width	Taekyun Kim	1	-3/+47
	To avoid function call and other calculation overhead, extend the source scanline into a temporary buffer when the source width is too small. The temporary buffer will be accessed repeatedly, so the extension cost is very small due to cache effects.
2011-06-28	Enable REPEAT_NORMAL bilinear fast path entries	Taekyun Kim	1	-3/+39

2011-06-28	ARM: Add REPEAT_NORMAL functions to bilinear BIND macros	Taekyun Kim	1	-1/+10
	The bilinear template now supports REPEAT_NORMAL, so functions for it are added to the PIXMAN_ARM_BIND_SCALED_BILINEAR_ macros. Fast path entries are not enabled yet.
2011-06-28	sse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function	Taekyun Kim	1	-0/+5
	The bilinear template now supports REPEAT_NORMAL, so declare composite functions using it. The function is only declared, not used yet.
2011-06-28	REPEAT_NORMAL support for bilinear fast path template	Taekyun Kim	1	-0/+90
	The basic idea is to break down normal repeat into a set of non-repeat scanline compositions and stitch them together. Bilinear may interpolate between the last and first pixels of the source scanline. In this case, we can use a temporary wrap-around buffer.
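The decomposition idea can be illustrated with a deliberately simplified sketch: it assumes an unscaled scanline (one source pixel per destination pixel) and only computes where the non-repeating runs start and how long they are. The real template works on scaled fixed-point coordinates and also has to handle the bilinear wrap-around between the last and first pixels, neither of which is shown here. `run_t` and `split_repeat_normal()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    int src_x;      /* start of the run inside the source, in [0, src_width) */
    int length;     /* number of consecutive pixels in the run */
} run_t;            /* hypothetical description of one non-repeat run */

/* Split count consecutive samples starting at src_x (which may lie outside
 * the image) of a source that repeats every src_width pixels into runs that
 * never cross the right edge. Returns the number of runs written, which is
 * at most max_runs. */
int
split_repeat_normal (int src_x, int count, int src_width,
                     run_t *runs, int max_runs)
{
    int n = 0;

    assert (src_width > 0 && count >= 0 && runs != NULL);

    /* Wrap the start into [0, src_width), also for negative coordinates. */
    src_x %= src_width;
    if (src_x < 0)
        src_x += src_width;

    while (count > 0 && n < max_runs)
    {
        int length = src_width - src_x;

        if (length > count)
            length = count;

        runs[n].src_x = src_x;
        runs[n].length = length;
        n++;

        count -= length;
        src_x = 0;      /* every later run starts at the left edge */
    }

    return n;
}
```

Each run can then be handed to an ordinary non-repeat scanline function, and the outputs are written back to back in the destination.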
2011-06-28	Replace boolean arguments with flags for bilinear fast path template	Taekyun Kim	3	-30/+49
	Replacing boolean arguments with flags makes the code more readable, and the flags can be extended to do more things later. Currently the following flags are defined:
	FLAG_NONE - No flags are turned on.
	FLAG_HAVE_SOLID_MASK - Template will generate solid mask composite functions.
	FLAG_HAVE_NON_SOLID_MASK - Template will generate bits mask composite functions.
	FLAG_HAVE_SOLID_MASK and FLAG_HAVE_NON_SOLID_MASK should be mutually exclusive.
2011-06-25	test: Make fuzzer-find-diff.pl executable	Søren Sandmann	1	-0/+0

2011-06-25	ARM: Fix two bugs in neon_composite_over_n_8888_0565_ca().	Søren Sandmann	1	-4/+4
	The first bug is that a vmull.u8 instruction would store its result in the q1 register, clobbering the d2 register used later on. The second is that a vraddhn instruction would overwrite d25, corrupting the q12 register used later. Fixing the second bug caused a pipeline bubble where the d18 register would be unavailable for a clock cycle. This is fixed by swapping the instruction with its successor.
2011-06-25	blitters-test: Make common formats more likely to be tested.	Søren Sandmann Pedersen	1	-8/+14
	Move the eight most common formats to the top of the list of image formats and make create_random_image() much more likely to select one of those eight formats. This should help catch more bugs in SIMD optimized operations.
2011-06-23	Silence autoconf warnings	Andrea Canciani	1	-20/+20
	Autoconf 2.68 reports:
	warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
	Every code fragment must be wrapped in [AC_LANG_SOURCE([...])]
2011-06-20	Replace arguments to composite functions with a pointer to a struct	Søren Sandmann Pedersen	11	-1041/+278
	This allows more information, such as flags or the composite region, to be passed to the composite functions.
2011-06-12	In pixman-general.c rename image_parameters to {src, mask, dest}_image	Søren Sandmann Pedersen	1	-17/+16
	All the fast paths generally use these names as well.
2011-06-12	Replace instances of "dst_" with "dest_"	Søren Sandmann Pedersen	11	-275/+275
	The variables in question were dst_x, dst_y, dst_image. The majority of _x and _y uses were already dest_x and dest_y, while the majority of _image uses were dst_image.
2011-05-31	demos: Comment out some unused variables	Søren Sandmann	2	-1/+7

2011-05-31	sse2: Delete some unused variables	Søren Sandmann	1	-14/+4

2011-05-31	mmx: Delete some unused variables	Søren Sandmann	1	-14/+3

2011-05-29	Include noop in win32 builds	Andrea Canciani	1	-0/+1

2011-05-24	Fix a few typos in pixman-combine.c.template	Nis Martensen	1	-4/+3
	Some equations have too much multiplication with alpha.
2011-05-19	Move NOP src iterator into noop implementation.	Søren Sandmann Pedersen	2	-9/+6
	The iterator for sources where neither RGB nor ALPHA is needed really belongs in the noop implementation.