XAA.HOWTO

This file describes how to add basic XAA support to a chipset driver.

0)  What is XAA
1)  XAA Initialization and Shutdown
2)  The Primitives
  2.0  Generic Flags
  2.1  Screen to Screen Copies
  2.2  Solid Fills
  2.3  Solid Lines
  2.4  Dashed Lines
  2.5  Color Expansion Fills
    2.5.1  Screen to Screen Color Expansion
    2.5.2  CPU to Screen Color Expansion
      2.5.2.1  The Direct Method
      2.5.2.2  The Indirect Method
  2.6  8x8 Mono Pattern Fills
  2.7  8x8 Color Pattern Fills
  2.8  Image Writes
    2.8.1  The Direct Method
    2.8.2  The Indirect Method
  2.9  Clipping
3)  The Pixmap Cache
4)  Offscreen Pixmaps

/********************************************************************/

0)  WHAT IS XAA

XAA (the XFree86 Acceleration Architecture) is a device-dependent layer
that encapsulates the unaccelerated framebuffer rendering layer,
intercepting rendering commands sent to it from higher levels of the
server. For rendering tasks where hardware acceleration is not possible,
XAA allows the requests to proceed to the software rendering code.
Otherwise, XAA breaks the sometimes complicated X primitives into simpler
primitives more suitable for hardware acceleration and will use
accelerated functions exported by the chipset driver to render these.

XAA provides a simple, easy to use driver interface that allows the
driver to communicate its acceleration capabilities and restrictions back
to XAA. XAA will use the information provided by the driver to determine
whether or not acceleration will be possible for a particular X primitive.

1)  XAA INITIALIZATION AND SHUTDOWN

All relevant prototypes and defines are in xaa.h.

To initialize the XAA layer, the driver should allocate an XAAInfoRec via
XAACreateInfoRec(), fill it out as described in this document and pass it
to XAAInit(). XAAInit() must be called _after_ the framebuffer
initialization (usually cfb?ScreenInit or similar) since it is "wrapping"
that layer.
XAAInit() should be called _before_ the cursor initialization (usually
miDCInitialize) since the cursor layer needs to "wrap" all the rendering
code including XAA.

When shutting down, the driver should free the XAAInfoRec structure in
its CloseScreen function via XAADestroyInfoRec(). The prototypes for the
functions mentioned above are as follows:

    XAAInfoRecPtr XAACreateInfoRec(void);
    Bool XAAInit(ScreenPtr, XAAInfoRecPtr);
    void XAADestroyInfoRec(XAAInfoRecPtr);

The driver informs XAA of its acceleration capabilities by filling out an
XAAInfoRec structure and passing it to XAAInit(). The XAAInfoRec
structure contains many fields, most of which are function pointers and
flags. Each primitive will typically have two functions and a set of
flags associated with it, but it may have more. These two functions are
the "SetupFor" and "Subsequent" functions. The "SetupFor" function tells
the driver that the hardware should be initialized for a particular type
of graphics operation. After the "SetupFor" function, one or more calls
to the "Subsequent" function will be made to indicate that an instance of
the particular primitive should be rendered by the hardware. The details
of each instance (width, height, etc.) are given with each "Subsequent"
function. The set of flags associated with each primitive lets the driver
tell XAA what its hardware limitations are (e.g. it doesn't support a
planemask, it can only do one of the raster-ops, etc.).

Of the XAAInfoRec fields, one is required. This is the Sync function.
XAA initialization will fail if this function is not provided.

void Sync(ScrnInfoPtr pScrn)                    /* Required */

    Sync will be called when XAA needs to be certain that all graphics
    coprocessor operations are finished, such as when the framebuffer
    must be written to or read from directly and it must be certain that
    the accelerator will not be overwriting the area of interest.
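
The SetupFor/Subsequent/Sync calling pattern described above can be
pictured with a small software stand-in. The sketch below is only an
illustration: MockEngine and the mock_* functions are hypothetical names
invented for this example and are not part of XAA or of any driver. A
real driver would program hardware registers where this code writes to
an array.

```c
#include <stdint.h>

/* Hypothetical stand-in for an accelerator: a tiny 8bpp framebuffer plus
 * the state latched by the "SetupFor" call. Not part of XAA. */
#define MOCK_FB_W 16
#define MOCK_FB_H 8

typedef struct {
    uint8_t fb[MOCK_FB_H][MOCK_FB_W];
    uint8_t fill_color;   /* latched once by the SetupFor call       */
    int     busy;         /* nonzero while an operation is outstanding */
} MockEngine;

/* "SetupFor": program the per-operation state once. */
void mock_setup_for_solid_fill(MockEngine *e, int color)
{
    e->fill_color = (uint8_t)color;
}

/* "Subsequent": render one instance using the latched state. */
void mock_subsequent_solid_fill_rect(MockEngine *e, int x, int y, int w, int h)
{
    e->busy = 1;
    for (int j = y; j < y + h; j++)
        for (int i = x; i < x + w; i++)
            e->fb[j][i] = e->fill_color;
}

/* "Sync": return only once the engine is idle. Real hardware would poll
 * a status register here; the mock simply clears its busy flag. */
void mock_sync(MockEngine *e)
{
    e->busy = 0;
}
```

One setup call followed by any number of subsequent calls, then a sync
before touching the framebuffer directly, is the order in which XAA is
described as driving a driver.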
    One needs to make certain that the Sync function not only waits for
    the accelerator FIFO to empty, but that it waits for the rendering of
    that last operation to complete.

    It is guaranteed that no direct framebuffer access will occur after a
    "SetupFor" or "Subsequent" function without the Sync function being
    called first.

2)  THE PRIMITIVES

2.0  Generic Flags

Each primitive type has a set of flags associated with it which allow the
driver to tell XAA what the hardware limitations are. The common ones are
as follows:

/* Foreground, Background, rop and planemask restrictions */

    GXCOPY_ONLY

        This indicates that the accelerator only supports GXcopy for the
        particular primitive.

    ROP_NEEDS_SOURCE

        This indicates that the accelerator doesn't support a particular
        primitive with rops that don't involve the source. These rops are
        GXclear, GXnoop, GXinvert and GXset. If neither this flag nor
        GXCOPY_ONLY is defined, it is assumed that the accelerator
        supports all 16 raster operations (rops) for that primitive.

    NO_PLANEMASK

        This indicates that the accelerator does not support a hardware
        write planemask for the particular primitive.

    RGB_EQUAL

        This indicates that the particular primitive requires the red,
        green and blue bytes of the foreground color (and background
        color, if applicable) to be equal. This is useful for 24bpp when
        a graphics coprocessor is used in 8bpp mode, which is not
        uncommon in older hardware since some have no support for or only
        limited support for acceleration at 24bpp. This way, many
        operations will be accelerated for the common case of "grayscale"
        colors. This flag should only be used in 24bpp.

In addition to the common ones listed above which are possible for nearly
all primitives, each primitive may have its own flags specific to that
primitive. If such flags exist they are documented in the descriptions of
those primitives below.
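
As a rough illustration of how the generic flags restrict what gets
accelerated, the sketch below tests a request against the four common
restrictions. It is an assumption-laden example, not XAA's own logic: the
EX_* flag values and the function ex_can_accelerate() are made up for
this sketch (the real flag definitions live in xaa.h), and XAA's actual
checks may differ in detail. The GX* rop codes are the standard values
from X.h.

```c
#include <stdint.h>

/* X11 raster-op codes used below (values as in X.h). */
enum { GXclear = 0x0, GXcopy = 0x3, GXnoop = 0x5, GXxor = 0x6,
       GXinvert = 0xa, GXset = 0xf };

/* Illustrative flag bits only; the real definitions are in xaa.h. */
#define EX_GXCOPY_ONLY      0x01u
#define EX_ROP_NEEDS_SOURCE 0x02u
#define EX_NO_PLANEMASK     0x04u
#define EX_RGB_EQUAL        0x08u

/* Nonzero if the rop ignores the source (GXclear, GXnoop, GXinvert, GXset). */
static int rop_ignores_source(int rop)
{
    return rop == GXclear || rop == GXnoop || rop == GXinvert || rop == GXset;
}

/* Nonzero if a request with these parameters passes the generic
 * restrictions expressed by "flags". */
int ex_can_accelerate(unsigned flags, int rop, uint32_t planemask,
                      uint32_t color, int bits_per_pixel)
{
    /* A planemask with every bit of the pixel set means "no masking". */
    uint32_t full = (bits_per_pixel >= 32) ? 0xFFFFFFFFu
                                           : ((1u << bits_per_pixel) - 1u);

    if ((flags & EX_GXCOPY_ONLY) && rop != GXcopy)
        return 0;
    if ((flags & EX_ROP_NEEDS_SOURCE) && rop_ignores_source(rop))
        return 0;
    if ((flags & EX_NO_PLANEMASK) && (planemask & full) != full)
        return 0;
    if (flags & EX_RGB_EQUAL) {
        uint32_t r = (color >> 16) & 0xFF;
        uint32_t g = (color >> 8) & 0xFF;
        uint32_t b = color & 0xFF;
        if (r != g || g != b)
            return 0;
    }
    return 1;
}
```

For instance, with EX_RGB_EQUAL set a 24bpp fill with color 0x808080
passes the check while 0x808081 does not, so the latter would fall back
to software rendering.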

2.1  Screen to Screen Copies

The SetupFor and Subsequent ScreenToScreenCopy functions provide an
interface for copying rectangular areas from video memory to video
memory. To accelerate this primitive the driver should provide both the
SetupFor and Subsequent functions and indicate the hardware restrictions
via the ScreenToScreenCopyFlags. The NO_PLANEMASK, GXCOPY_ONLY and
ROP_NEEDS_SOURCE flags as described in Section 2.0 are valid as well as
the following:

    NO_TRANSPARENCY

        This indicates that the accelerator does not support skipping of
        color keyed pixels when copying from the source to the
        destination.

    TRANSPARENCY_GXCOPY_ONLY

        This indicates that the accelerator supports skipping of color
        keyed pixels only when the rop is GXcopy.

    ONLY_LEFT_TO_RIGHT_BITBLT

        This indicates that the hardware only accepts blitting when the
        x direction is positive.

    ONLY_TWO_BITBLT_DIRECTIONS

        This indicates that the hardware can only cope with blitting when
        the direction of x is the same as the direction in y.

void SetupForScreenToScreenCopy( ScrnInfoPtr pScrn,
                        int xdir, int ydir,
                        int rop,
                        unsigned int planemask,
                        int trans_color )

    When this is called, SubsequentScreenToScreenCopy will be called one
    or more times directly after. If ydir is 1, then the accelerator
    should copy starting from the top (minimum y) of the source and
    proceed downward. If ydir is -1, then the accelerator should copy
    starting from the bottom of the source (maximum y) and proceed
    upward. If xdir is 1, then the accelerator should copy each y
    scanline starting from the leftmost pixel of the source. If xdir is
    -1, it should start from the rightmost pixel.

    If trans_color is not -1 then trans_color indicates that the
    accelerator should not copy pixels with the color trans_color from
    the source to the destination, but should skip them. Trans_color is
    always -1 if the NO_TRANSPARENCY flag is set.
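
The reason xdir and ydir matter is overlap: when source and destination
rectangles overlap, copying in the wrong direction reads pixels that have
already been overwritten. The sketch below is a software model of that
idea under stated assumptions. ex_choose_blit_dirs() and ex_soft_blit()
are hypothetical helpers written for this example, not XAA functions, and
the direction rule shown is one plausible choice rather than a statement
of what XAA does internally.

```c
#include <stdint.h>

#define EX_W 8
#define EX_H 4

/* Pick copy directions so an overlapping copy never reads a pixel that
 * it has already overwritten. (x1,y1) is the source origin and (x2,y2)
 * the destination origin, both upper-left corners. */
void ex_choose_blit_dirs(int x1, int y1, int x2, int y2, int *xdir, int *ydir)
{
    *ydir = (y2 > y1) ? -1 : 1;   /* moving down: start from the bottom row */
    *xdir = (x2 > x1) ? -1 : 1;   /* moving right: start from the right     */
}

/* Software model of a blit engine that honors xdir and ydir. */
void ex_soft_blit(uint8_t fb[EX_H][EX_W], int xdir, int ydir,
                  int x1, int y1, int x2, int y2, int w, int h)
{
    for (int j = 0; j < h; j++) {
        int row = (ydir == 1) ? j : h - 1 - j;
        for (int i = 0; i < w; i++) {
            int col = (xdir == 1) ? i : w - 1 - i;
            fb[y2 + row][x2 + col] = fb[y1 + row][x1 + col];
        }
    }
}
```

Copying four pixels from (0,0) to (2,0) on the same scanline needs
xdir = -1; with xdir = 1 the first two source pixels would be duplicated
into the last two destination pixels.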

void SubsequentScreenToScreenCopy(ScrnInfoPtr pScrn,
                        int x1, int y1,
                        int x2, int y2,
                        int width, int height)

    Copy a rectangle "width" x "height" from the source (x1,y1) to the
    destination (x2,y2) using the parameters passed by the last
    SetupForScreenToScreenCopy call. (x1,y1) and (x2,y2) always denote
    the upper left hand corners of the source and destination regardless
    of which xdir and ydir values are given by
    SetupForScreenToScreenCopy.

2.2  Solid Fills

The SetupFor and Subsequent SolidFill(Rect/Trap) functions provide an
interface for filling rectangular areas of the screen with a foreground
color. To accelerate this primitive the driver should provide both the
SetupForSolidFill and SubsequentSolidFillRect functions and indicate the
hardware restrictions via the SolidFillFlags. The driver may optionally
provide a SubsequentSolidFillTrap if it is capable of rendering the
primitive correctly. The GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and
RGB_EQUAL flags as described in Section 2.0 are valid.

void SetupForSolidFill(ScrnInfoPtr pScrn,
                        int color, int rop, unsigned int planemask)

    SetupForSolidFill indicates that any combination of the following
    may follow it.

        SubsequentSolidFillRect
        SubsequentSolidFillTrap

void SubsequentSolidFillRect(ScrnInfoPtr pScrn, int x, int y, int w, int h)

    Fill a rectangle of dimensions "w" by "h" with origin at (x,y) using
    the color, rop and planemask given by the last SetupForSolidFill
    call.

void SubsequentSolidFillTrap(ScrnInfoPtr pScrn, int y, int h,
                        int left, int dxL, int dyL, int eL,
                        int right, int dxR, int dyR, int eR)

    These parameters describe a trapezoid via a version of Bresenham's
    parameters. "y" is the top line. "h" is the number of spans to be
    filled in the positive Y direction. "left" and "right" indicate the
    starting X values of the left and right edges. dy/dx describes the
    edge slope. These are not the deltas between the beginning and
    ending points on an edge. They merely describe the slope. "e" is the
    initial error term.
    It's the relationships between dx, dy and e that define the edge. If
    your engine does not do Bresenham trapezoids or does not allow the
    programmer to specify the error term then you are not expected to be
    able to accelerate them.

2.3  Solid Lines

XAA provides an interface for drawing thin lines. In order to draw X
lines correctly a high degree of accuracy is required. This usually
limits line acceleration to hardware which has a Bresenham line engine,
though depending on the algorithm used, other line engines may come close
if they accept 16 bit line deltas. XAA has both a Bresenham line
interface and a two-point line interface for drawing lines of arbitrary
orientation. Additionally there is a SubsequentSolidHorVertLine which
will be used for all horizontal and vertical lines. Horizontal and
vertical lines are handled separately since hardware that doesn't have a
line engine (or has one that is unusable due to precision problems) can
usually draw these lines by some other method such as drawing them as
thin rectangles. Even for hardware that can draw arbitrary lines via the
Bresenham or two-point interfaces, the SubsequentSolidHorVertLine is used
for horizontal and vertical lines since most hardware is able to render
the horizontal lines and sometimes the vertical lines faster by other
methods (Hint: try rendering horizontal lines as flattened rectangles).
If you have not provided a SubsequentSolidHorVertLine but you have
provided Bresenham or two-point lines, a SubsequentSolidHorVertLine
function will be supplied for you.

The flags field associated with Solid Lines is SolidLineFlags and the
GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags as
described in Section 2.0 are valid restrictions.

Some line engines have line biases hardcoded to comply with Microsoft
line biasing rules. A tell-tale sign of this is the hardware lines not
matching the software lines in the zeroth and fourth octants.
The driver can set the flag:

    MICROSOFT_ZERO_LINE_BIAS

in the AccelInfoRec.Flags field to adjust the software lines to match the
hardware lines. This is in the generic flags field rather than the
SolidLineFlags since this flag applies to all software zero-width lines
on the screen and not just the solid ones.

void SetupForSolidLine(ScrnInfoPtr pScrn,
                        int color, int rop, unsigned int planemask)

    SetupForSolidLine indicates that any combination of the following
    may follow it.

        SubsequentSolidBresenhamLine
        SubsequentSolidTwoPointLine
        SubsequentSolidHorVertLine

void SubsequentSolidHorVertLine( ScrnInfoPtr pScrn,
                        int x, int y, int len, int dir )

    All vertical and horizontal solid thin lines are rendered with this
    function. The line starts at coordinate (x,y) and extends "len"
    pixels inclusive in the direction indicated by "dir". The direction
    is either DEGREES_0 or DEGREES_270. That is, it always extends to
    the right or down.

void SubsequentSolidTwoPointLine(ScrnInfoPtr pScrn,
                        int x1, int y1, int x2, int y2, int flags)

    Draw a line from (x1,y1) to (x2,y2). If the flags field contains the
    flag OMIT_LAST, the last pixel should not be drawn. Otherwise, the
    pixel at (x2,y2) should be drawn.

    If you use the TwoPoint line interface there is a good possibility
    that your line engine has hard-coded line biases that do not match
    the default X zero-width lines. If so, you may need to set the
    MICROSOFT_ZERO_LINE_BIAS flag described above. Note that since any
    vertex in the 16-bit signed coordinate system is valid, your line
    engine is expected to handle 16-bit values if you have hardware line
    clipping enabled. If your engine cannot handle 16-bit values, you
    should not use hardware line clipping.

void SubsequentSolidBresenhamLine(ScrnInfoPtr pScrn,
                        int x, int y, int major, int minor, int err,
                        int len, int octant)

    "X" and "y" are the starting point of the line. "Major" and "minor"
    are the major and minor step constants. "Err" is the initial error
    term. "Len" is the number of pixels to be drawn (inclusive).
    "Octant" can be any combination of the following flags OR'd
    together:

        Y_MAJOR         Y is the major axis (X otherwise)
        X_DECREASING    The line is drawn from right to left
        Y_DECREASING    The line is drawn from bottom to top

    The major, minor and err terms are the "raw" Bresenham parameters
    consistent with a line engine that does:

        e = err;
        while(len--) {
            DRAW_POINT(x,y);
            e += minor;
            if(e >= 0) {
                e -= major;
                TAKE_ONE_STEP_ALONG_MINOR_AXIS;
            }
            TAKE_ONE_STEP_ALONG_MAJOR_AXIS;
        }

    IBM 8514 style Bresenham line interfaces require their parameters
    modified in the following way:

        Axial = minor;
        Diagonal = minor - major;
        Error = minor + err;

    SolidBresenhamLineErrorTermBits

        This field allows the driver to tell XAA how many bits large its
        Bresenham parameter registers are. Many engines have registers
        that only accept 12 or 13 bit Bresenham parameters, and the
        parameters for clipped lines may overflow these if they are not
        scaled down. If this field is not set, XAA will assume the
        engine can accommodate 16 bit parameters, otherwise, it will
        scale the parameters to the size specified.

2.4  Dashed Lines

The same degree of accuracy required by the solid lines is required for
drawing dashed lines as well. The dash pattern itself is a buffer of
binary data where ones are expanded into the foreground color and zeros
either correspond to the background color or indicate transparency
depending on whether or not DoubleDash or OnOffDashes are being drawn.

The flags field associated with dashed lines is DashedLineFlags and the
GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags as
described in Section 2.0 are valid restrictions. Additionally, the
following flags are valid:

    NO_TRANSPARENCY

        This indicates that the driver cannot support dashed lines with
        transparent backgrounds (OnOffDashes).

    TRANSPARENCY_ONLY

        This indicates that the driver cannot support dashes with both a
        foreground and background color (DoubleDashes).

    LINE_PATTERN_POWER_OF_2_ONLY

        This indicates that only patterns with a power of 2 length can
        be accelerated.

    LINE_PATTERN_LSBFIRST_MSBJUSTIFIED
    LINE_PATTERN_LSBFIRST_LSBJUSTIFIED
    LINE_PATTERN_MSBFIRST_MSBJUSTIFIED
    LINE_PATTERN_MSBFIRST_LSBJUSTIFIED

        These describe how the line pattern should be packed. The
        pattern buffer is DWORD padded. LSBFIRST indicates that the
        pattern runs from the LSB end to the MSB end. MSBFIRST indicates
        that the pattern runs from the MSB end to the LSB end. When the
        pattern does not completely fill the DWORD padded buffer, the
        pattern will be justified towards the MSB or LSB end based on
        the flags above.

The following field indicates the maximum length dash pattern that
should be accelerated.

    int DashPatternMaxLength

void SetupForDashedLine(ScrnInfoPtr pScrn,
                        int fg, int bg, int rop, unsigned int planemask,
                        int length, unsigned char *pattern)

    SetupForDashedLine indicates that any combination of the following
    may follow it.

        SubsequentDashedBresenhamLine
        SubsequentDashedTwoPointLine

    If "bg" is -1, then the background (pixels corresponding to clear
    bits in the pattern) should remain unmodified. "Bg" indicates the
    background color otherwise. "Length" indicates the length of the
    pattern in bits and "pattern" points to the DWORD padded buffer
    holding the pattern which has been packed according to the flags set
    above.

void SubsequentDashedTwoPointLine( ScrnInfoPtr pScrn,
                        int x1, int y1, int x2, int y2,
                        int flags, int phase)

void SubsequentDashedBresenhamLine(ScrnInfoPtr pScrn,
                        int x1, int y1, int major, int minor, int err,
                        int len, int octant, int phase)

    These are the same as the SubsequentSolidTwoPointLine and
    SubsequentSolidBresenhamLine functions except for the addition of
    the "phase" field which indicates the offset into the dash pattern
    that the pixel at (x1,y1) corresponds to.

    As with SubsequentSolidBresenhamLine, there is an

        int DashedBresenhamLineErrorTermBits

    field which indicates the size of the error term registers used
    with dashed lines.
    This is usually the same value as the field for the solid lines
    (because it's usually the same register).

2.5  Color Expansion Fills

When filling a color expansion rectangle, the accelerator paints each
pixel depending on whether or not a bit in a corresponding bitmap is set
or clear. Opaque expansions are when a set bit corresponds to the
foreground color and a clear bit corresponds to the background color. A
transparent expansion is when a set bit corresponds to the foreground
color and a clear bit indicates that the pixel should remain unmodified.

The graphics accelerator usually has access to the source bitmap in one
of two ways: 1) the bitmap data is sent serially to the accelerator by
the CPU through some memory mapped aperture or 2) the accelerator reads
the source bitmap out of offscreen video memory. Some types of primitives
are better suited towards one method or the other. Type 2 is useful for
reusable patterns such as stipples which can be cached in offscreen
memory. The aperture method can be used for stippling but the CPU must
pass the data across the bus each time a stippled fill is to be
performed. For expanding 1bpp client pixmaps or text strings to the
screen, the aperture method is usually superior because the intermediate
copy in offscreen memory needed by the second method would only be used
once. Unfortunately, many accelerators can only do one of these methods
and not both.

XAA provides both ScreenToScreen and CPUToScreen color expansion
interfaces for doing color expansion fills. The ScreenToScreen functions
can only be used with hardware that supports reading of source bitmaps
from offscreen video memory, and these are only used for cacheable
patterns such as stipples. There are two variants of the CPUToScreen
routines - a direct method intended for hardware that has a transfer
aperture, and an indirect method intended for hardware without transfer
apertures or hardware with unusual transfer requirements.
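
The difference between opaque and transparent expansion can be shown
with a few lines of software expansion. This is only a sketch of the
semantics, not of any hardware interface: ex_expand_scanline() is a
hypothetical helper, the destination is assumed to be 8bpp, bit 0 of each
source byte is assumed to be the leftmost pixel, and a bg value of -1 is
used to request a transparent expansion in the same way the color
expansion SetupFor functions in this document do.

```c
#include <stdint.h>

/* Expand one scanline of 1bpp source data into 8bpp destination pixels.
 * Assumes bit 0 of each source byte is the leftmost pixel (LSB-first).
 * bg == -1 selects a transparent expansion. */
void ex_expand_scanline(uint8_t *dst, const uint8_t *src, int w, int fg, int bg)
{
    for (int i = 0; i < w; i++) {
        int bit = (src[i >> 3] >> (i & 7)) & 1;
        if (bit)
            dst[i] = (uint8_t)fg;       /* set bit: foreground color     */
        else if (bg != -1)
            dst[i] = (uint8_t)bg;       /* clear bit, opaque: background */
        /* clear bit, transparent: leave dst[i] unmodified */
    }
}
```

With source byte 0x05 (bits 0 and 2 set), an opaque expansion with fg=1
and bg=0 over four pixels writes 1,0,1,0, while a transparent expansion
writes only the first and third pixels and leaves the others as they
were.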

Hardware that can only expand bitmaps from video memory should supply
ScreenToScreen routines but also ScanlineCPUToScreen (indirect) routines
to optimize transfers of non-cacheable data. Hardware that can only
accept source bitmaps through an aperture should supply CPUToScreen (or
ScanlineCPUToScreen) routines. Hardware that can do both should provide
both ScreenToScreen and CPUToScreen routines.

For both ScreenToScreen and CPUToScreen interfaces, the GXCOPY_ONLY,
ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags described in
Section 2.0 are valid as well as the following:

/* bit order requirements (one of these must be set) */

    BIT_ORDER_IN_BYTE_LSBFIRST

        This indicates that the least significant bit in each byte of
        the source data corresponds to the leftmost of that block of 8
        pixels. This is the preferred format.

    BIT_ORDER_IN_BYTE_MSBFIRST

        This indicates that the most significant bit in each byte of the
        source data corresponds to the leftmost of that block of 8
        pixels.

/* transparency restrictions */

    NO_TRANSPARENCY

        This indicates that the accelerator cannot do a transparent
        expansion.

    TRANSPARENCY_ONLY

        This indicates that the accelerator cannot do an opaque
        expansion. In cases where the background needs to be filled, XAA
        will render the primitive in two passes when using the
        CPUToScreen interface, but will not do so with the
        ScreenToScreen interface since that would require caching of two
        patterns. Some ScreenToScreen hardware may be able to render two
        passes at the driver level and remove the TRANSPARENCY_ONLY
        restriction if it can render pixels corresponding to the zero
        bits.

2.5.1  Screen To Screen Color Expansion

The ScreenToScreenColorExpandFill routines provide an interface for
doing expansion blits from source patterns stored in offscreen video
memory.

void SetupForScreenToScreenColorExpandFill (ScrnInfoPtr pScrn,
                        int fg, int bg,
                        int rop, unsigned int planemask)

    Ones in the source bitmap will correspond to the fg color.
    Zeros in the source bitmap will correspond to the bg color unless
    bg = -1. In that case the pixels corresponding to the zeros in the
    bitmap shall be left unmodified by the accelerator.

    For hardware that doesn't allow an easy implementation of skipleft,
    the driver can replace the CacheMonoStipple function with one that
    stores multiple rotated copies of the stipple and selects between
    them. In this case the driver should set CacheColorExpandDensity to
    tell XAA how many copies of the pattern are stored in the width of a
    cache slot. For instance if the hardware can specify the starting
    address in bytes, then 8 rotated copies of the stipple are needed
    and CacheColorExpandDensity should be set to 8.

void SubsequentScreenToScreenColorExpandFill( ScrnInfoPtr pScrn,
                        int x, int y, int w, int h,
                        int srcx, int srcy, int offset )

    Fill a rectangle "w" x "h" at location (x,y). The source pitch
    between scanlines is the framebuffer pitch (pScrn->displayWidth
    pixels) and srcx and srcy indicate the start of the source pattern
    in units of framebuffer pixels. "Offset" indicates the bit offset
    into the pattern that corresponds to the pixel being painted at "x"
    on the screen. Some hardware accepts source coordinates in units of
    bits which makes implementation of the offset trivial. In that case,
    the bit address of the source bit corresponding to the pixel painted
    at (x,y) would be:

        (srcy * pScrn->displayWidth + srcx) * pScrn->bitsPerPixel + offset

    It should be noted that the offset assumes LSBFIRST hardware. For
    MSBFIRST hardware, the driver may need to implement the offset by
    blitting only from byte boundaries and hardware clipping.

2.5.2  CPU To Screen Color Expansion

The CPUToScreenColorExpandFill routines provide an interface for doing
expansion blits from source patterns stored in system memory. There are
two varieties of this primitive, a CPUToScreenColorExpandFill and a
ScanlineCPUToScreenColorExpandFill.
With the CPUToScreenColorExpandFill method, the source data is sent
serially through a memory mapped aperture. With the Scanline version,
the data is rendered a scanline at a time into intermediate buffers with
a call to SubsequentColorExpandScanline following each scanline.

These two methods have separate flags fields, the
CPUToScreenColorExpandFillFlags and
ScanlineCPUToScreenColorExpandFillFlags respectively. Flags specific to
one method or the other are described in sections 2.5.2.1 and 2.5.2.2
but for both cases the bit order and transparency restrictions listed at
the beginning of section 2.5 are valid as well as the following:

/* clipping (optional) */

    LEFT_EDGE_CLIPPING

        This indicates that the accelerator supports omission of up to
        31 pixels on the left edge of the rectangle to be filled. This
        is beneficial since it allows transfer of the source bitmap to
        always occur from DWORD boundaries.

    LEFT_EDGE_CLIPPING_NEGATIVE_X

        This flag indicates that the accelerator can render color
        expansion rectangles even if the value of x origin is negative
        (off of the screen on the left edge).

/* misc */

    TRIPLE_BITS_24BPP

        When enabled (must be in 24bpp mode), color expansion functions
        are expected to require three times the amount of bits to be
        transferred so that 24bpp grayscale colors can be used with
        color expansion in 8bpp coprocessor mode. Each bit is expanded
        to 3 bits when writing the monochrome data.

2.5.2.1  The Direct Method

Using the direct method of color expansion XAA will send all bitmap data
to the accelerator serially through a memory mapped transfer window
defined by the following two fields:

    unsigned char *ColorExpandBase

        This indicates the memory address of the beginning of the
        aperture.

    int ColorExpandRange

        This indicates the size in bytes of the aperture.

The driver should specify how the transferred data should be padded.
There are options for both the padding of each Y scanline and for the
total transfer to the aperture.
One of the following two flags must be set:

    CPU_TRANSFER_PAD_DWORD

        This indicates that the total transfer (sum of all scanlines)
        sent to the aperture must be DWORD padded. This is the default
        behavior.

    CPU_TRANSFER_PAD_QWORD

        This indicates that the total transfer (sum of all scanlines)
        sent to the aperture must be QWORD padded. With this set, XAA
        will send an extra DWORD to the aperture when needed to ensure
        that only an even number of DWORDs are sent.

And then there are the flags for padding of each scanline:

    SCANLINE_PAD_DWORD

        This indicates that each Y scanline should be DWORD padded.
        This is the only option available and is the default.

Finally, there is the CPU_TRANSFER_BASE_FIXED flag which indicates that
the aperture is a single register rather than a range of registers, and
XAA should write all of the data to the first DWORD. If the
ColorExpandRange is not large enough to accommodate scanlines the width
of the screen, this option will be forced. That is, the ColorExpandRange
must be:

    ((virtualX + 31)/32) * 4   bytes or more.

    ((virtualX + 62)/32 * 4)   if LEFT_EDGE_CLIPPING_NEGATIVE_X is set.

If the TRIPLE_BITS_24BPP flag is set, the required area should be
multiplied by three.

void SetupForCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
                        int fg, int bg,
                        int rop,
                        unsigned int planemask)

    Ones in the source bitmap will correspond to the fg color. Zeros in
    the source bitmap will correspond to the bg color unless bg = -1. In
    that case the pixels corresponding to the zeros in the bitmap shall
    be left unmodified by the accelerator.

void SubsequentCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
                        int x, int y, int w, int h,
                        int skipleft )

    When this function is called, the accelerator should be set up to
    fill a rectangle of dimension "w" by "h" with origin at (x,y) in the
    fill style prescribed by the last call to
    SetupForCPUToScreenColorExpandFill. XAA will pass the data to the
    aperture immediately after this function is called.
    If the skipleft is non-zero (and LEFT_EDGE_CLIPPING has been
    enabled), then the accelerator _should_not_ render skipleft pixels
    on the leftmost edge of the rectangle. Some engines have an
    alignment feature like this built in, some others can do this using
    a clipping window.

    It can be arranged for XAA to call Sync() after it is through
    calling the Subsequent function by setting SYNC_AFTER_COLOR_EXPAND
    in the CPUToScreenColorExpandFillFlags. This can provide the driver
    with an opportunity to reset a clipping window if needed.

2.5.2.2  The Indirect Method

Using the indirect method, XAA will render the bitmap data a scanline at
a time to one or more buffers. These buffers may be memory mapped
apertures or just intermediate storage.

    int NumScanlineColorExpandBuffers

        This indicates the number of buffers available.

    unsigned char **ScanlineColorExpandBuffers

        This is an array of pointers to the memory locations of each
        buffer. Each buffer is expected to be large enough to
        accommodate scanlines the width of the screen. That is:

            ((virtualX + 31)/32) * 4   bytes or more.

            ((virtualX + 62)/32 * 4)   if LEFT_EDGE_CLIPPING_NEGATIVE_X
                                       is set.

Scanlines are always DWORD padded. If the TRIPLE_BITS_24BPP flag is set,
the required area should be multiplied by three.

void SetupForScanlineCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
                        int fg, int bg,
                        int rop,
                        unsigned int planemask)

    Ones in the source bitmap will correspond to the fg color. Zeros in
    the source bitmap will correspond to the bg color unless bg = -1. In
    that case the pixels corresponding to the zeros in the bitmap shall
    be left unmodified by the accelerator.

void SubsequentScanlineCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
                        int x, int y, int w, int h,
                        int skipleft )

void SubsequentColorExpandScanline(ScrnInfoPtr pScrn, int bufno)

    When SubsequentScanlineCPUToScreenColorExpandFill is called, XAA
    will begin transferring the source data a scanline at a time,
    calling SubsequentColorExpandScanline after each scanline.
    If more than one buffer is available, XAA will cycle through the
    buffers. Subsequent scanlines will use the next buffer and go back
    to buffer 0 again when the last buffer is reached. The index into
    the ScanlineColorExpandBuffers array is presented as "bufno" with
    each SubsequentColorExpandScanline call.

    The skipleft field is the same as for the direct method.

    The indirect method can be used to send the source data directly to
    a memory mapped aperture represented by a single color expand
    buffer, a scanline at a time, but more commonly it is used to place
    the data into offscreen video memory so that the accelerator can
    blit it to the visible screen from there. In the case where the
    accelerator permits rendering into offscreen video memory while the
    accelerator is active, several buffers can be used so that XAA can
    be placing source data into the next buffer while the accelerator is
    blitting the current buffer. For cases where the accelerator
    requires some special manipulation of the source data first, the
    buffers can be in system memory. The CPU can manipulate these
    buffers and then send the data to the accelerator.

2.6  8x8 Mono Pattern Fills

XAA provides support for two types of 8x8 hardware patterns - "Mono"
patterns and "Color" patterns. Mono pattern data is 64 bits of color
expansion data with ones indicating the foreground color and zeros
indicating the background color. The source bitmaps for the 8x8 mono
patterns can be presented to the graphics accelerator in one of two
ways. They can be passed as two DWORDS to the 8x8 mono pattern functions
or they can be cached in offscreen memory and their locations passed to
the 8x8 mono pattern functions. In addition to the GXCOPY_ONLY,
ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags defined in
Section 2.0, the following are defined for the Mono8x8PatternFillFlags:

    HARDWARE_PATTERN_PROGRAMMED_BITS

        This indicates that the 8x8 patterns should be packed into two
        DWORDS and passed to the 8x8 mono pattern functions.
        The default behavior is to cache the patterns in offscreen video
        memory and pass the locations of these patterns to the functions
        instead. The pixmap cache must be enabled for the default
        behavior (8x8 pattern caching) to work. See Section 3 for how to
        enable the pixmap cache. The pixmap cache is not necessary for
        HARDWARE_PATTERN_PROGRAMMED_BITS.

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN

        If the hardware supports programmable pattern offsets then this
        option should be set. See the table below for further
        information.

    HARDWARE_PATTERN_SCREEN_ORIGIN

        Some hardware wants the pattern offset specified with respect to
        the upper left-hand corner of the primitive being drawn. Other
        hardware needs the option HARDWARE_PATTERN_SCREEN_ORIGIN set to
        indicate that all pattern offsets should be referenced to the
        upper left-hand corner of the screen.
        HARDWARE_PATTERN_SCREEN_ORIGIN is preferable since this is more
        natural for the X-Window system and offsets will have to be
        recalculated for each Subsequent function otherwise.

    BIT_ORDER_IN_BYTE_MSBFIRST
    BIT_ORDER_IN_BYTE_LSBFIRST

        As with other color expansion routines this indicates whether
        the most or the least significant bit in each byte from the
        pattern is the leftmost on the screen.

    TRANSPARENCY_ONLY
    NO_TRANSPARENCY

        This means the same thing as for the color expansion rect
        routines except that for TRANSPARENCY_ONLY XAA will not render
        the primitive in two passes since this is more easily handled by
        the driver. It is recommended that TRANSPARENCY_ONLY hardware
        handle rendering of opaque patterns in two passes (the
        background can be filled as a rectangle in GXcopy) in the
        Subsequent function so that the TRANSPARENCY_ONLY restriction
        can be removed.

Additional information about cached patterns...
  For the case where HARDWARE_PATTERN_PROGRAMMED_BITS is not set and the
  pattern must be cached in offscreen memory, the first pattern starts at
  the cache slot boundary which is set by the CachePixelGranularity field
  used to configure the pixmap cache.  One should ensure that the
  CachePixelGranularity reflects any alignment restrictions that the
  accelerator may put on 8x8 pattern storage locations.  When
  HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set there is only one pattern
  stored.  When this flag is not set, all 64 pre-rotated copies of the
  pattern are cached in offscreen memory.  The MonoPatternPitch field can
  be used to specify the X position pixel granularity that each of these
  patterns must align on.  If the MonoPatternPitch is not supplied, the
  patterns will be densely packed within the cache slot.  The behavior of
  the default XAA 8x8 pattern caching mechanism is to store all 8x8
  patterns linearly in video memory.  If the accelerator needs the
  patterns stored in a more unusual fashion, the driver will need to
  provide its own 8x8 mono pattern caching routines for XAA to use.

  The following table describes the meanings of the "patx" and "paty"
  fields in both the SetupFor and Subsequent functions.

  With HARDWARE_PATTERN_SCREEN_ORIGIN
  -----------------------------------

  HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_PROGRAMMED_ORIGIN

    SetupFor:   patx and paty are the first and second DWORDS of the
                8x8 mono pattern.

    Subsequent: patx and paty are the x,y offset into that pattern.
                All Subsequent calls will have the same offset in the
                case of HARDWARE_PATTERN_SCREEN_ORIGIN so only the offset
                specified by the first Subsequent call after a SetupFor
                call will need to be observed.

  HARDWARE_PATTERN_PROGRAMMED_BITS only

    SetupFor:   patx and paty hold the first and second DWORDS of the
                8x8 mono pattern pre-rotated to match the desired offset.

    Subsequent: These just hold the same patterns and can be ignored.

  HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

    SetupFor:   patx and paty hold the x,y coordinates of the offscreen
                memory location where the 8x8 pattern is stored.  The
                bits are stored linearly in memory at that location.

    Subsequent: patx and paty hold the offset into the pattern.
                All Subsequent calls will have the same offset in the
                case of HARDWARE_PATTERN_SCREEN_ORIGIN so only the offset
                specified by the first Subsequent call after a SetupFor
                call will need to be observed.

  Neither programmed bits nor origin

    SetupFor:   patx and paty hold the x,y coordinates of the offscreen
                memory location where the pre-rotated 8x8 pattern is
                stored.

    Subsequent: patx and paty are the same as in the SetupFor function
                and can be ignored.


  Without HARDWARE_PATTERN_SCREEN_ORIGIN
  --------------------------------------

  HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_PROGRAMMED_ORIGIN

    SetupFor:   patx and paty are the first and second DWORDS of the
                8x8 mono pattern.

    Subsequent: patx and paty are the x,y offset into that pattern.

  HARDWARE_PATTERN_PROGRAMMED_BITS only

    SetupFor:   patx and paty hold the first and second DWORDS of the
                unrotated 8x8 mono pattern.  This can be ignored.

    Subsequent: patx and paty hold the rotated 8x8 pattern to be
                rendered.

  HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

    SetupFor:   patx and paty hold the x,y coordinates of the offscreen
                memory location where the 8x8 pattern is stored.  The
                bits are stored linearly in memory at that location.

    Subsequent: patx and paty hold the offset into the pattern.

  Neither programmed bits nor origin

    SetupFor:   patx and paty hold the x,y coordinates of the offscreen
                memory location where the unrotated 8x8 pattern is
                stored.  This can be ignored.

    Subsequent: patx and paty hold the x,y coordinates of the rotated
                8x8 pattern to be rendered.


void SetupForMono8x8PatternFill(ScrnInfoPtr pScrn, int patx, int paty,
                                int fg, int bg, int rop,
                                unsigned int planemask)

  SetupForMono8x8PatternFill indicates that any combination of the
  following may follow it.

    SubsequentMono8x8PatternFillRect
    SubsequentMono8x8PatternFillTrap

  The fg, bg, rop and planemask fields have the same meaning as the ones
  used for the other color expansion routines.  The meanings of patx and
  paty can be determined from the table above.

void SubsequentMono8x8PatternFillRect(ScrnInfoPtr pScrn,
                                      int patx, int paty,
                                      int x, int y, int w, int h)

  Fill a rectangle of dimensions "w" by "h" with origin at (x,y) using
  the parameters given by the last SetupForMono8x8PatternFill call.  The
  meanings of patx and paty can be determined from the table above.

void SubsequentMono8x8PatternFillTrap(ScrnInfoPtr pScrn,
                                      int patx, int paty, int y, int h,
                                      int left, int dxL, int dyL, int eL,
                                      int right, int dxR, int dyR, int eR)

  The meanings of patx and paty can be determined from the table above.
  The rest of the fields have the same meanings as those in the
  SubsequentSolidFillTrap function.


2.7  8x8 Color Pattern Fills

  8x8 color pattern data is 64 pixels of full color data that is stored
  linearly in offscreen video memory.  8x8 color patterns are useful as a
  substitute for 8x8 mono patterns when tiling, doing opaque stipples, or,
  in the case where transparency is supported, regular stipples.  8x8
  color pattern fills also have the additional benefit of being able to
  tile full color 8x8 patterns instead of just 2 color ones like the mono
  patterns.  However, full color 8x8 patterns aren't used very often in
  the X Window System so you might consider passing this primitive by if
  you already can do mono patterns, especially if they require a lot of
  cache area.  Color8x8PatternFillFlags is the flags field for this
  primitive and the GXCOPY_ONLY, ROP_NEEDS_SOURCE and NO_PLANEMASK flags
  as described in Section 2.0 are valid as well as the following:

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN

      If the hardware supports programmable pattern offsets then
      this option should be set.

    HARDWARE_PATTERN_SCREEN_ORIGIN

      Some hardware wants the pattern offset specified with respect to the
      upper left-hand corner of the primitive being drawn.  Other hardware
      needs the option HARDWARE_PATTERN_SCREEN_ORIGIN set to indicate that
      all pattern offsets should be referenced to the upper left-hand
      corner of the screen.  HARDWARE_PATTERN_SCREEN_ORIGIN is preferable
      since this is more natural for the X Window System and offsets will
      have to be recalculated for each Subsequent function otherwise.

    NO_TRANSPARENCY
    TRANSPARENCY_GXCOPY_ONLY

      These mean the same as for the ScreenToScreenCopy functions.

  The following table describes the meanings of patx and paty passed to
  the SetupFor and Subsequent functions:

  HARDWARE_PATTERN_PROGRAMMED_ORIGIN && HARDWARE_PATTERN_SCREEN_ORIGIN

    SetupFor:   patx and paty hold the x,y location of the unrotated
                pattern.

    Subsequent: patx and paty hold the pattern offset.  For the case of
                HARDWARE_PATTERN_SCREEN_ORIGIN all Subsequent calls have
                the same offset so only the first call will need to be
                observed.

  HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

    SetupFor:   patx and paty hold the x,y location of the unrotated
                pattern.

    Subsequent: patx and paty hold the pattern offset.

  HARDWARE_PATTERN_SCREEN_ORIGIN only

    SetupFor:   patx and paty hold the x,y location of the rotated
                pattern.

    Subsequent: patx and paty hold the same location as the SetupFor
                function so these can be ignored.

  Neither flag

    SetupFor:   patx and paty hold the x,y location of the unrotated
                pattern.  This can be ignored.

    Subsequent: patx and paty hold the x,y location of the rotated
                pattern.

  Additional information about cached patterns...

  All 8x8 color patterns are cached in offscreen video memory so the
  pixmap cache must be enabled to use them.  The first pattern starts at
  the cache slot boundary which is set by the CachePixelGranularity field
  used to configure the pixmap cache.
  One should ensure that the CachePixelGranularity reflects any alignment
  restrictions that the accelerator may put on 8x8 pattern storage
  locations.  When HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set there is only
  one pattern stored.  When this flag is not set, all 64 rotations of the
  pattern are accessible but it is assumed that the accelerator is capable
  of accessing data stored on 8 pixel boundaries.  If the accelerator has
  stricter alignment requirements than this, the driver will need to
  provide its own 8x8 color pattern caching routines.

void SetupForColor8x8PatternFill(ScrnInfoPtr pScrn, int patx, int paty,
                                 int rop, unsigned int planemask,
                                 int trans_color)

  SetupForColor8x8PatternFill indicates that any combination of the
  following may follow it.

    SubsequentColor8x8PatternFillRect
    SubsequentColor8x8PatternFillTrap  (not implemented yet)

  For the meanings of patx and paty, see the table above.  Trans_color
  means the same as for the ScreenToScreenCopy functions.

void SubsequentColor8x8PatternFillRect(ScrnInfoPtr pScrn,
                                       int patx, int paty,
                                       int x, int y, int w, int h)

  Fill a rectangle of dimensions "w" by "h" with origin at (x,y) using
  the parameters given by the last SetupForColor8x8PatternFill call.  The
  meanings of patx and paty can be determined from the table above.

void SubsequentColor8x8PatternFillTrap(ScrnInfoPtr pScrn,
                                       int patx, int paty, int y, int h,
                                       int left, int dxL, int dyL, int eL,
                                       int right, int dxR, int dyR, int eR)

  For the meanings of patx and paty, see the table above.  The rest of
  the fields have the same meanings as those in the
  SubsequentSolidFillTrap function.


2.8  Image Writes

  XAA provides a mechanism for transferring full color pixel data from
  system memory to video memory through the accelerator.  This is useful
  for dealing with alignment issues and performing raster ops on the data
  when writing it to the framebuffer.  As with color expansion rectangles,
  there is a direct and an indirect method.
  The direct method sends all data through a memory mapped aperture.  The
  indirect method sends the data to an intermediate buffer a scanline at
  a time.

  The direct and indirect methods have separate flags fields, the
  ImageWriteFlags and ScanlineImageWriteFlags respectively.  Flags
  specific to one method or the other are described in Sections 2.8.1 and
  2.8.2 but for both cases the GXCOPY_ONLY, ROP_NEEDS_SOURCE and
  NO_PLANEMASK flags described in Section 2.0 are valid as well as the
  following:

    NO_GXCOPY

      In order to have accelerated image transfers faster than the
      software versions for GXcopy, the engine needs to support clipping,
      be using the direct method and have a large enough image transfer
      range so that CPU_TRANSFER_BASE_FIXED doesn't need to be set.  If
      these are not supported, then it is unlikely that transferring the
      data through the accelerator will be of any advantage for the
      simple case of GXcopy.  In fact, it may be much slower.  For such
      cases it's probably best to set the NO_GXCOPY flag so that image
      writes will only be used for the more complicated rops.

    /* transparency restrictions */

    NO_TRANSPARENCY

      This indicates that the accelerator does not support skipping of
      color keyed pixels when copying from the source to the destination.

    TRANSPARENCY_GXCOPY_ONLY

      This indicates that the accelerator supports skipping of color
      keyed pixels only when the rop is GXcopy.

    /* clipping (optional) */

    LEFT_EDGE_CLIPPING

      This indicates that the accelerator supports omission of up to
      3 pixels on the left edge of the rectangle to be filled.  This is
      beneficial since it allows transfer from the source pixmap to
      always occur from DWORD boundaries.

    LEFT_EDGE_CLIPPING_NEGATIVE_X

      This flag indicates that the accelerator can fill areas with image
      write data even if the value of the x origin is negative (off of
      the screen on the left edge).
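
  To illustrate why LEFT_EDGE_CLIPPING lets transfers start on DWORD
  boundaries, here is a minimal sketch.  It is not XAA code: the helper
  name left_edge_skip and its parameters are hypothetical, and it assumes
  a pixel size of 1, 2 or 4 bytes with the source pixel aligned to its
  own size.  The start of the transfer is moved back to the previous
  DWORD boundary and the extra leading pixels are reported as the number
  the accelerator would have to skip.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of XAA.
 * src_offset:      byte offset of the first wanted source pixel
 * bytes_per_pixel: 1, 2 or 4 (assumption of this sketch)
 * aligned_offset:  receives the offset rounded down to a DWORD boundary
 * Returns the number of extra pixels on the left edge to be skipped. */
static int left_edge_skip(uintptr_t src_offset, int bytes_per_pixel,
                          uintptr_t *aligned_offset)
{
    uintptr_t misalign = src_offset & 3u;   /* bytes past the boundary */

    assert(bytes_per_pixel == 1 || bytes_per_pixel == 2 ||
           bytes_per_pixel == 4);
    assert(misalign % (uintptr_t)bytes_per_pixel == 0);

    *aligned_offset = src_offset - misalign;
    return (int)(misalign / (uintptr_t)bytes_per_pixel);
}
```

  With 1-byte pixels the result is at most 3, which matches the "up to
  3 pixels" in the flag description.  A caller in this sketch would widen
  the rectangle by the returned count and start it that many pixels
  further left, which is where LEFT_EDGE_CLIPPING_NEGATIVE_X becomes
  relevant for rectangles near the left edge of the screen.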


2.8.1  The Direct Method

  Using the direct method of ImageWrite, XAA will send all bitmap data
  to the accelerator serially through a memory mapped transfer window
  defined by the following two fields:

      unsigned char *ImageWriteBase

        This indicates the memory address of the beginning of the
        aperture.

      int ImageWriteRange

        This indicates the size in bytes of the aperture.

  The driver should specify how the transferred data should be padded.
  There are options for both the padding of each Y scanline and for the
  total transfer to the aperture.
  One of the following two flags must be set:

    CPU_TRANSFER_PAD_DWORD

      This indicates that the total transfer (sum of all scanlines) sent
      to the aperture must be DWORD padded.  This is the default behavior.

    CPU_TRANSFER_PAD_QWORD

      This indicates that the total transfer (sum of all scanlines) sent
      to the aperture must be QWORD padded.  With this set, XAA will send
      an extra DWORD to the aperture when needed to ensure that only
      an even number of DWORDs are sent.

  And then there are the flags for padding of each scanline:

    SCANLINE_PAD_DWORD

      This indicates that each Y scanline should be DWORD padded.
      This is the only option available and is the default.

  Finally, there is the CPU_TRANSFER_BASE_FIXED flag which indicates
  that the aperture is a single register rather than a range of
  registers, and XAA should write all of the data to the first DWORD.
  XAA will automatically select CPU_TRANSFER_BASE_FIXED if the
  ImageWriteRange is not large enough to accommodate an entire scanline.

void SetupForImageWrite(ScrnInfoPtr pScrn, int rop,
                        unsigned int planemask, int trans_color,
                        int bpp, int depth)

  If trans_color is not -1 then trans_color indicates the transparency
  color key and pixels with color trans_color passed through the
  aperture should not be transferred to the screen but should be
  skipped.  Bpp and depth indicate the bits per pixel and depth of the
  source pixmap.  Trans_color is always -1 if the NO_TRANSPARENCY flag
  is set.
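
  The padding rules for the direct method described above can be
  sketched as a small calculation.  This is only an illustration, not
  XAA's implementation: the function image_write_dwords and its
  pad_qword argument are hypothetical, and the sketch assumes skipleft
  is zero and that the pixel size is a whole number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of XAA.
 * Returns how many DWORDs would be written to the aperture for a
 * w x h rectangle, given DWORD padding of every scanline and, when
 * pad_qword is nonzero, QWORD padding of the total transfer. */
static uint32_t image_write_dwords(int w, int h, int bytes_per_pixel,
                                   int pad_qword)
{
    uint32_t dwords_per_line, total;

    assert(w > 0 && h > 0 && bytes_per_pixel > 0);

    /* SCANLINE_PAD_DWORD: round each scanline up to whole DWORDs. */
    dwords_per_line =
        ((uint32_t)w * (uint32_t)bytes_per_pixel + 3u) >> 2;
    total = dwords_per_line * (uint32_t)h;

    /* CPU_TRANSFER_PAD_QWORD: one extra DWORD if the count is odd. */
    if (pad_qword && (total & 1u))
        total++;

    return total;
}
```

  For example, three scanlines of three 8bpp pixels each occupy one
  DWORD per scanline, so three DWORDs are sent with
  CPU_TRANSFER_PAD_DWORD and four with CPU_TRANSFER_PAD_QWORD.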

void SubsequentImageWriteRect(ScrnInfoPtr pScrn,
                              int x, int y, int w, int h, int skipleft)

  Data passed through the aperture should be copied to a rectangle of
  width "w" and height "h" with origin (x,y).  If LEFT_EDGE_CLIPPING has
  been enabled, skipleft will correspond to the number of pixels on the
  left edge that should not be drawn.  Skipleft is zero otherwise.

  It can be arranged for XAA to call Sync() after it is through calling
  the Subsequent functions by setting SYNC_AFTER_IMAGE_WRITE in the
  ImageWriteFlags.  This can provide the driver with an opportunity to
  reset a clipping window if needed.


2.8.2  The Indirect Method

  Using the indirect method, XAA will render the pixel data a scanline
  at a time to one or more buffers.  These buffers may be memory mapped
  apertures or just intermediate storage.

    int NumScanlineImageWriteBuffers

      This indicates the number of buffers available.

    unsigned char **ScanlineImageWriteBuffers

      This is an array of pointers to the memory locations of each
      buffer.  Each buffer is expected to be large enough to accommodate
      scanlines the width of the screen.  That is:

        pScrn->VirtualX * pScrn->bitsPerPixel/8   bytes or more.

      If LEFT_EDGE_CLIPPING_NEGATIVE_X is set, add an additional 4 bytes
      to that requirement in 8 and 16bpp, 12 bytes in 24bpp.

  Scanlines are always DWORD padded.

void SetupForScanlineImageWrite(ScrnInfoPtr pScrn, int rop,
                                unsigned int planemask, int trans_color,
                                int bpp, int depth)

  If trans_color is not -1 then trans_color indicates the transparency
  color key and pixels with color trans_color in the buffer should not
  be transferred to the screen but should be skipped.  Bpp and depth
  indicate the bits per pixel and depth of the source bitmap.
  Trans_color is always -1 if the NO_TRANSPARENCY flag is set.
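
  The buffer size requirement stated above for ScanlineImageWriteBuffers
  can be written out as follows.  This is a sketch under stated
  assumptions rather than XAA code: the function name and its arguments
  are hypothetical, only the 8, 16 and 24bpp cases named in the text are
  handled, and rounding the result up to a DWORD multiple is this
  sketch's own choice, made because scanlines are always DWORD padded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of XAA.
 * virtual_x:           width of the virtual screen in pixels
 * bits_per_pixel:      8, 16 or 24 (the cases named in the text)
 * negative_x_clipping: nonzero if LEFT_EDGE_CLIPPING_NEGATIVE_X is set
 * Returns a minimum size in bytes for one scanline buffer. */
static size_t scanline_image_write_buffer_bytes(int virtual_x,
                                                int bits_per_pixel,
                                                int negative_x_clipping)
{
    size_t bytes;

    assert(virtual_x > 0);
    assert(bits_per_pixel == 8 || bits_per_pixel == 16 ||
           bits_per_pixel == 24);

    /* One full-width scanline. */
    bytes = (size_t)virtual_x * (size_t)bits_per_pixel / 8;

    /* Extra room required with LEFT_EDGE_CLIPPING_NEGATIVE_X. */
    if (negative_x_clipping)
        bytes += (bits_per_pixel == 24) ? 12 : 4;

    /* Assumption of this sketch: round up to a DWORD multiple. */
    return (bytes + 3) & ~(size_t)3;
}
```

  A driver would allocate NumScanlineImageWriteBuffers buffers of at
  least this size and store their addresses in the
  ScanlineImageWriteBuffers array.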

void SubsequentImageWriteRect(ScrnInfoPtr pScrn,
                              int x, int y, int w, int h, int skipleft)

void SubsequentImageWriteScanline(ScrnInfoPtr pScrn, int bufno)

  When SubsequentImageWriteRect is called, XAA will begin transferring
  the source data a scanline at a time, calling
  SubsequentImageWriteScanline after each scanline.  If more than one
  buffer is available, XAA will cycle through the buffers.  Subsequent
  scanlines will use the next buffer and go back to buffer 0 again when
  the last buffer is reached.  The index into the
  ScanlineImageWriteBuffers array is presented as "bufno" with each
  SubsequentImageWriteScanline call.

  The skipleft field is the same as for the direct method.

  The indirect method can be used to send the source data directly to a
  memory mapped aperture represented by a single image write buffer, a
  scanline at a time, but more commonly it is used to place the data into
  offscreen video memory so that the accelerator can blit it to the
  visible screen from there.  In the case where the accelerator permits
  rendering into offscreen video memory while the accelerator is active,
  several buffers can be used so that XAA can be placing source data into
  the next buffer while the accelerator is blitting the current buffer.
  For cases where the accelerator requires some special manipulation of
  the source data first, the buffers can be in system memory.  The CPU
  can manipulate these buffers and then send the data to the accelerator.


2.9  Clipping

  XAA supports hardware clipping rectangles.  To use clipping in this
  way it is expected that the graphics accelerator can clip primitives
  with vertices anywhere in the 16 bit signed coordinate system.

void SetClippingRectangle(ScrnInfoPtr pScrn,
                          int left, int top, int right, int bottom)

void DisableClipping(ScrnInfoPtr pScrn)

  When SetClippingRectangle is called, all hardware rendering following
  it should be clipped to the rectangle specified until DisableClipping
  is called.

  The ClippingFlags field indicates which operations this sort of
  Set/Disable pairing can be used with.  Any of the following flags
  may be OR'd together.

    HARDWARE_CLIP_SCREEN_TO_SCREEN_COLOR_EXPAND
    HARDWARE_CLIP_SCREEN_TO_SCREEN_COPY
    HARDWARE_CLIP_MONO_8x8_FILL
    HARDWARE_CLIP_COLOR_8x8_FILL
    HARDWARE_CLIP_SOLID_FILL
    HARDWARE_CLIP_DASHED_LINE
    HARDWARE_CLIP_SOLID_LINE



3)  XAA PIXMAP CACHE

   /* NOTE:  XAA has no knowledge of framebuffer particulars so until
      the framebuffer is able to render into offscreen memory, usage of
      the pixmap cache requires that the driver provide ImageWrite
      routines or a WritePixmap or WritePixmapToCache replacement so
      that patterns can even be placed in the cache.

      ADDENDUM: XAA can now load the pixmap cache without requiring
      that the driver supply an ImageWrite function, but this can
      only be done on linear framebuffers.  If you have a linear
      framebuffer, set LINEAR_FRAMEBUFFER in the XAAInfoRec.Flags
      field and XAA will then be able to upload pixmaps into the
      cache without the driver providing functions to do so.
   */

  The XAA pixmap cache provides a mechanism for caching of patterns in
  offscreen video memory so that tiled fills and in some cases stippling
  can be done by blitting the source patterns from offscreen video
  memory.  The pixmap cache also provides the mechanism for caching of
  8x8 color and mono hardware patterns.  Any unused offscreen video
  memory gets used for the pixmap cache and that information is provided
  by the XFree86 Offscreen Memory Manager.  XAA registers a callback with
  the manager so that it can be informed of any changes in the offscreen
  memory configuration.  The driver writer does not need to deal with any
  of this since it is all automatic.  The driver merely needs to
  initialize the Offscreen Memory Manager as described in the DESIGN
  document and set the PIXMAP_CACHE flag in the XAAInfoRec.Flags field.
  The Offscreen Memory Manager initialization must occur before XAA is
  initialized or else pixmap cache initialization will fail.

  PixmapCacheFlags is an XAAInfoRec field which allows the driver to
  control pixmap cache behavior to some extent.  Currently only one flag
  is defined:

    DO_NOT_BLIT_STIPPLES

      This indicates that stippling should not be done by blitting from
      the pixmap cache.  This does not apply to 8x8 pattern fills.

  CachePixelGranularity is an optional field.  If the hardware requires
  that 8x8 patterns have some particular pixel alignment it should be
  reflected in this field.  Ignoring this field or setting it to zero or
  one means there are no alignment issues.


4)  OFFSCREEN PIXMAPS

  XAA has the ability to store pixmap drawables in offscreen video
  memory and render into them with full hardware acceleration.  Placement
  of pixmaps in the cache is done automatically on a first-come basis and
  only if there is room.  To enable this feature, set the
  OFFSCREEN_PIXMAPS flag in the XAAInfoRec.Flags field.  This is only
  available when a ScreenToScreenCopy function is provided, when the
  Offscreen Memory Manager has been initialized and when the
  LINEAR_FRAMEBUFFER flag is also set.

    int maxOffPixWidth
    int maxOffPixHeight

      These two fields allow the driver to limit the maximum dimensions
      of an offscreen pixmap.  If one of these is not set, it is assumed
      that there is no limit on that dimension.  Note that if an
      offscreen pixmap with a particular dimension is allowed, then your
      driver will be expected to render primitives as large as that
      pixmap.

$XFree86$