From eb10d050a8a842ed947e3a493099d5604f83963f Mon Sep 17 00:00:00 2001 From: Dylan Baker Date: Thu, 15 Dec 2016 15:59:58 -0800 Subject: Import Jason's isl-handbook into mesa This imports the isl-handbook, uses pandoc (with some hand touchups) to convert the markdown into rst files, and then adds them to the sphinx documentation. --- sphinx/source/index.rst | 1 + sphinx/source/isl | 1 + src/intel/isl/docs/.gitignore | 3 + src/intel/isl/docs/ccs.rst | 126 +++++++ src/intel/isl/docs/formats.rst | 229 +++++++++++++ src/intel/isl/docs/images/tiling-basic.svg | 512 +++++++++++++++++++++++++++++ src/intel/isl/docs/index.rst | 11 + src/intel/isl/docs/isl-intro.rst | 256 +++++++++++++++ src/intel/isl/docs/preface.rst | 23 ++ src/intel/isl/docs/tiling.rst | 359 ++++++++++++++++++++ 10 files changed, 1521 insertions(+) create mode 120000 sphinx/source/isl create mode 100644 src/intel/isl/docs/.gitignore create mode 100644 src/intel/isl/docs/ccs.rst create mode 100644 src/intel/isl/docs/formats.rst create mode 100644 src/intel/isl/docs/images/tiling-basic.svg create mode 100644 src/intel/isl/docs/index.rst create mode 100644 src/intel/isl/docs/isl-intro.rst create mode 100644 src/intel/isl/docs/preface.rst create mode 100644 src/intel/isl/docs/tiling.rst diff --git a/sphinx/source/index.rst b/sphinx/source/index.rst index 147bf643ac..4e7fe67fee 100644 --- a/sphinx/source/index.rst +++ b/sphinx/source/index.rst @@ -11,6 +11,7 @@ Welcome to Mesa's documentation! 
:caption: Contents: gallium/index + isl/index Indices and tables diff --git a/sphinx/source/isl b/sphinx/source/isl new file mode 120000 index 0000000000..68349c7f43 --- /dev/null +++ b/sphinx/source/isl @@ -0,0 +1 @@ +../../src/intel/isl/docs \ No newline at end of file diff --git a/src/intel/isl/docs/.gitignore b/src/intel/isl/docs/.gitignore new file mode 100644 index 0000000000..e2597eea25 --- /dev/null +++ b/src/intel/isl/docs/.gitignore @@ -0,0 +1,3 @@ +isl-handbook.html +isl-handbook.pdf +.*.swp diff --git a/src/intel/isl/docs/ccs.rst b/src/intel/isl/docs/ccs.rst new file mode 100644 index 0000000000..88d3b4e136 --- /dev/null +++ b/src/intel/isl/docs/ccs.rst @@ -0,0 +1,126 @@ +Single-sampled Color Compression +================================ + +Starting with Ivy Bridge, GEN hardware provides a form of color +compression for single-sampled surfaces. In its initial form, this +provided an acceleration of render target clear operations that, in the +common case, allows you to avoid almost all of the bandwidth of a +full-surface clear operation. On Sky Lake, single-sampled color +compression was extended to allow for the compression of color values from +actual rendering and not just the initial clear. From here on, the older +Ivy Bridge form of color compression will be called "fast-clears" and +the term "color compression" will be reserved for the more powerful Sky Lake +form. + +The documentation for Ivy Bridge through Broadwell overloads the term +MCS to refer both to the *multisample control surface* used for +multisample compression and the control surface used for fast-clears. +Throughout this chapter, we will use the term "color control surface", +abbreviated CCS, to denote the control surface used for both fast-clears +and color compression. While this is still an overloaded term, +fast-clears are much closer to Sky Lake color compression than they are +to multisample compression. 
+ +Fast Clears +----------- + +Fast clears are possibly the single most poorly documented aspect of +surface layout/setup for GEN graphics hardware (with HiZ coming in a +close second). All the documentation really says is that you can use an +MCS buffer on single-sampled surfaces (we will call it the CCS in this +case). It also provides some documentation on how to program the +hardware to perform clear operations, but that's it. How big is this +buffer? What does it contain? Those questions are left as exercises to +the reader. Almost everything we know about the contents of the CCS is +gleaned from reverse-engineering of the hardware. The best bit of +documentation we have ever had comes from the display section of the Sky +Lake PRM: + + .. rubric:: Sky Lake PRM Vol 12 section on planes (p. 159): + :name: sky-lake-prm-vol-12-section-on-planes-p.-159 + :class: unnumbered + + The Color Control Surface (CCS) contains the compression status of + the cache-line pairs. The compression state of the cache-line pair + is specified by 2 bits in the CCS. Each CCS cache-line represents an + area on the main surface of 16x16 sets of 128 byte Y-tiled + cache-line-pairs. CCS is always Y tiled. + +While this is technically for color compression and not fast-clears, it +provides a good bit of insight into how color compression and +fast-clears operate. Each cache-line pair in the main surface +corresponds to 1 or 2 bits in the CCS. The primary difference, as far as +the current discussion is concerned, is that fast-clears use only 1 bit +per cache-line pair whereas color compression uses 2 bits. + +What is a cache-line pair? Both the X and Y tiling formats are arranged +as an 8x8 grid of cache lines. (See the `chapter on tiling <#tiling>`__ +for more details.) In either case, a cache-line pair is a pair of cache +lines whose starting addresses differ by 512 bytes or 8 cache lines. 
+This results in the two cache lines being vertically adjacent when the +main surface is X-tiled and horizontally adjacent when the main surface +is Y-tiled. For an X-tiled surface this forms an area of 64B x 2rows and +for a Y-tiled surface this forms an area of 32B x 4rows. In either case, +it is guaranteed that, regardless of surface format, each 2x2 subspan +coming out of a shader will land entirely within one cache-line pair. + +CCS surface layout +~~~~~~~~~~~~~~~~~~ + +Starting with Broadwell, fast-clears and color compression can be used +on mipmapped and array surfaces. When considered from a higher level, +the CCS is laid out like any other surface. The Broadwell and Sky Lake +PRMs describe this as follows: + + .. rubric:: Broadwell PRM Vol 7, "MCS Buffer for Render Target(s)" + (p. 676): + :name: broadwell-prm-vol-7-mcs-buffer-for-render-targets-p.-676 + :class: unnumbered + + Mip-mapped and arrayed surfaces are supported with MCS buffer layout + with these alignments in the RT space: Horizontal Alignment = 256 + and Vertical Alignment = 128. + + .. rubric:: Broadwell PRM Vol 2d, "RENDER\_SURFACE\_STATE" (p. 279): + :name: broadwell-prm-vol-2d-render_surface_state-p.-279 + :class: unnumbered + + For non-multisampled render target's auxiliary surface, MCS, QPitch + must be computed with Horizontal Alignment = 256 and Surface + Vertical Alignment = 128. These alignments are only for MCS buffer + and not for associated render target. + + .. rubric:: Sky Lake PRM Vol 7, "MCS Buffer for Render Target(s)" + (p. 632): + :name: sky-lake-prm-vol-7-mcs-buffer-for-render-targets-p.-632 + :class: unnumbered + + Mip-mapped and arrayed surfaces are supported with MCS buffer layout + with these alignments in the RT space: Horizontal Alignment = 128 + and Vertical Alignment = 64. + + .. rubric:: Sky Lake PRM Vol. 2d, "RENDER\_SURFACE\_STATE" (p. 
435): + :name: sky-lake-prm-vol.-2d-render_surface_state-p.-435 + :class: unnumbered + + For non-multisampled render target's CCS auxiliary surface, QPitch + must be computed with Horizontal Alignment = 128 and Surface + Vertical Alignment = 256. These alignments are only for CCS buffer + and not for associated render target. + +Empirical evidence seems to confirm this. On Sky Lake, the vertical +alignment is always one cache line. The horizontal alignment, however, +varies by main surface format: 1 cache line for 32bpp, 2 for 64bpp and 4 +cache lines for 128bpp formats. This nicely corresponds to the alignment +of 128x64 pixels in the primary color surface. The second PRM citation +about Sky Lake CCS above gives a vertical alignment of 256 rather than +64. With a little experimentation, this additional alignment appears to +only apply to QPitch and not to the miplevels within a slice. + +TODO: More than just 32bpp formats on Broadwell! On Broadwell, each +miplevel in the CCS is aligned to a cache-line pair boundary: horizontal +when the primary surface is X-tiled and vertical when Y-tiled. For a +32bpp format, this works out to an alignment of 256x128 main surface +pixels regardless of X or Y tiling. On Sky Lake, the alignment is a +single cache line which works out to an alignment of 128x64 main surface +pixels. diff --git a/src/intel/isl/docs/formats.rst b/src/intel/isl/docs/formats.rst new file mode 100644 index 0000000000..7021b17740 --- /dev/null +++ b/src/intel/isl/docs/formats.rst @@ -0,0 +1,229 @@ +Surface Formats +=============== + +A surface format describes the encoding of color information into the +actual data stored in memory. In general, a surface format definition +consists of two parts; encoding and layout. We'll take each +individually. + +Data Encoding +------------- + +There are several different ways that one can encode a number (or +vector) into a binary form and each makes different trade-offs. 
By +default, most color values are considered to lie in the range +:math:`[0, 1]` so one of the most common encodings for color data is +unsigned normalized where the range of an unsigned integer of a +particular size is mapped linearly onto the interval :math:`[0, 1]`. +While normalized is certainly the most common representation for color +data, not all data is color data and not all values are nicely bounded. +The following table gives an overview of the different encodings +frequently found in graphics APIs. + ++----------------------+--------------+-------------+-----------------------------+ +| Name | ISL base | Integer | Conversion | +| | type | | | ++======================+==============+=============+=============================+ +| Unsigned normalized | ``ISL_UNORM` | no | :math:`\frac{{\tt(uint)} x} | +| | ` | | {2^{bits} - 1}` | ++----------------------+--------------+-------------+-----------------------------+ +| Signed normalized | ``ISL_SNORM` | no | :math:`\frac{{\tt(int)} x}{ | +| | ` | | 2^{bits - 1} - 1}` | ++----------------------+--------------+-------------+-----------------------------+ +| Unsigned float | ``ISL_UFLOAT | no | | +| | `` | | | ++----------------------+--------------+-------------+-----------------------------+ +| Signed float | ``ISL_SFLOAT | no | | +| | `` | | | ++----------------------+--------------+-------------+-----------------------------+ +| Unsigned fixed-point | ``ISL_UFIXED | no | :math:`\frac{{\tt(uint)} x} | +| | `` | | {2^{16}}` | ++----------------------+--------------+-------------+-----------------------------+ +| Signed fixed-point | ``ISL_SFIXED | no | :math:`\frac{{\tt(int)} x} | +| | `` | | {2^{16}}` | ++----------------------+--------------+-------------+-----------------------------+ +| Unsigned integer | ``ISL_UINT`` | yes | | ++----------------------+--------------+-------------+-----------------------------+ +| Signed integer | ``ISL_SINT`` | yes | | 
++----------------------+--------------+-------------+-----------------------------+ + +The integer encodings are simply a signed or unsigned integer of a +particular bit-size. The normalized and fixed-point encodings are both +stored as an integer that can be converted to a real number by dividing +by the appropriate divisor. + +As far as floating-point is concerned, there are several different sizes +which are summarized in the table below. + ++--------------+---------------+---------------+------------+ +| Bits | Sign bit | Mantissa | Exponent | ++==============+===============+===============+============+ +| 64 | Y | 52 | 11 | ++--------------+---------------+---------------+------------+ +| 32 | Y | 23 | 8 | ++--------------+---------------+---------------+------------+ +| 16 | Y | 10 | 5 | ++--------------+---------------+---------------+------------+ +| 11 | N | 6 | 5 | ++--------------+---------------+---------------+------------+ +| 10 | N | 5 | 5 | ++--------------+---------------+---------------+------------+ + +There is one other odd-ball floating-point format with three components +each of which has a 9-bit mantissa and all three share a 5-bit exponent. + +Data Layout +----------- + +The different data layouts, in general, fall into two categories: array +and packed. When an array layout is used, the components are stored +sequentially in an array of the given encoding. For instance, if the +data is encoded in an 8-bit RGBA array format the data is stored in an +array of type ``uint8_t`` where the components of the ``i``'th color +value are accessed as + +.. code:: c + + uint8_t r = ((uint8_t *)data)[i * 4 + 0]; + uint8_t g = ((uint8_t *)data)[i * 4 + 1]; + uint8_t b = ((uint8_t *)data)[i * 4 + 2]; + uint8_t a = ((uint8_t *)data)[i * 4 + 3]; + +Array formats are popular because of their simplicity. However, they are +limited to formats where all components have the same size and fit in a +standard C data type. 
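The array access above combines with the conversions from the encoding table. The following is a minimal sketch, assuming 8-bit RGBA UNORM data; the helper names are hypothetical and are not ISL functions. The clamp in the signed case is what APIs such as Vulkan specify so that the most negative integer also maps to -1.0; the table's formula leaves it implicit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not ISL functions. */

/* Unsigned normalized: x / (2^bits - 1). Valid for 1 <= bits <= 16. */
static float unorm_to_float(uint32_t x, unsigned bits)
{
    const uint32_t max = (1u << bits) - 1;
    return (float)x / (float)max;
}

/* Signed normalized: x / (2^(bits-1) - 1), clamped so that the most
 * negative integer also maps to -1.0. Valid for 2 <= bits <= 16. */
static float snorm_to_float(int32_t x, unsigned bits)
{
    const int32_t max = (1 << (bits - 1)) - 1;
    const float f = (float)x / (float)max;
    return f < -1.0f ? -1.0f : f;
}

/* Load the i'th color of an 8-bit RGBA UNORM array format as floats. */
static void rgba8_unorm_load(const void *data, size_t i, float out[4])
{
    const uint8_t *px = (const uint8_t *)data + i * 4;
    for (int c = 0; c < 4; c++)
        out[c] = unorm_to_float(px[c], 8);
}
```

Because every component has the same size, one loop over the four channels is enough; this is the simplicity that makes array formats attractive.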
+ +Packed formats, on the other hand, are specified a whole color value at a +time. Instead of being specified as an array of components, the vector +lies entirely in a single power-of-two sized value and the components +are specified by which bits they occupy within that value. For instance, +with the popular ``RGB565`` format, each ``vec3`` takes up 16 bits and +the ``i``'th color value is accessed as + +.. code:: c + + uint16_t v = ((uint16_t *)data)[i]; + uint8_t r = (v >> 0) & 0x1f; + uint8_t g = (v >> 5) & 0x3f; + uint8_t b = (v >> 11) & 0x1f; + +Packed formats are useful because they allow you to specify formats with +uneven component sizes such as ``RGBA1010102`` or where the components +are smaller than 8 bits such as ``RGB565`` discussed above. They do, +however, come with the restriction that the entire vector must fit +within 8, 16, or 32 bits. + +One has to be careful when reasoning about packed formats because it is +easy to get the color order wrong. With array formats, the channel +ordering is usually implied directly from the format name with +``RGBA8888`` storing the components as in the first example and +``BGRA8888`` storing them in the BGRA ordering. Packed formats, however, +are not as simple because some specifications choose to use a MSB to LSB +ordering and others LSB to MSB. One must be careful to pay attention to +the enum in question in order to avoid getting them backwards. + +From an API perspective, both types of formats are available. In Vulkan, +the formats that are of the form ``VK_FORMAT_xxx_PACKEDn`` are packed +formats where the entire color fits in ``n`` bits and formats without +the ``_PACKEDn`` suffix are array formats. In GL, if you specify one of +the base types such as ``GL_FLOAT`` you get an array format but if you +specify a packed type such as ``GL_UNSIGNED_INT_8_8_8_8_REV`` you get a +packed format. 
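As a rough sketch of packed access, the pair of helpers below packs and unpacks ``RGB565`` values. The names are hypothetical, and the bit positions assume the layout used in the text's example, with red in the low bits (an LSB to MSB reading of the name). A specification that reads the name MSB to LSB would place red in the high bits instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type and helpers for illustration; not part of ISL. */
struct rgb565 {
    uint8_t r, g, b;
};

/* Pack with red in bits 0-4, green in bits 5-10, blue in bits 11-15. */
static uint16_t rgb565_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((r & 0x1fu) | ((g & 0x3fu) << 5) | ((b & 0x1fu) << 11));
}

/* Read the i'th 16-bit value as a whole, then extract the bit fields. */
static struct rgb565 rgb565_unpack(const void *data, size_t i)
{
    const uint16_t v = ((const uint16_t *)data)[i];
    const struct rgb565 c = {
        .r = (uint8_t)((v >> 0) & 0x1f),
        .g = (uint8_t)((v >> 5) & 0x3f),
        .b = (uint8_t)((v >> 11) & 0x1f),
    };
    return c;
}
```

The key difference from the array case is that the whole 16-bit value is loaded first; the components only exist as bit ranges inside it.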
+ ++-------------------------+--------------------+ +| Component | Left :math:`\to` | +| | Right | ++=========================+====================+ +| GL | MSB :math:`\to` | +| | LSB | ++-------------------------+--------------------+ +| Vulkan | MSB :math:`\to` | +| | LSB | ++-------------------------+--------------------+ +| mesa\_format | LSB :math:`\to` | +| | MSB | ++-------------------------+--------------------+ +| Intel surface format | LSB :math:`\to` | +| | MSB | ++-------------------------+--------------------+ + +Table: A summary of the bit orderings of different packed format +specifications. The bit ordering is relative to a reading of the enum +name from left to right. + +sRGB +---- + +The sRGB colorspace is one of the more intractable concepts in the +entire world of surfaces and formats. Most texture formats are stored in +a linear colorspace where the floating-point value corresponds linearly +to intensity values. The sRGB color space, on the other hand, is +non-linear and provides greater precision in the lower-intensity +(darker) end of the spectrum. The relationship between linear and sRGB +is governed by the following continuous bijection: + +.. math:: + + c_l = \begin{cases} + \frac{c_s}{12.92} &\text{if } c_s \le 0.04045 \\ + \left(\frac{c_s + 0.055}{1.055}\right)^{2.4} &\text{if } c_s > 0.04045 + \end{cases} + +where :math:`c_l` is the linear color and :math:`c_s` is the color +in sRGB. It is important to note that, when an alpha channel is present, +the alpha channel is always stored in the linear colorspace. + +The key to understanding sRGB is to think about it starting from the +physical display. All displays work natively in sRGB. On older displays, +there isn't so much a conversion operation as a fact of how the hardware +works. All display hardware has a natural gamma curve required to get +from linear to the signal level required to generate the correct color. 
+On older CRT displays, the gamma curve of your average CRT is +approximately the sRGB curve. More modern display hardware has support +for additional gamma curves to try and get accurate colors but, for the +sake of compatibility, everything still operates in sRGB. When an image +is sent to the X server, X passes the pixels on to the display verbatim +without doing any conversions. (Fun fact: When dealing with translucent +windows, X blends in the wrong colorspace.) This means that the image +into which you are rendering will always be interpreted as if it were in +the sRGB colorspace. + +When sampling from a texture, the value returned to the shader is in the +linear colorspace. The conversion from sRGB happens as part of sampling. +In OpenGL, thanks mostly to history, there are various knobs for +determining when you should or should not encode or decode sRGB. In +2007, ``GL_EXT_texture_sRGB`` added support for sRGB texture formats and +was included in OpenGL 2.1. In 2010, ``GL_EXT_texture_sRGB_decode`` +added a flag to allow you to disable texture decoding so that the shader +received the data still in the sRGB colorspace. Then, in 2012, +``GL_ARB_texture_view`` came along and made +``GL_EXT_texture_sRGB_decode`` simultaneously obsolete and very +confusing. Now, thanks to the combination of extensions, you can upload +a texture as linear, create an sRGB view of it and ask that sRGB not be +decoded. What format is it in again? + +The situation with render targets is a bit different. Historically, you +got your render target from the window system (which is always sRGB) and +the spec said nothing whatsoever about encoding. All render targets were +sRGB because that's how monitors worked and application writers were +expected to understand that their final rendering needed to be in sRGB. +However, with the advent of ``EXT_framebuffer_object`` this was no +longer true. 
Also, sRGB was causing problems with blending because GL +was blind to the fact that the output was sRGB and blending was +occurring in the wrong colorspace. In 2006, a set of +``EXT_framebuffer_sRGB`` extensions added support (on both the GL and +window-system sides) for detecting whether a particular framebuffer was +in sRGB and instructing GL to do the conversion into the sRGB colorspace +as the final step prior to writing out to the render target. Enabling +sRGB also implied that blending would occur in the linear colorspace +prior to sRGB conversion and would therefore be more accurate. When sRGB +was added to the OpenGL ES spec in 3.1, they added the query for sRGB +but did not add the flag to allow you to turn it on and off. + +In Vulkan, this is all much more straightforward. Your format is sRGB or +it isn't. If you have an sRGB image and you don't want sRGB decoding to +happen when you sample from it, you simply create a ``VkImageView`` that +has the appropriate linear format and the data will be treated as linear +and not converted. Similarly for render targets, blending always happens +in the same colorspace as the shader output and you determine whether or +not you want sRGB conversion by the format of the ``VkImageView`` used +as the render target. diff --git a/src/intel/isl/docs/images/tiling-basic.svg b/src/intel/isl/docs/images/tiling-basic.svg new file mode 100644 index 0000000000..ebe4b6287d --- /dev/null +++ b/src/intel/isl/docs/images/tiling-basic.svg @@ -0,0 +1,512 @@ +[SVG markup omitted; the surviving text labels are 0 through 15 and 0, 1, 2, 128, 129, 130, 256, 257, 258, ...] 
diff --git a/src/intel/isl/docs/index.rst b/src/intel/isl/docs/index.rst new file mode 100644 index 0000000000..4652da13ca --- /dev/null +++ b/src/intel/isl/docs/index.rst @@ -0,0 +1,11 @@ +Intel Surface Layout Library (ISL) +================================== + +.. toctree:: + :maxdepth: 2 + + preface + isl-intro + formats + tiling + ccs diff --git a/src/intel/isl/docs/isl-intro.rst b/src/intel/isl/docs/isl-intro.rst new file mode 100644 index 0000000000..bb0f91c8c6 --- /dev/null +++ b/src/intel/isl/docs/isl-intro.rst @@ -0,0 +1,256 @@ +Introduction to ISL +=================== + +All of the documentation here focuses around ISL, the **I**\ ntel +**S**\ urface **L**\ ayout library that was originally written by Chad +for doing surface layout in the Vulkan driver. When writing the Vulkan +driver we decided not to port the old code from ``intel_mipmap_tree``. +Instead, we did a complete rewrite of the surface layout code and the +result of that rewrite is ISL. + +The best place to start with ISL is the ``isl_surf`` data structure: + +.. code:: c + + struct isl_surf { + enum isl_surf_dim dim; + enum isl_dim_layout dim_layout; + enum isl_msaa_layout msaa_layout; + enum isl_tiling tiling; + enum isl_format format; + + /** + * Alignment of the upper-left sample of each subimage, in units of surface + * elements. + */ + struct isl_extent3d image_alignment_el; + + /** + * Logical extent of the surface's base level, in units of pixels. This is + * identical to the extent defined in isl_surf_init_info. + */ + struct isl_extent4d logical_level0_px; + + /** + * Physical extent of the surface's base level, in units of physical + * surface samples and aligned to the format's compression block. + * + * Consider isl_dim_layout as an operator that transforms a logical surface + * layout to a physical surface layout. 
Then + * + * logical_layout := (isl_surf::dim, isl_surf::logical_level0_px) + * isl_surf::phys_level0_sa := isl_surf::dim_layout * logical_layout + */ + struct isl_extent4d phys_level0_sa; + + uint32_t levels; + uint32_t samples; + + /** Total size of the surface, in bytes. */ + uint32_t size; + + /** Required alignment for the surface's base address. */ + uint32_t alignment; + + /** + * Pitch between vertically adjacent surface elements, in bytes. + */ + uint32_t row_pitch; + + /** + * Pitch between physical array slices, in rows of surface elements. + */ + uint32_t array_pitch_el_rows; + + enum isl_array_pitch_span array_pitch_span; + + /** Copy of isl_surf_init_info::usage. */ + isl_surf_usage_flags_t usage; + }; + +This data structure describes everything you need to know about a +surface in a canonical way. Everything in ISL has well-defined units and +things don't change meanings based on hardware generation. Units are +usually denoted by a suffix such as "``_el``" for elements or "``_sa``" +for samples. Understanding ISL's units is key to understanding how ISL +performs surface calculations; so important, in fact, that they are +given `their own section <#units#>`__. + +Units +----- + +Before we go any further, we should discuss the different units that ISL +uses. There are four of them: + +- Pixels (px) +- Samples (sa) +- Elements (el) +- Tiles (tl) + +**Pixels** are the most straightforward unit and are where everything +starts. A pixel simply corresponds to a single pixel (or texel if you +prefer) in the surface. For multisampled surfaces, a pixel may contain +one or more samples. For compressed textures, a compression block may +contain one or more pixels. When initially creating a surface, +everything passed to isl\_surf\_init is implicitly in terms of pixels +because this is what all of the APIs use. + +The next unit in ISL's repertoire is **samples**. 
In a multisampled +surface, each pixel corresponds to some number of samples given by +``isl_surf::samples``. The exact layout of the samples depends on the +value of ``isl_surf::msaa_layout``. If the layout is +``ISL_MSAA_LAYOUT_ARRAY`` then each logical array slice in the surface +corresponds to ``isl_surf::samples`` actual slices in the resulting +surface, one per sample. If the layout is +``ISL_MSAA_LAYOUT_INTERLEAVED`` then each pixel corresponds to a 2x1, +2x2, 4x2, or 4x4 grid of samples. In order to aid in calculations, one +of the first things ISL does is to compute ``isl_surf::phys_level0_sa`` +which gives the dimensions of the base miplevel of the surface in +samples. The type of ``isl_surf::phys_level0_sa`` is ``isl_extent4d`` +which allows us to express both the array and interleaved cases. Most of +the calculations of how the different miplevels and array slices are +laid out are done in terms of samples. + +Next, we have surface **elements**. An element is the basic unit of +actual surface memory. For multisampled textures, an element is equal to +a single sample. For compressed textures, an element corresponds to an +entire compression block. The conversion from samples to elements is +given by dividing by the block width and block height of the surface +format. This is true regardless of whether or not the surface is +multisampled; for multisampled compressed textures (these exist for +certain auxiliary formats), the block width and block height are +expressed in samples. This means that you cannot convert directly from +pixels to elements or vice versa; any conversion between pixels and +elements *must* go through samples. + +Finally, we have **tiles**. A tile is a large rectangular block of +surface data that all fits in a single contiguous block of memory +(usually a 4K page). Tiles are used to provide an arrangement of the +data in memory that yields better cache performance. The size of a tile +is always specified in surface elements. 
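The chain of units described above (pixels to samples, then samples to elements) can be sketched as two small conversions. This is an illustration only: ``demo_extent2d`` and both functions are hypothetical names rather than ISL API, the sample grid per pixel follows the 2x1, 2x2, 4x2, 4x4 list from the text for 2, 4, 8 and 16 samples, and rounding up in the division is an assumption made so that partial compression blocks still get an element.

```c
#include <stdint.h>

/* Hypothetical 2-D extent, much simpler than ISL's isl_extent4d. */
struct demo_extent2d {
    uint32_t w, h;
};

/* Pixels -> samples for an interleaved MSAA layout. */
static struct demo_extent2d
demo_px_to_sa_interleaved(struct demo_extent2d px, uint32_t samples)
{
    struct demo_extent2d scale = {1, 1}; /* single-sampled by default */
    switch (samples) {
    case 2:  scale = (struct demo_extent2d){2, 1}; break;
    case 4:  scale = (struct demo_extent2d){2, 2}; break;
    case 8:  scale = (struct demo_extent2d){4, 2}; break;
    case 16: scale = (struct demo_extent2d){4, 4}; break;
    default: break;
    }
    return (struct demo_extent2d){px.w * scale.w, px.h * scale.h};
}

/* Samples -> elements: divide by the format's block size, rounding up. */
static struct demo_extent2d
demo_sa_to_el(struct demo_extent2d sa, uint32_t block_w, uint32_t block_h)
{
    return (struct demo_extent2d){
        (sa.w + block_w - 1) / block_w,
        (sa.h + block_h - 1) / block_h,
    };
}
```

For example, a single-sampled 10x10 pixel surface in a format with 4x4 compression blocks is 10x10 samples and 3x3 elements. There is deliberately no direct pixels-to-elements helper, mirroring the rule that such conversions go through samples.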
+ +These units are fundamental to ISL because they allow us to specify +information about a surface in a canonical way that isn't dependent on +hardware generation. Each field in an ISL data structure that stores any +sort of dimension has a suffix that declares the units for that +particular value: "``_el``" for elements, "``_sa``" for samples, etc. If +the units of the particular field aren't quite what is wanted by the +hardware, we do the conversion when we emit ``RENDER_SURFACE_STATE``. +This is one of the primary differences in ideology between ISL and the +old miptree code which tried to keep everything in the same units as the +hardware expects. One example of this difference is QPitch which +specifies the distance between array slices. For compressed textures, +the QPitch field in ``RENDER_SURFACE_STATE`` was in compression blocks +on Broadwell but it changed to pixels on Sky Lake. Since the old surface +state code tries to store things in hardware units, everyone who ever +reads ``intel_mipmap_tree::qpitch`` has to change their interpretation +based on hardware generation. In ISL, we have ``array_pitch_el_rows`` +which, as the name says, is in rows of elements. On Sky Lake and later, +we have to multiply by the block size of the texture when we finally +fill out the hardware packet, but it makes any other users of the field +much simpler because they know that it's always in elements. + +Creating Surfaces +----------------- + +Creating an ``isl_surf`` is done via the ``isl_surf_init_s`` function +which takes an ``isl_surf_init_info`` structure. There is also an +``isl_surf_init`` macro which uses a C99 designated initializer to +provide a function-like interface with named parameters. + +.. code:: c + + struct isl_surf_init_info { + enum isl_surf_dim dim; + enum isl_format format; + + uint32_t width; + uint32_t height; + uint32_t depth; + uint32_t levels; + uint32_t array_len; + uint32_t samples; + + /** Lower bound for isl_surf::alignment, in bytes. 
*/ + uint32_t min_alignment; + + /** Lower bound for isl_surf::row_pitch, in bytes. */ + uint32_t min_pitch; + + isl_surf_usage_flags_t usage; + + /** Flags that alter how ISL selects isl_surf::tiling. */ + isl_tiling_flags_t tiling_flags; + }; + + #define isl_surf_init(dev, surf, ...) \ + isl_surf_init_s((dev), (surf), \ + &(struct isl_surf_init_info) { __VA_ARGS__ }); + + bool + isl_surf_init_s(const struct isl_device *dev, + struct isl_surf *surf, + const struct isl_surf_init_info *restrict info); + +The dimensionality of the surface is given by the ``isl_surf_dim`` enum: + +.. code:: c + + enum isl_surf_dim { + ISL_SURF_DIM_1D, + ISL_SURF_DIM_2D, + ISL_SURF_DIM_3D, + }; + +Note that ISL has no inherent concept of cube or array surfaces. All 1-D +or 2-D surfaces are potentially arrays. Cube surfaces are simply 2-D +surfaces with 6 array layers that have the ``ISL_SURF_USAGE_CUBE_BIT`` +set (more on usage bits later). + +Next we have an ``isl_format`` which specifies the nominal format of the +surface. The values in the ``isl_format`` enum are exactly the same +integer values as the hardware surface format enumerations. This allows +for zero-cost translations between ISL and the hardware. The format +specified in the ``isl_surf`` is used for surface layout calculations +but it is not necessarily the format that will be packed into the +``RENDER_SURFACE_STATE`` structure. When emitting a surface state, you +also provide an ``isl_view`` structure that provides array layer and +miplevel ranges as well as the final format. + +Next we have 6 unsigned integer values that provide the size of the +surface in all possible dimensions. The ``width``, ``height``, +``depth``, and ``array_len`` fields are all in terms of surface +*pixels*. The ``array_len`` field is expected to be 6 for cubemap +surfaces and is specified in number of faces (not number of cubes) for +cube array surfaces. The ``levels`` and ``samples`` fields are fairly +self-explanatory. 
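The named-parameter pattern behind the ``isl_surf_init`` macro can be tried in isolation. The sketch below is a self-contained mock, not ISL: every ``demo_`` name is hypothetical, the ``cube`` flag stands in for the usage bit, and the validation rules are assumptions chosen to mirror the description above (sizes must be non-zero, and ``array_len`` counts faces, so cubes need a multiple of 6).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real structures are defined by ISL. */
struct demo_surf_init_info {
    uint32_t width, height, depth;
    uint32_t levels, array_len, samples;
    bool cube; /* stand-in for a cube usage bit */
};

struct demo_surf {
    uint32_t width_px, height_px, array_len;
};

static bool
demo_surf_init_s(struct demo_surf *surf, const struct demo_surf_init_info *info)
{
    /* Fields left out of a designated initializer are zero, so a
     * forgotten size is caught here instead of producing garbage. */
    if (!info->width || !info->height || !info->depth ||
        !info->levels || !info->array_len || !info->samples)
        return false;

    /* array_len counts faces, not cubes. */
    if (info->cube && info->array_len % 6 != 0)
        return false;

    surf->width_px = info->width;
    surf->height_px = info->height;
    surf->array_len = info->array_len;
    return true;
}

/* Function-like interface with named parameters via a compound literal. */
#define demo_surf_init(surf, ...) \
    demo_surf_init_s((surf), &(struct demo_surf_init_info) { __VA_ARGS__ })
```

A call then reads like ``demo_surf_init(&surf, .width = 64, .height = 64, .depth = 1, .levels = 7, .array_len = 6, .samples = 1, .cube = true)``. The appeal of this design is that call sites name every value they pass, and new fields can be added to the info structure without touching existing callers.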
+ +The ``min_alignment`` and ``min_pitch`` fields allow some control over +the way the surface is laid out in memory. While the final alignment and +pitch are calculated by ISL in ``isl_surf_init_s``, these allow the +caller to specify a lower bound. For linear surfaces, these fields are +more-or-less respected with the exception that ISL may round up to the +size of an element. + +The ``usage`` field is a bitwise OR of ``ISL_SURF_USAGE_*`` flags that +specify all of the possible ways the surface may be used. Correctly +specifying these flags is crucial to getting the correct results. +Because the hardware has no surface formats for depth or stencil +textures, the only way that ISL can know that a texture is expected to +be used for depth or stencil is by the usage flags. For instance, a +stencil texture should always have a format of ``ISL_FORMAT_R8_UINT`` +and specify ``ISL_SURF_USAGE_STENCIL_BIT``. It is illegal to combine +depth or stencil bits with ``ISL_SURF_USAGE_RENDER_TARGET_BIT`` because +they have different layout requirements which may or may not be +renderable. The usage flags are also where you specify that a given +surface may be used as a cube map. + +Finally, we have tiling flags. These specify the allowed tiling modes +for the given surface. Usually, this will be one of +``ISL_TILING_LINEAR_BIT``, ``ISL_TILING_NON_LINEAR_MASK`` or +``ISL_TILING_ANY_MASK``. Inside of ``isl_surf_init_s``, ISL will +automatically filter the set of possible tilings based on hardware +generation, usage flags, etc. and choose the tiling format that +it thinks is the most appropriate. If, however, the calling code knows +exactly what tiling format it wants, then it can specify a single bit +and it will get that tiling format assuming it's supported. 
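The filter-then-choose behaviour of the tiling flags can be illustrated with a toy bitmask. Everything here is hypothetical: the ``DEMO_`` names only imitate the shape of the ISL flags, only three tilings are modelled, and the preference order (Y, then X, then linear) is an assumption for the example rather than ISL's actual heuristic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tilings and flag bits, modelled loosely on the text. */
enum demo_tiling {
    DEMO_TILING_LINEAR = 0,
    DEMO_TILING_X,
    DEMO_TILING_Y,
};

typedef uint32_t demo_tiling_flags_t;

#define DEMO_TILING_LINEAR_BIT      (1u << DEMO_TILING_LINEAR)
#define DEMO_TILING_X_BIT           (1u << DEMO_TILING_X)
#define DEMO_TILING_Y_BIT           (1u << DEMO_TILING_Y)
#define DEMO_TILING_NON_LINEAR_MASK (DEMO_TILING_X_BIT | DEMO_TILING_Y_BIT)
#define DEMO_TILING_ANY_MASK        (DEMO_TILING_LINEAR_BIT | DEMO_TILING_NON_LINEAR_MASK)

/* Intersect what the caller allows with what the (pretend) hardware
 * supports, then take the first match in an assumed preference order.
 * Returns the chosen tiling, or -1 if nothing is left after filtering. */
static int
demo_choose_tiling(demo_tiling_flags_t allowed, demo_tiling_flags_t supported)
{
    static const enum demo_tiling preference[] = {
        DEMO_TILING_Y, DEMO_TILING_X, DEMO_TILING_LINEAR,
    };
    const demo_tiling_flags_t candidates = allowed & supported;

    for (size_t i = 0; i < sizeof(preference) / sizeof(preference[0]); i++) {
        if (candidates & (1u << preference[i]))
            return (int)preference[i];
    }
    return -1;
}
```

Passing a single bit as ``allowed`` reproduces the last case in the text: the caller either gets exactly that tiling or, if it is filtered out, gets a failure instead of a silent substitute.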
diff --git a/src/intel/isl/docs/preface.rst b/src/intel/isl/docs/preface.rst new file mode 100644 index 0000000000..02e86f0e30 --- /dev/null +++ b/src/intel/isl/docs/preface.rst @@ -0,0 +1,23 @@ +Preface +======= + +Surface layout is a difficult subject and a full mastery of it requires +knowledge in a variety of areas throughout the stack from the graphics +API (Vulkan or GL) all the way down to the bits placed in memory by the +hardware. Unfortunately, the documentation required to gain that +knowledge is scattered if it exists at all. This document aims to serve +as a handbook of all things surface related. Some things are already +documented in detail in other places; when this is the case, there is no +need to write the documentation and it will merely be referenced. There +are many other things, especially those relating to Intel hardware, +where the documentation is so incomplete or scattered that it is almost +worthless; in those cases, full documentation will be provided here with +as many PRM citations as possible. + +This document is intended to interact tightly with the Intel surface +layout library (ISL or libisl) that lives in the mesa repository. All +code examples will use ISL and will be written in terms of its units, +data structures, and functions. ISL itself is intended to be +well-documented so there is going to be some overlap. However, not all +documentation is fit for code comments and not all of it should be made +public. diff --git a/src/intel/isl/docs/tiling.rst b/src/intel/isl/docs/tiling.rst new file mode 100644 index 0000000000..bc55df96fd --- /dev/null +++ b/src/intel/isl/docs/tiling.rst @@ -0,0 +1,359 @@ +Tiling +====== + +The naive view of an image in memory is that the pixels are stored one +after another in memory usually in an X-major order. An image that is +arranged in this way is called "linear". Linear images, while easy to +reason about, can have very bad cache locality. 
Graphics operations tend +to act on pixels that are close together in 2-D Euclidean space. If you +move one pixel to the right or left in a linear image, you only move a +few bytes to one side or the other in memory. However, if you move one +pixel up or down you can end up kilobytes or even megabytes away. + +Tiling (sometimes referred to as swizzling) is a method of re-arranging +the pixels of a surface so that pixels which are close in 2-D Euclidean +space are likely to be close in memory. + +Basics +------ + +The basic idea of a tiled image is that the image is first divided into +two-dimensional blocks or tiles. Each tile takes up a chunk of +contiguous memory and the tiles are arranged like pixels in a linear +surface. This is best demonstrated with a specific example. Suppose we +have an RGBA8888 X-tiled surface on GEN graphics. Then the surface is +divided into 128x8 pixel tiles each of which is 4KB of memory. Within +each tile, the pixels are laid out like a 128x8 linear image. The tiles +themselves are laid out row-major in memory like giant pixels. This +means that, as long as you don't leave your 128x8 tile, you can move in +both dimensions without leaving the same 4K page in memory. + +.. figure:: images/tiling-basic.svg + :alt: Example of a Y-tiled image + + Example of a Y-tiled image + +You can, however, do even better than this. Suppose that same image is, +instead, Y-tiled. Then the surface is divided into 32x32 pixel tiles +each of which is 4KB of memory. Within a tile, each 64B cache line +corresponds to a 4x4 pixel region of the image (you can think of it as a +tile within a tile). This means that very small deviations don't even +leave the cache line. This added bit of pixel shuffling is known to have +a substantial performance impact in most real-world applications. + +Intel GEN graphics has several different tiling formats that we'll +discuss in detail in later sections. The most commonly used as of the +writing of this chapter is Y-tiling.
In all tiling formats the basic +principle is the same: The image is divided into tiles of a particular +size and, within those tiles, the data is re-arranged (or swizzled) +based on a particular pattern. A tile size will always be specified in +bytes by rows and the actual X-dimension of the tile in elements depends +on the size of the element in bytes. + +Bit-6 Swizzling +~~~~~~~~~~~~~~~ + +On some hardware, there is an additional address swizzle that is applied +on top of the tiling format. Whether or not swizzling is enabled depends +on the memory configuration of the system. In general, systems with +dual-channel RAM have swizzling enabled and single-channel systems do not. +Supposedly, this swizzling allows for better balancing between the two +memory channels and increases performance. Because it depends on the +memory configuration which may change from one boot to the next, it +requires a run-time check. + +The best documentation for bit-6 swizzling can be found in the Haswell +PRM Vol. 5 "Memory Views" in the section entitled "Address Swizzling for +Tiled-Y Surfaces". + +ISL Representation +------------------ + +The structure of any given tiling format is represented by ISL using the +``isl_tiling`` enum and the ``isl_tile_info`` structure: + +.. code:: c + + enum isl_tiling { + ISL_TILING_LINEAR = 0, + ISL_TILING_W, + ISL_TILING_X, + ISL_TILING_Y0, + ISL_TILING_Yf, + ISL_TILING_Ys, + ISL_TILING_HIZ, + ISL_TILING_CCS, + }; + + struct isl_tile_info { + enum isl_tiling tiling; + + struct isl_extent2d logical_extent_el; + + struct isl_extent2d phys_extent_B; + }; + + bool + isl_tiling_get_info(const struct isl_device *dev, + enum isl_tiling tiling, + uint32_t format_bpb, + struct isl_tile_info *info); + +Instead of using separate "Tile Mode" and "Tiled Resource Mode" fields +as the Sky Lake ``RENDER_SURFACE_STATE`` packet does, ISL has a +single tiling enum that is capable of expressing everything we need.
+ +The ``isl_tile_info`` structure has two different sizes for a tile: a +logical size in surface elements and a physical size in bytes. In order +to determine the proper logical size, the bits-per-block of the +underlying format has to be passed into ``isl_tiling_get_info``. The +proper way to compute the size of an image in bytes given a width and +height in elements is as follows: + +.. code:: c + + uint32_t width_tl = DIV_ROUND_UP(width_el, tile_info.logical_extent_el.w); + uint32_t height_tl = DIV_ROUND_UP(height_el, tile_info.logical_extent_el.h); + uint32_t row_pitch = width_tl * tile_info.phys_extent_B.w; + uint32_t size = height_tl * tile_info.phys_extent_B.h * row_pitch; + +It is very important to note that there is no direct conversion between +``logical_extent_el`` and ``phys_extent_B``. It is tempting to assume +that the logical and physical heights are the same and simply divide the +width of ``phys_extent_B`` by the size of the format (which is what the +PRM does) to get ``logical_extent_el`` but this is not at all correct. +Some tiling formats have logical and physical heights that differ and so +no such calculation will work in general. The easiest case study for +this is W-tiling. From the Sky Lake PRM: + + .. rubric:: Sky Lake PRM Vol. 2d, "RENDER\_SURFACE\_STATE" (p. 427): + :name: sky-lake-prm-vol.-2d-render_surface_state-p.-427 + :class: unnumbered + + If the surface is a stencil buffer (and thus has Tile Mode set to + TILEMODE\_WMAJOR), the pitch must be set to 2x the value computed + based on width, as the stencil buffer is stored with two rows + interleaved. + +What does this mean? Why are we multiplying the pitch by two? What does +it mean that "the stencil buffer is stored with two rows interleaved"? +The explanation for all these questions is that a W-tile (which is only +used for stencil) has a logical size of 64el x 64el but a physical size +of 128B x 32rows.
In memory, a W-tile has the same footprint as a Y-tile +(128B x 32rows) but every pair of rows in the stencil buffer is +interleaved into a single row of bytes yielding a two-dimensional area +of 64el x 64el. You can consider this as its own tiling format or as a +modification of Y-tiling. The interpretation in the PRMs varies by +hardware generation; on Sandy Bridge they simply said it was Y-tiled but +by Sky Lake there is almost no mention of Y-tiling in connection with +stencil buffers and they are always W-tiled. This mismatch between +logical and physical tile sizes is also relevant for hierarchical depth +buffers as well as single-channel MCS and CCS buffers. + +X-tiling +-------- + +The simplest tiling format available on GEN graphics (which has been +available since gen4) is X-tiling. An X-tile is 512B x 8rows and, within +the tile, the data is arranged in an X-major linear fashion. You can +also look at X-tiling as being an 8x8 cache line grid where the cache +lines are arranged X-major as follows: + ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x000 | 0x040 | 0x080 | 0x0c0 | 0x100 | 0x140 | 0x180 | 0x1c0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x200 | 0x240 | 0x280 | 0x2c0 | 0x300 | 0x340 | 0x380 | 0x3c0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x400 | 0x440 | 0x480 | 0x4c0 | 0x500 | 0x540 | 0x580 | 0x5c0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x600 | 0x640 | 0x680 | 0x6c0 | 0x700 | 0x740 | 0x780 | 0x7c0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x800 | 0x840 | 0x880 | 0x8c0 | 0x900 | 0x940 | 0x980 | 0x9c0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0xa00 | 0xa40 | 0xa80 | 0xac0 | 0xb00 | 0xb40 | 0xb80 | 0xbc0 |
++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0xc00 | 0xc40 | 0xc80 | 0xcc0 | 0xd00 | 0xd40 | 0xd80 | 0xdc0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0xe00 | 0xe40 | 0xe80 | 0xec0 | 0xf00 | 0xf40 | 0xf80 | 0xfc0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ + +Each cache line represents a piece of a single row of pixels within the +image. The memory locations of two vertically adjacent pixels within the +same X-tile always differ by 512B or 8 cache lines. + +As mentioned above, X-tiling is slower than Y-tiling (though still +faster than linear). However, until Sky Lake, the display scan-out +hardware could only do X-tiling so we have historically used X-tiling +for all window-system buffers (because X or a Wayland compositor may +want to put it in a plane). + +Bit-6 Swizzling +~~~~~~~~~~~~~~~ + +When bit-6 swizzling is enabled, bits 9 and 10 are XOR'd in with bit 6 +of the tiled address: + +.. code:: c + + addr[6] ^= addr[9] ^ addr[10]; + +Y-tiling +-------- + +The Y-tiling format, also available since gen4, is substantially +different from X-tiling and performs much better in practice.
Each +Y-tile is an 8x8 grid of cache lines arranged Y-major as follows: + ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x000 | 0x200 | 0x400 | 0x600 | 0x800 | 0xa00 | 0xc00 | 0xe00 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x040 | 0x240 | 0x440 | 0x640 | 0x840 | 0xa40 | 0xc40 | 0xe40 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x080 | 0x280 | 0x480 | 0x680 | 0x880 | 0xa80 | 0xc80 | 0xe80 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x0c0 | 0x2c0 | 0x4c0 | 0x6c0 | 0x8c0 | 0xac0 | 0xcc0 | 0xec0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x100 | 0x300 | 0x500 | 0x700 | 0x900 | 0xb00 | 0xd00 | 0xf00 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x140 | 0x340 | 0x540 | 0x740 | 0x940 | 0xb40 | 0xd40 | 0xf40 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x180 | 0x380 | 0x580 | 0x780 | 0x980 | 0xb80 | 0xd80 | 0xf80 | ++---------+---------+---------+---------+---------+---------+---------+---------+ +| 0x1c0 | 0x3c0 | 0x5c0 | 0x7c0 | 0x9c0 | 0xbc0 | 0xdc0 | 0xfc0 | ++---------+---------+---------+---------+---------+---------+---------+---------+ + +Each 64B cache line within the tile is laid out as 4 rows of 16B each: + ++--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x00 | 0x01 | 0x02 | 0x03 | 0x04 | 0x05 | 0x06 | 0x07 | 0x08 | 0x09 | 0x0a | 0x0b | 0x0c | 0x0d | 0x0e | 0x0f | ++--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x10 | 0x11 | 0x12 | 0x13 | 0x14 | 0x15 | 0x16 | 0x17 | 0x18 | 0x19 | 0x1a | 0x1b | 0x1c | 0x1d | 0x1e | 0x1f | 
++--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x20 | 0x21 | 0x22 | 0x23 | 0x24 | 0x25 | 0x26 | 0x27 | 0x28 | 0x29 | 0x2a | 0x2b | 0x2c | 0x2d | 0x2e | 0x2f | ++--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x30 | 0x31 | 0x32 | 0x33 | 0x34 | 0x35 | 0x36 | 0x37 | 0x38 | 0x39 | 0x3a | 0x3b | 0x3c | 0x3d | 0x3e | 0x3f | ++--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ + +Y-tiling is widely regarded as being substantially faster than X-tiling +so it is generally preferred. However, prior to Sky Lake, Y-tiling was +not available for scanout so X tiling was used for any sort of +window-system buffers. Starting with Sky Lake, we can scan out from +Y-tiled buffers. + +Bit-6 Swizzling +~~~~~~~~~~~~~~~ + +When bit-6 swizzling is enabled, bit 9 is XOR'd in with bit 6 of the +tiled address: + +.. code:: c + + addr[6] ^= addr[9]; + +W-tiling +-------- + +W-tiling is a new tiling format added on Sandy Bridge for use in stencil +buffers. W-tiling is similar to Y-tiling in that it's arranged as an 8x8 +Y-major grid of cache lines. 
The bytes within each cache line are +arranged as follows: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x00 | 0x01 | 0x04 | 0x05 | 0x10 | 0x11 | 0x14 | 0x15 | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x02 | 0x03 | 0x06 | 0x07 | 0x12 | 0x13 | 0x16 | 0x17 | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x08 | 0x09 | 0x0c | 0x0d | 0x18 | 0x19 | 0x1c | 0x1d | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x0a | 0x0b | 0x0e | 0x0f | 0x1a | 0x1b | 0x1e | 0x1f | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x20 | 0x21 | 0x24 | 0x25 | 0x30 | 0x31 | 0x34 | 0x35 | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x22 | 0x23 | 0x26 | 0x27 | 0x32 | 0x33 | 0x36 | 0x37 | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x28 | 0x29 | 0x2c | 0x2d | 0x38 | 0x39 | 0x3c | 0x3d | ++--------+--------+--------+--------+--------+--------+--------+--------+ +| 0x2a | 0x2b | 0x2e | 0x2f | 0x3a | 0x3b | 0x3e | 0x3f | ++--------+--------+--------+--------+--------+--------+--------+--------+ + +While W-tiling has been required for stencil all the way back to Sandy +Bridge, the docs are somewhat confused as to whether stencil buffers are +W or Y-tiled. This seems to stem from the fact that the hardware seems +to implement W-tiling as a sort of modified Y-tiling. One example of +this is the somewhat odd requirement that W-tiled buffers have their +pitch multiplied by 2: + + .. rubric:: Sky Lake PRM Vol. 2d, "RENDER\_SURFACE\_STATE" (p. 427): + :name: sky-lake-prm-vol.-2d-render_surface_state-p.-427-1 + :class: unnumbered + + If the surface is a stencil buffer (and thus has Tile Mode set to + TILEMODE\_WMAJOR), the pitch must be set to 2x the value computed + based on width, as the stencil buffer is stored with two rows + interleaved. 
+ +The last phrase holds the key here: "the stencil buffer is stored with +two rows interleaved". More accurately, a W-tiled buffer can be viewed +as a Y-tiled buffer with each set of 4 W-tiled lines interleaved to form +2 Y-tiled lines. In ISL, we represent a W-tile as a tiling with a +logical dimension of 64el x 64el but a physical size of 128B x 32rows. +This cleanly takes care of the pitch issue above and seems to nicely +model the hardware. + +Tiling as a bit pattern +----------------------- + +There is one more important angle on tiling that should be discussed +before we finish. Every tiling can be described by three things: + +1. A logical width and height in elements +2. A physical width in bytes and height in rows +3. A mapping from logical elements to physical bytes within the tile + +We have spent a good deal of time on the first two because this is what +you really need for doing surface layout calculations. However, there +are cases in which the map from logical elements to physical bytes is +critical. One example is W-tiling where we have code to do W-tiled +encoding and decoding in the shader for doing stencil blits because the +hardware does not allow us to render to W-tiled surfaces. + +There are many ways to mathematically describe the mapping from logical +elements to physical bytes. In the PRMs they give a very complicated set +of formulas involving lots of multiplication, modulus, and sums that +show you how to compute the mapping. With a little creativity, you can +easily reduce those to a set of bit shifts and ORs. By far the simplest +formulation, however, is as a mapping from the bits of the texture +coordinates to bits in the address. Suppose that :math:`(u, v)` is +the location of a 1-byte element within a tile.
If you represent :math:`u` +as :math:`u_n u_{n-1} \cdots u_2 u_1 u_0` where :math:`u_0` is the LSB +and :math:`u_n` is the MSB of :math:`u` and similarly +:math:`v = v_m v_{m-1} \cdots v_2 v_1 v_0`, then the bits of the address +within the tile are given by the table below: + ++-------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Tiling | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | ++===================+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+ +| ``ISL_TILING_X`` | :math:`v_2` | :math:`v_1` | :math:`v_0` | :math:`u_8` | :math:`u_7` | :math:`u_6` | :math:`u_5` | :math:`u_4` | :math:`u_3` | :math:`u_2` | :math:`u_1` | :math:`u_0` | ++-------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| ``ISL_TILING_Y0`` | :math:`u_6` | :math:`u_5` | :math:`u_4` | :math:`v_4` | :math:`v_3` | :math:`v_2` | :math:`v_1` | :math:`v_0` | :math:`u_3` | :math:`u_2` | :math:`u_1` | :math:`u_0` | ++-------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| ``ISL_TILING_W`` | :math:`u_5` | :math:`u_4` | :math:`u_3` | :math:`v_5` | :math:`v_4` | :math:`v_3` | :math:`v_2` | :math:`u_2` | :math:`v_1` | :math:`u_1` | :math:`v_0` | :math:`u_0` | ++-------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+ + +Constructing the mapping this way makes a lot of sense when you think +about hardware. It may seem complex on paper but "simple" things such as +addition are relatively expensive in hardware while interleaving bits in +a well-defined pattern is practically free.
For a format that has more +than one byte per element, you simply chop bits off the bottom of the +pattern, hard-code them to 0, and adjust bit indices as needed. For a +128-bit format, for instance, the Y-tiled pattern becomes +:math:`u_2 u_1 u_0 v_4 v_3 v_2 v_1 v_0`. The Sky Lake PRM Vol. 5 in the +section "2D Surfaces" contains an expanded version of the above table +(which we will not repeat here) that also includes the bit patterns for +the Ys and Yf tiling formats. -- cgit v1.2.3
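As a rough illustration of the bit-pattern view, the mappings can be written as a few shifts and ORs. This is a sketch rather than code taken from ISL: the function names are hypothetical, ``u`` and ``v`` are the column and row of a 1-byte element inside a single 4KB tile, and the X-tile pattern is inferred from the 512B x 8rows tile dimensions given in the X-tiling section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these names exist in ISL.
 * Each returns the byte offset of element (u, v) within one 4KB tile. */

/* X-tile, 512B x 8rows: address bits 11..0 = v2 v1 v0 u8 u7 ... u0 */
static uint32_t
sketch_xtile_offset(uint32_t u, uint32_t v)
{
   assert(u < 512 && v < 8);
   return (v << 9) | u;
}

/* Y-tile, 128B x 32rows: bits 11..0 = u6 u5 u4 v4 v3 v2 v1 v0 u3 u2 u1 u0 */
static uint32_t
sketch_ytile_offset(uint32_t u, uint32_t v)
{
   assert(u < 128 && v < 32);
   return ((u >> 4) << 9) | (v << 4) | (u & 0xf);
}

/* W-tile, 64el x 64el: bits 11..0 = u5 u4 u3 v5 v4 v3 v2 u2 v1 u1 v0 u0 */
static uint32_t
sketch_wtile_offset(uint32_t u, uint32_t v)
{
   assert(u < 64 && v < 64);
   return ((u >> 3) << 9) |          /* u5 u4 u3 -> bits 11..9 */
          ((v >> 2) << 5) |          /* v5..v2   -> bits 8..5  */
          (((u >> 2) & 1) << 4) |    /* u2       -> bit 4      */
          (((v >> 1) & 1) << 3) |    /* v1       -> bit 3      */
          (((u >> 1) & 1) << 2) |    /* u1       -> bit 2      */
          ((v & 1) << 1) |           /* v0       -> bit 1      */
          (u & 1);                   /* u0       -> bit 0      */
}
```

Spot-checking against the cache line tables earlier in the chapter: u = 64, v = 0 in an X-tile lands at 0x040, and u = 0, v = 2 in a W-tile lands at 0x08.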