summaryrefslogtreecommitdiff
path: root/backend/doc/flat_address_space.html
diff options
context:
space:
mode:
Diffstat (limited to 'backend/doc/flat_address_space.html')
-rw-r--r--backend/doc/flat_address_space.html93
1 files changed, 0 insertions, 93 deletions
diff --git a/backend/doc/flat_address_space.html b/backend/doc/flat_address_space.html
deleted file mode 100644
index d7c30d8a..00000000
--- a/backend/doc/flat_address_space.html
+++ /dev/null
@@ -1,93 +0,0 @@
-<h1>Flat Address Space</h1>
-
-<h2>Segmented address space...</h2>
-
-<p>The first challenge with OpenCL is its very liberal use of pointers. The memory
-is segment into several address spaces:</p>
-
-<ul>
-<li><p>private. This is the memory for each work item</p></li>
-<li><p>global. These are buffers in memory shared by all work items and work groups</p></li>
-<li><p>constant. These are constant buffers in memory shared by all work items and
-work groups as well</p></li>
-<li><p>local. These is a memory shared by all work items in the <em>same</em> work group</p></li>
-</ul>
-
-<h2>... But with no restriction inside each address space</h2>
-
-<p>The challenge is that there is no restriction in OpenCL inside each address
-space i.e. the full C semantic applies in particular regarding pointer
-arithmetic.</p>
-
-<p>Therefore the following code is valid:</p>
-
-<p><code>
-__kernel void example(__global int *dst, __global int *src0, __global int *src1)<br/>
-{<br/>
-&nbsp;&nbsp;__global int *from;<br/>
-&nbsp;&nbsp;if (get_global_id(0) % 2)<br/>
-&nbsp;&nbsp;&nbsp;&nbsp;from = src0;<br/>
-&nbsp;&nbsp;else<br/>
-&nbsp;&nbsp;&nbsp;&nbsp;from = src1;<br/>
-&nbsp;&nbsp;dst[get_global_id(0)] = from[get_global_id(0)];<br/>
-}
-</code></p>
-
-<p>As one may see, the load done in the last line actually mixes pointers from both
-source src0 and src1. This typically makes the use of binding table indices
-pretty hard. In we use binding table 0 for dst, 1 for src0 and 2 for src1 (for
-example), we are not able to express the load in the last line with one send
-only.</p>
-
-<h2>No support for stateless in required messages</h2>
-
-<p>Furthermore, in IVB, we are going four types of messages to implement the loads
-and the stores</p>
-
-<ul>
-<li><p>Byte scattered reads. They are used to read bytes/shorts/integers that are not
-aligned on 4 bytes. This is a gather message i.e. the user provides up to 16
-addresses</p></li>
-<li><p>Byte scattered writes. They are used to write bytes/shorts/integers that are not
-aligned on 4 bytes. This is a scatter message i.e. the user provides up to 16
-addresses</p></li>
-<li><p>Untyped reads. They allow to read from 1 to 4 double words (i.e 4 bytes) per
-lane. This is also a gather message i.e. up to 16 address are provided per
-message.</p></li>
-<li><p>Untyped writes. They are the counter part of the untyped reads</p></li>
-</ul>
-
-<p>Problem is that IVB does not support stateless accesses for these messages. So
-surfaces are required. Secondly, stateless messages are not that interesting
-since all of them require a header which is still slow to assemble.</p>
-
-<h2>Implemented solution</h2>
-
-<p>The solution is actually quite simple. Even with no stateless support, it is
-actually possible to simulate it with a surface. As one may see in the run-time
-code in <code>intel/intel_gpgpu.c</code>, we simply create a surface:</p>
-
-<ul>
-<li><p>2GB big</p></li>
-<li><p>Which starts at offset 0</p></li>
-</ul>
-
-<p>Surprisingly, this surface can actually map the complete GTT address space which
-is 2GB big. One may look at <code>flat_address_space</code> unit test in the run-time code
-that creates and copies buffers in such a way that the complete GTT address
-space is traversed.</p>
-
-<p>This solution brings a pretty simple implementation in the compiler side.
-Basically, there is nothing to do when translating from LLVM to Gen ISA. A
-pointer to <code>__global</code> or <code>__constant</code> memory is simply a 32 bits offset in that
-surface.</p>
-
-<h2>Related problems</h2>
-
-<p>There is one drawback for this approach. Since we use a 2GB surface that maps
-the complete GTT space, there is no protection at all. Each write can therefore
-potentially modify any buffer including the command buffer, the frame buffer or
-the kernel code. There is <em>no</em> protection at all in the hardware to prevent
-that.</p>
-
-<p><a href="../README.html">Up</a></p>