2 files changed, 169 insertions, 1 deletions
diff --git a/README.html b/README.html
new file mode 100644
index 00000000..4e2c955c
--- /dev/null
+++ b/README.html
@@ -0,0 +1,159 @@
+<h1>OpenCL Runtime</h1>
+
+<p>This code base contains the code to run OpenCL programs on Intel GPUs. It is
+the run-time code, i.e. it defines the OpenCL host functions required to
+initialize the device, create the command queues, the kernels and the
+programs, and run them on the GPU. The run-time does <em>not</em> contain the compiler.
+The OpenCL compiler has its own shared object, and the run-time and the
+compiler are interfaced through a regular C layer.</p>
+
+<h2>How to build</h2>
+
+<p>The project uses CMake with three profiles:</p>
+
+<ol>
+<li>Debug (-g)</li>
+<li>RelWithDebInfo (-g with optimizations)</li>
+<li>Release (only optimizations)</li>
+</ol>
+
+<p>From the root directory of the project:</p>
+
+<p><code>&gt; mkdir build &amp;&amp; cd build</code></p>
+
+<p><code>&gt; ccmake ../ # to configure</code></p>
+
+<p>Choose whatever options you want for the build.</p>
+
+<p>Then press 'c' to configure and 'g' to generate the build files.</p>
+
+<p><code>&gt; make</code></p>
+
+<p>The project depends on several external libraries:</p>
+
+<ul>
+<li>Several X components (Xlib, Xfixes, Xext)</li>
+<li>libdrm libraries (libdrm and libdrm_intel)</li>
+<li>The compiler backend itself (libgbe)</li>
+</ul>
+
+<p>CMake will check the dependencies and will complain if it does not find them.</p>
+
+<p>Once built, the run-time produces a shared object, libcl.so, which directly
+implements the OpenCL API. A set of tests is also produced. They can be found
+in utests.</p>
+
+<h2>How to run</h2>
+
+<p>Apart from the OpenCL library itself, which can be used by any OpenCL application,
+this code also produces various tests to check that the compiler and the run-time
+are consistent. This small test framework uses a simple C++ registration system to
+register all the unit tests.</p>
+
+<p>Set the environment variable <code>OCL_KERNEL_PATH</code> to the directory that holds the
+OpenCL kernels. They ship with the run-time in <code>./kernels</code>.</p>
+
+<p>Then in <code>utests/</code>:</p>
+
+<p><code>&gt; ./run</code></p>
+
+<p>will run all the unit tests one after the other</p>
+
+<p><code>&gt; ./run some_unit_test0 some_unit_test1</code></p>
+
+<p>will run only the <code>some_unit_test0</code> and <code>some_unit_test1</code> tests</p>
+
+<p>Important: the code has only been tested on Ivy Bridge (IVB) GT2 with a rather
+minimal Linux distribution (ArchLinux) and a very small desktop (dwm). If you
+use something more sophisticated, such as compiz or similar, you may run into
+serious problems and GPU hangs.</p>
+
+<h2>TODO</h2>
+
+<p>The run-time is far from being complete. Most of the pieces have been put
+together to test and develop the OpenCL compiler. A partial list of things to
+do:</p>
+
+<ul>
+<li><p>Support for samplers / textures. This should be rather easy since the
+low-level parts of the code already support it</p></li>
+<li><p>Support for events</p></li>
+<li><p>Check that NDRangeKernels can be pushed into <em>different</em> queues from several
+threads</p></li>
+<li><p>Support for Enqueue*Buffer. I added a straightforward extension to map /
+unmap buffers. This extension, <code>clIntelMapBuffer</code>, maps directly to <code>dri_bo_map</code>,
+which is really convenient</p></li>
+<li><p>Full support for images. Today, the code just tiles everything <em>manually</em>,
+which is really bad. I think the best solution to copy and create images is to
+use the GPU and typed writes (scatter to textures) or samplers. We would
+however need the vmap extension proposed by Chris Wilson to be able to map
+user pointers while doing the copies and the conversions.</p></li>
+<li><p>No state tracking at all. One batch buffer is created at each "draw call"
+(i.e. for each NDRangeKernel). This is really inefficient since some
+expensive pipe controls are issued for each batch buffer</p></li>
+<li><p>Valgrind reports some leaks in libdrm. They look like false positives but this
+has to be checked. The same goes for LLVM: there is one leak there to check</p></li>
+</ul>
+
+<p>More generally, anything in the run-time that triggers the "FATAL" macro means
+that something that must be supported is not implemented properly (either it
+does not comply with the standard or it is just missing).</p>
+
+<h2>Fulsim</h2>
+
+<p>The code base supports a seamless integration with Fulsim, i.e. you do not need
+to run anything other than your application to make Fulsim work with it. However,
+some specific steps have to be completed first to make it work.</p>
+
+<ul>
+<li><p>Compilation phase. You need to compile the project with fulsim enabled. You
+should choose <code>EMULATE_IVB ON</code> in the ccmake options. Haswell has not been
+tested much recently, so there is a high probability that it will not work
+properly</p></li>
+<li><p>Fulsim executables and DLLs. Copy the fulsim <em>Windows</em> executables and
+DLLs into the directory where you run your code. The run-time will simply call
+AubLoad.exe to run Fulsim. You can get fulsim from our subversion server. We
+compile versions of it. They are all located
+<a href="https://subversion.jf.intel.com/cag/gen/gpgpu/fulsim/">here</a></p></li>
+<li><p>Run-time phase. You need to fake the machine you want to simulate. Small
+scripts in the root directory of the project are responsible for doing that:</p></li>
+</ul>
+
+<p><code>&gt; source setup_fulsim_ivb.sh 1</code></p>
+
+<p>will run fulsim in debug mode i.e. you will be able to step into the EU code</p>
+
+<p><code>&gt; source setup_fulsim_ivb.sh 0</code></p>
+
+<p>will simply run fulsim</p>
+
+<ul>
+<li>Modified libdrm. Unfortunately, to support fulsim, this run-time uses a
+modified libdrm library (in particular to support binary buffers and a seamless
+integration with the run-time). See below.</li>
+</ul>
+
+<h2>C++ simulator</h2>
+
+<p>The compiler is able to produce C++ files that simulate the behavior of the
+kernel. The idea is mostly to be able to gather statistics about how the kernel
+runs (SIMD occupancy, bank conflicts in shared local memory or cache hit/miss
+rates). The compiler generates a C++ file from the LLVM file (with
+some extra steps detailed in the OpenCL compiler documentation). Then, GCC (or
+ICC) is called directly to generate a shared object.</p>
+
+<p>The run-time is able to run the simulation code directly. To enable it
+(and to also enable the C++ path in the compiler code), a small script in the
+root directory has to be run:</p>
+
+<p><code>&gt; source setup_perfim_ivb.sh</code></p>
+
+<p>Doing that, the complete C++ simulation path is enabled.</p>
+
+<h2>Modified libdrm</h2>
+
+<p>Right now, a modified libdrm is required to run fulsim. It completely disables
+the HW path (nothing will run on the HW at all) and allows selectively dumping
+any OpenCL buffer. Contact Ben Segovia to get access to it.</p>
+
+<p>Ben Segovia (<a href="&#109;&#97;&#105;&#108;&#116;&#x6F;:&#x62;&#101;&#110;&#x6A;&#97;&#x6D;&#105;&#110;&#x2E;&#115;&#101;&#103;&#x6F;&#x76;&#105;&#x61;&#64;&#x69;&#110;&#116;&#x65;&#108;&#46;&#99;&#x6F;&#x6D;">&#x62;&#101;&#110;&#x6A;&#97;&#x6D;&#105;&#110;&#x2E;&#115;&#101;&#103;&#x6F;&#x76;&#105;&#x61;&#64;&#x69;&#110;&#116;&#x65;&#108;&#46;&#99;&#x6F;&#x6D;</a>)</p>
diff --git a/README.md b/README.md
index 2ef5688c..77efac5c 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,11 @@ Apart from the OpenCL library itself that can be used by any OpenCL application,
 this code also produces various tests to ensure the compiler and the run-time
 consistency. This small test framework uses a simple c++ registration system to
 register all the unit tests.
-Typically, in utests/:
+
+Set the environment variable `OCL_KERNEL_PATH` to the directory that holds the
+OpenCL kernels. They ship with the run-time in `./kernels`.
+
+Then in `utests/`:
 
 `> ./run`
 
@@ -58,6 +62,11 @@ will run all the unit tests one after the others
 
 will only run `some_unit_test0` and `some_unit_test1` tests
 
+Important: the code has only been tested on Ivy Bridge (IVB) GT2 with a rather
+minimal Linux distribution (ArchLinux) and a very small desktop (dwm). If you
+use something more sophisticated, such as compiz or similar, you may run into
+serious problems and GPU hangs.
+
 TODO
 ----