author | Benjamin Segovia <segovia.benjamin@gmail.com> | 2012-06-19 21:53:26 +0000 |
---|---|---|
committer | Keith Packard <keithp@keithp.com> | 2012-08-10 16:18:53 -0700 |
commit | f26a40e9b7908fc8ab48c2efe260d04045b77f40 (patch) | |
tree | f9dce5d71964b12bb81dcc161e7d734cb9a79bdf | |
parent | 1671526b15a8aac1378a366002ec2c85b0596609 (diff) |
Updated and compiled README
-rw-r--r-- | README.html | 159 |
-rw-r--r-- | README.md | 11 |
2 files changed, 169 insertions, 1 deletions
diff --git a/README.html b/README.html
new file mode 100644
index 00000000..4e2c955c
--- /dev/null
+++ b/README.html
@@ -0,0 +1,159 @@
+<h1>OpenCL Runtime</h1>
+
+<p>This code base contains the code to run OpenCL programs on Intel GPUs. This is
+basically the run-time code i.e. it defines the OpenCL host functions required
+to initialize the device, create the command queues, the kernels and the
+programs and run them on the GPU. The run-time does <em>not</em> contain the compiler.
+The OpenCL compiler has its own shared object and both the run-time and the
+compiler are interfaced with a regular C layer.</p>
+
+<h2>How to build</h2>
+
+<p>The project uses CMake with three profiles:</p>
+
+<ol>
+<li>Debug (-g)</li>
+<li>RelWithDebInfo (-g with optimizations)</li>
+<li>Release (only optimizations)</li>
+</ol>
+
+<p>Basically, from the root directory of the project</p>
+
+<p><code>> mkdir build</code></p>
+
+<p><code>> ccmake ../ # to configure</code></p>
+
+<p>Choose whatever you want for the build.</p>
+
+<p>Then press 'c' to configure and 'g' to generate the code.</p>
+
+<p><code>> make</code></p>
+
+<p>The project depends on several external libraries:</p>
+
+<ul>
+<li>Several X components (XLib, Xfixes, Xext)</li>
+<li>libdrm libraries (libdrm and libdrm_intel)</li>
+<li>The compiler backend itself (libgbe)</li>
+</ul>
+
+<p>CMake will check the dependencies and will complain if it does not find them.</p>
+
+<p>Once built, the run-time produces a shared object libcl.so which basically
+directly implements the OpenCL API. A set of tests are also produced. They may
+be found in utests.</p>
+
+<h2>How to run</h2>
+
+<p>Apart from the OpenCL library itself that can be used by any OpenCL application,
+this code also produces various tests to ensure the compiler and the run-time
+consistency. This small test framework uses a simple c++ registration system to
+register all the unit tests.</p>
+
+<p>You need to set the variable <code>OCL_KERNEL_PATH</code> to locate the OCL kernels. They
+are with the run-time in <code>./kernels</code>.</p>
+
+<p>Then in <code>utests/</code>:</p>
+
+<p><code>> ./run</code></p>
+
+<p>will run all the unit tests one after the others</p>
+
+<p><code>> ./run some_unit_test0 some_unit_test1</code></p>
+
+<p>will only run <code>some_unit_test0</code> and <code>some_unit_test1</code> tests</p>
+
+<p>As an important remark, the code was only tested on IVB GT2 with a rather
+minimal Linux distribution (ArchLinux) and a very small desktop (dwm). If you
+use something more sophisticated using compiz or similar stuffs, you may expect
+serious problems and GPU hangs.</p>
+
+<h2>TODO</h2>
+
+<p>The run-time is far from being complete. Most of the pieces have been put
+together to test and develop the OpenCL compiler. A partial list of things to
+do:</p>
+
+<ul>
+<li><p>Support for samplers / textures but it should be rather easy since the
+low-level parts of the code already supports it</p></li>
+<li><p>Support for events</p></li>
+<li><p>Check that NDRangeKernels can be pushed into <em>different</em> queues from several
+threads </p></li>
+<li><p>Support for Enqueue*Buffer. I added a straightforward extension to map /
+unmap buffer. This extension <code>clIntelMapBuffer</code> directly maps <code>dri_bo_map</code>
+which is really convenient</p></li>
+<li><p>Full support for images. Today, the code just tiles everything <em>manually</em>
+which is really bad. I think the best solution to copy and create images is to
+use the GPU and typed writes (scatter to textures) or samplers. We would
+however need the vmap extension proposed by Chris Wilson to be able to map
+user pointers while doing to copies and the conversions.</p></li>
+<li><p>No state tracking at all. One batch buffer is created at each "draw call"
+(i.e. for each NDRangeKernels). This is really inefficient since some
+expensive pipe controls are issued for each batch buffer</p></li>
+<li><p>Valgrind reports some leaks in libdrm. It sounds like a false positive but it
+has to be checked. Idem for LLVM. There is one leak here to check</p></li>
+</ul>
+
+<p>More generally, everything in the run-time that triggers the "FATAL" macro means
+that something that must be supported is not implemented properly (either it
+does not comply with the standard or it is just missing)</p>
+
+<h2>Fulsim</h2>
+
+<p>The code base supports a seamless integration with Fulsim i.e. you do not need
+to run anything else than your application to make Fulsim work with it. However,
+some specific step have to be completed first to make it work.</p>
+
+<ul>
+<li><p>Compilation phase. You need to compile the project with fulsim enabled. You
+should choose <code>EMULATE_IVB ON</code> in ccmake options. Actually, Haswell has not been
+tested that much recently so there is a large probability it will not work
+properly</p></li>
+<li><p>Fulsim executables and DLL. Copy and paste fulsim <em>Windows</em> executables and
+DLLs into the directory where you run your code. The run-time will simply call
+AubLoad.exe to run Fulsim. You can get fulsim from our subversion server. We
+compile versions of it. They are all located in
+<a href="https://subversion.jf.intel.com/cag/gen/gpgpu/fulsim/">here</a></p></li>
+<li><p>Run-time phase. You need to fake the machine you want to simulate. Small
+scripts in the root directory of the project are responsible for doing that:</p></li>
+</ul>
+
+<p><code>> source setup_fulsim_ivb.sh 1</code></p>
+
+<p>will run fulsim in debug mode i.e. you will be able to step into the EU code</p>
+
+<p><code>> source setup_fulsim_ivb.sh 0</code></p>
+
+<p>will simply run fulsim</p>
+
+<ul>
+<li>Modified libdrm. Unfortunately, to support fulsim, this run-time uses a
+modified libdrm library (in particular to support binary buffers and a seamless
+integration with the run-time). See below.</li>
+</ul>
+
+<h2>C++ simulator</h2>
+
+<p>The compiler is able to produce c++ file that simulate the behavior of the
+kernel. The idea is mostly to be able to gather statistics about how the kernel
+can run (SIMD occupancy, bank conflicts in shared local memory or cache hit/miss
+rates). Basically, the compiler generates a c++ file from the LLVM file (with
+some extra steps detailed in the OpenCL compiler documentation). Then, GCC (or
+ICC) is directly called to generate a shared object.</p>
+
+<p>The run-time is actually able to run the simulation code directly. To enable it
+(and to also enable the c++ path in the compile code), a small script in the
+root directory has to be run:</p>
+
+<p><code>> source setup_perfim_ivb.sh</code></p>
+
+<p>Doing that, the complete C++ simulation path is enabled.</p>
+
+<h2>Modified libdrm</h2>
+
+<p>Right now, a modified libdrm is required to run fulsim. It completely disables
+the HW path (nothing will run on the HW at all) and allows to selectively dump
+any OpenCL buffer. Contact Ben Segovia to get the access to it.</p>
+
+<p>Ben Segovia (<a href="mailto:benjamin.segovia@intel.com">benjamin.segovia@intel.com</a>)</p>
@@ -48,7 +48,11 @@ Apart from the OpenCL library itself that can be used by any OpenCL application,
 this code also produces various tests to ensure the compiler and the run-time
 consistency. This small test framework uses a simple c++ registration system to
 register all the unit tests.
-Typically, in utests/:
+
+You need to set the variable `OCL_KERNEL_PATH` to locate the OCL kernels. They
+are with the run-time in `./kernels`.
+
+Then in `utests/`:

 `> ./run`

@@ -58,6 +62,11 @@ will run all the unit tests one after the others

 will only run `some_unit_test0` and `some_unit_test1` tests

+As an important remark, the code was only tested on IVB GT2 with a rather
+minimal Linux distribution (ArchLinux) and a very small desktop (dwm). If you
+use something more sophisticated using compiz or similar stuffs, you may expect
+serious problems and GPU hangs.
+
 TODO
 ----
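The build and test setup that this README describes could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names `set_ocl_kernel_path` and `build_runtime` are hypothetical, the kernels are assumed to sit in `kernels/` under the checkout root (the README says `./kernels`), and non-interactive `cmake` with `CMAKE_BUILD_TYPE` is swapped in for the interactive `ccmake` session the README uses.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of the project.

# Export OCL_KERNEL_PATH so the unit tests can locate the OCL kernels.
# Assumption: the kernels directory is <checkout root>/kernels.
set_ocl_kernel_path() {
    root="${1:-.}"
    OCL_KERNEL_PATH="$root/kernels"
    export OCL_KERNEL_PATH
}

# Configure and build out of tree in <checkout root>/build.
# $2 selects one of the three profiles named in the README:
# Debug, RelWithDebInfo or Release.
# The subshell keeps the caller's working directory unchanged.
build_runtime() {
    root="${1:-.}"
    profile="${2:-RelWithDebInfo}"
    mkdir -p "$root/build" || return 1
    ( cd "$root/build" && cmake -DCMAKE_BUILD_TYPE="$profile" .. && make )
}
```

Usage, from the root of a checkout, might look like `set_ocl_kernel_path "$PWD" && build_runtime "$PWD" Debug`, followed by `./run` in `utests/` as the README describes.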