The ww_mutex_test kernel module

Compile using
  make
  make install
Start a test sequence using
  sudo modprobe wtest
After the test sequence, unload with
  sudo rmmod wtest

The test sequence simulates a number of GPU command submissions taking
ww_mutex locks. The module maintains a number of global locks, and each
thread is required to lock a local number of locks, chosen randomly from
the set of global locks. Typically, threads will sometimes try to acquire
the same lock, and the ww_mutex rollback will be initiated (an
illustrative sketch of this locking loop is appended at the end of this
file). Defines to tune this behaviour (ww_mutex_test.h):

#define WWT_NUM_LOCKS   100000 /* Number of global locks */
#define WWT_NUM_T_LOCKS 800    /* Number of locks per thread out of the global set */
#define WWT_NUM_THREADS 16     /* Number of locking threads to fire off */

Each thread performs a number of simulated command submissions with the
same locks. Each command submission consists of:

*) Taking the locks.
*) Busy-waiting for a while (mimicking the time used to submit GPU commands).
*) Releasing the locks.

The busy-wait makes it harder to judge how much time was spent on the
actual locking, but on the other hand gives more real-world-like results
for the number of rollbacks. Related defines:

#define WWT_NUM_SUB   10000 /* Number of command submissions */
#define WWT_CS_UDELAY 000   /* Command submission udelay */

The results can be viewed as starting and ending times for each thread in
"dmesg". Each thread also prints the number of rollbacks it had to do.

There are two ways to get zero rollbacks: One is to fire off the threads
sequentially, in which case there is no contention. The other is to make
sure there are no common locks between threads. Be careful with the
latter option: there must be enough global locks to accommodate the
requests of all threads, otherwise module loading may lock up. Related
defines:

#define WWT_NO_SHARED  /* No shared mutexes - No rollbacks */
#define WWT_SEQUENTIAL /* Fire off locking threads sequentially */

The module can use either the kernel's built-in ww_mutex implementation
or a drop-in replacement implementation. The drop-in replacement
implements a choice of algorithms: Wait-Die and Wound-Wait (sketched at
the end of this file). It is also possible to batch mutex locks and
unlocks, significantly reducing the number of locked CPU cycles. Note
that the drop-in replacement manipulates locking state under a
class-global spinlock instead of the built-in atomic operation
manipulation. This is slightly slower in cases where the global spinlock
is not contended, and significantly slower in cases where it is
contended, but it allows batching locks and unlocks within a single
global spinlock critical section. Related defines:

#define WW_BUILTIN      /* Use kernel builtin ww mutexes */
#define WW_WAITDIE true /* Use wait-die, not wound-wait */
#define WW_BATCHING     /* Batch locks and unlocks */
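
Appendix: illustrative sketches

Below is a minimal sketch of the acquire/rollback pattern that each
simulated command submission follows when the kernel's built-in ww_mutex
API is used. Only the ww_mutex_*/ww_acquire_* calls are the real kernel
API; the names wwt_class, wwt_submit, locks and num_locks are
illustrative assumptions, not necessarily the module's actual
identifiers.

#include <linux/ww_mutex.h>
#include <linux/delay.h>
#include <linux/errno.h>

#include "ww_mutex_test.h"	/* For WWT_CS_UDELAY */

static DEFINE_WW_CLASS(wwt_class);

/* One simulated command submission: lock all, busy-wait, unlock all. */
static int wwt_submit(struct ww_mutex **locks, int num_locks)
{
	struct ww_acquire_ctx ctx;
	struct ww_mutex *contended = NULL;
	int i;

	ww_acquire_init(&ctx, &wwt_class);
retry:
	for (i = 0; i < num_locks; i++) {
		if (locks[i] == contended) {
			/* Already taken by ww_mutex_lock_slow() below. */
			contended = NULL;
			continue;
		}
		if (ww_mutex_lock(locks[i], &ctx) == -EDEADLK) {
			struct ww_mutex *busy = locks[i];

			/* Rollback: release everything taken so far. */
			while (i--)
				ww_mutex_unlock(locks[i]);
			if (contended)
				ww_mutex_unlock(contended);

			/* Sleep until the contended lock is free, take it,
			 * then restart the whole sequence. */
			ww_mutex_lock_slow(busy, &ctx);
			contended = busy;
			goto retry;
		}
	}
	ww_acquire_done(&ctx);

	/* Busy-wait, mimicking time used to submit GPU commands. */
	udelay(WWT_CS_UDELAY);

	for (i = 0; i < num_locks; i++)
		ww_mutex_unlock(locks[i]);
	ww_acquire_fini(&ctx);
	return 0;
}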
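
The Wait-Die and Wound-Wait algorithms offered by the drop-in replacement
differ only in who rolls back on contention. The following sketch is a
conceptual illustration keyed on acquire-context stamps (a lower stamp
means an older transaction); it is not the module's actual code.

#include <linux/types.h>

enum wwt_action {
	WWT_WAIT,		/* Keep waiting for the holder */
	WWT_ROLLBACK_SELF,	/* Waiter releases its locks and retries */
	WWT_WOUND_HOLDER,	/* Force the holder to roll back */
};

/* Decide what a waiter does when the lock is held by another context. */
static enum wwt_action wwt_contention(u64 self, u64 holder, bool wait_die)
{
	if (wait_die)
		/* Wait-Die: an older waiter waits; a younger one "dies",
		 * i.e. rolls back and retries with the same stamp. */
		return self < holder ? WWT_WAIT : WWT_ROLLBACK_SELF;

	/* Wound-Wait: an older waiter "wounds" the younger holder, which
	 * must roll back as soon as it hits another contended lock; a
	 * younger waiter simply waits. */
	return self < holder ? WWT_WOUND_HOLDER : WWT_WAIT;
}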
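
Finally, a conceptual sketch of why batching under a class-global
spinlock can pay off: all per-lock state updates happen inside a single
critical section instead of one atomic transition per mutex. Everything
here is hypothetical and heavily simplified (only the uncontended path;
the real drop-in replacement also handles waiting, wounding and
rollback).

#include <linux/spinlock.h>
#include <linux/sched.h>

struct wwt_simple_mutex {
	struct task_struct *owner;	/* NULL when unlocked */
};

static DEFINE_SPINLOCK(wwt_class_lock);	/* class-global spinlock */

/* Try to take all locks in one critical section; back out on contention. */
static bool wwt_batch_trylock(struct wwt_simple_mutex **locks, int n)
{
	int i;

	spin_lock(&wwt_class_lock);
	for (i = 0; i < n; i++) {
		if (locks[i]->owner) {
			/* Contended: undo the partial batch. */
			while (i--)
				locks[i]->owner = NULL;
			spin_unlock(&wwt_class_lock);
			return false;
		}
		locks[i]->owner = current;
	}
	spin_unlock(&wwt_class_lock);
	return true;
}

/* Release all locks in one critical section. */
static void wwt_batch_unlock(struct wwt_simple_mutex **locks, int n)
{
	int i;

	spin_lock(&wwt_class_lock);
	for (i = 0; i < n; i++)
		locks[i]->owner = NULL;
	spin_unlock(&wwt_class_lock);
}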