The ww_mutex_test kernel module

Compile using
  make
  make install
Start a test sequence using
  sudo modprobe wtest
After the test sequence, unload with
  sudo rmmod wtest

The test sequence simulates a number of GPU command submissions taking
ww_mutex locks. The module maintains a number of global locks, and each
thread is required to lock a local number of locks, chosen randomly from
the set of global locks. Typically, threads will sometimes try to acquire
the same lock, and the ww_mutex rollback will be initiated (an
illustrative sketch of this locking loop is appended at the end of this
file). Defines to tune this behaviour (ww_mutex_test.h):

#define WWT_NUM_LOCKS   100000 /* Number of global locks */
#define WWT_NUM_T_LOCKS 800    /* Number of locks per thread out of the global set */
#define WWT_NUM_THREADS 16     /* Number of locking threads to fire off */

Each thread performs a number of simulated command submissions with the
same locks. Each command submission consists of:

*) Taking the locks.
*) Busy-waiting for a while (mimicking the time used to submit GPU commands).
*) Releasing the locks.

The busy-wait makes it harder to judge how much time was spent on the
actual locking, but on the other hand gives more real-world-like results
for the number of rollbacks. Related defines:

#define WWT_NUM_SUB   10000 /* Number of command submissions */
#define WWT_CS_UDELAY 000   /* Command submission udelay */

The results can be viewed as starting and ending times for each thread in
"dmesg". Each thread also prints the number of rollbacks it had to do.

There are two ways to get zero rollbacks: One is to fire off the threads
sequentially, in which case there is no contention. The other is to make
sure there are no common locks between threads. Be careful with the
latter option: there must be enough global locks to accommodate the
requests of all threads, otherwise module loading may lock up. Related
defines:

#define WWT_NO_SHARED  /* No shared mutexes - No rollbacks */
#define WWT_SEQUENTIAL /* Fire off locking threads sequentially */

The module can use either the kernel's built-in ww_mutex implementation
or a drop-in replacement implementation. The drop-in replacement
implements a choice of algorithms: Wait-Die and Wound-Wait (sketched at
the end of this file). It is also possible to batch mutex locks and
unlocks, significantly reducing the number of locked CPU cycles. Note
that the drop-in replacement manipulates locking state under a
class-global spinlock instead of the built-in atomic operation
manipulation. This is slightly slower in cases where the global spinlock
is not contended, and significantly slower in cases where it is
contended, but it allows batching locks and unlocks within a single
global spinlock critical section. Related defines:

#define WW_BUILTIN      /* Use kernel builtin ww mutexes */
#define WW_WAITDIE true /* Use wait-die, not wound-wait */
#define WW_BATCHING     /* Batch locks and unlocks */
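
Appendix: illustrative sketches

Below is a minimal sketch of the acquire/rollback pattern that each
simulated command submission follows when the kernel's built-in ww_mutex
API is used. Only the ww_mutex_*/ww_acquire_* calls are the real kernel
API; the names wwt_class, wwt_submit, locks and num_locks are
illustrative assumptions, not necessarily the module's actual
identifiers.

#include <linux/ww_mutex.h>
#include <linux/delay.h>
#include <linux/errno.h>

#include "ww_mutex_test.h"	/* For WWT_CS_UDELAY */

static DEFINE_WW_CLASS(wwt_class);

/* One simulated command submission: lock all, busy-wait, unlock all. */
static int wwt_submit(struct ww_mutex **locks, int num_locks)
{
	struct ww_acquire_ctx ctx;
	struct ww_mutex *contended = NULL;
	int i;

	ww_acquire_init(&ctx, &wwt_class);
retry:
	for (i = 0; i < num_locks; i++) {
		if (locks[i] == contended) {
			/* Already taken by ww_mutex_lock_slow() below. */
			contended = NULL;
			continue;
		}
		if (ww_mutex_lock(locks[i], &ctx) == -EDEADLK) {
			struct ww_mutex *busy = locks[i];

			/* Rollback: release everything taken so far. */
			while (i--)
				ww_mutex_unlock(locks[i]);
			if (contended)
				ww_mutex_unlock(contended);

			/* Sleep until the contended lock is free, take it,
			 * then restart the whole sequence. */
			ww_mutex_lock_slow(busy, &ctx);
			contended = busy;
			goto retry;
		}
	}
	ww_acquire_done(&ctx);

	/* Busy-wait, mimicking time used to submit GPU commands. */
	udelay(WWT_CS_UDELAY);

	for (i = 0; i < num_locks; i++)
		ww_mutex_unlock(locks[i]);
	ww_acquire_fini(&ctx);
	return 0;
}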
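
The Wait-Die and Wound-Wait algorithms offered by the drop-in replacement
differ only in who rolls back on contention. The following sketch is a
conceptual illustration keyed on acquire-context stamps (a lower stamp
means an older transaction); it is not the module's actual code.

#include <linux/types.h>

enum wwt_action {
	WWT_WAIT,		/* Keep waiting for the holder */
	WWT_ROLLBACK_SELF,	/* Waiter releases its locks and retries */
	WWT_WOUND_HOLDER,	/* Force the holder to roll back */
};

/* Decide what a waiter does when the lock is held by another context. */
static enum wwt_action wwt_contention(u64 self, u64 holder, bool wait_die)
{
	if (wait_die)
		/* Wait-Die: an older waiter waits; a younger one "dies",
		 * i.e. rolls back and retries with the same stamp. */
		return self < holder ? WWT_WAIT : WWT_ROLLBACK_SELF;

	/* Wound-Wait: an older waiter "wounds" the younger holder, which
	 * must roll back as soon as it hits another contended lock; a
	 * younger waiter simply waits. */
	return self < holder ? WWT_WOUND_HOLDER : WWT_WAIT;
}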
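
Finally, a conceptual sketch of why batching under a class-global
spinlock can pay off: all per-lock state updates happen inside a single
critical section instead of one atomic transition per mutex. Everything
here is hypothetical and heavily simplified (only the uncontended path;
the real drop-in replacement also handles waiting, wounding and
rollback).

#include <linux/spinlock.h>
#include <linux/sched.h>

struct wwt_simple_mutex {
	struct task_struct *owner;	/* NULL when unlocked */
};

static DEFINE_SPINLOCK(wwt_class_lock);	/* class-global spinlock */

/* Try to take all locks in one critical section; back out on contention. */
static bool wwt_batch_trylock(struct wwt_simple_mutex **locks, int n)
{
	int i;

	spin_lock(&wwt_class_lock);
	for (i = 0; i < n; i++) {
		if (locks[i]->owner) {
			/* Contended: undo the partial batch. */
			while (i--)
				locks[i]->owner = NULL;
			spin_unlock(&wwt_class_lock);
			return false;
		}
		locks[i]->owner = current;
	}
	spin_unlock(&wwt_class_lock);
	return true;
}

/* Release all locks in one critical section. */
static void wwt_batch_unlock(struct wwt_simple_mutex **locks, int n)
{
	int i;

	spin_lock(&wwt_class_lock);
	for (i = 0; i < n; i++)
		locks[i]->owner = NULL;
	spin_unlock(&wwt_class_lock);
}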