mm: batch unlink_file_vma calls in free_pgd_range - drm-misc - Kernel DRM miscellaneous fixes and cross-tree changes


author	Mateusz Guzik <mjguzik@gmail.com>	2024-05-22 01:43:21 +0200
committer	Andrew Morton <akpm@linux-foundation.org>	2024-07-03 19:29:58 -0700
commit	3577dbb192419e37b6f54aced8777b6c81cd03d4 (patch)
tree	0debd41a85915375f2a82f3ad53796ac4efbfb19 /mm/mm_init.c
parent	1a3798dececa8cb26b9eee26840195ccc1a4d6c1 (diff)

mm: batch unlink_file_vma calls in free_pgd_range

Execs of dynamically linked binaries at 20-ish cores are bottlenecked on
the i_mmap_rwsem semaphore, while the biggest singular contributor is
free_pgd_range inducing the lock acquire back-to-back for all consecutive
mappings of a given file.

Tracing the count of said acquires while building the kernel shows:

    [1, 2)     799579 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [2, 3)          0 |                                                    |
    [3, 4)       3009 |                                                    |
    [4, 5)       3009 |                                                    |
    [5, 6)     326442 |@@@@@@@@@@@@@@@@@@@@@                               |

So in particular there were 326442 opportunities to coalesce 5 acquires
into 1.

Doing so increases execs per second by 4% (~50k to ~52k) when running
the benchmark linked below.

The lock remains the main bottleneck, I have not looked at other spots
yet.

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

    $ cc -O2 -o shared-doexec doexec.c
    $ ./shared-doexec $(nproc)

Note this particular test makes sure binaries are separate, but the
loader is shared.

Stats collected on the patched kernel (+ "noinline") with:

    bpftrace -e 'kprobe:unlink_file_vma_batch_process
    { @ = lhist(((struct unlink_vma_file_batch *)arg0)->count, 0, 8, 1); }'

Link: https://lkml.kernel.org/r/20240521234321.359501-1-mjguzik@gmail.com
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
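
The coalescing idea described in the message can be illustrated with a small user-space model. This is only a sketch, not the kernel patch: `struct mapping`, `struct batch`, `batch_add`, `batch_process`, `count_batched_acquires` and the `BATCH_MAX` capacity of 8 are hypothetical names and values chosen for illustration, and a plain counter stands in for taking i_mmap_rwsem. Consecutive mappings of the same file are collected and then flushed under a single simulated lock acquire.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8 /* assumed batch capacity, chosen for illustration */

/* Stand-in for a file-backed mapping; file_id identifies the backing file. */
struct mapping {
    int file_id;
};

/* Pending mappings that all belong to the same file. */
struct batch {
    int count;
    const struct mapping *items[BATCH_MAX];
};

/* Counts simulated acquisitions of the per-file lock. */
static int lock_acquires;

/* Flush the batch: one simulated lock acquire covers every queued mapping. */
static void batch_process(struct batch *b)
{
    if (b->count == 0)
        return;
    lock_acquires++; /* models a single lock/unlock pair for the whole batch */
    /* a real implementation would unlink each b->items[i] here */
    b->count = 0;
}

/* Queue a mapping, flushing first if the file changes or the batch is full. */
static void batch_add(struct batch *b, const struct mapping *m)
{
    if (b->count > 0 &&
        (b->items[0]->file_id != m->file_id || b->count == BATCH_MAX))
        batch_process(b);
    assert(b->count < BATCH_MAX);
    b->items[b->count++] = m;
}

/* Walk the mappings in order and return how many lock acquires were needed. */
int count_batched_acquires(const struct mapping *maps, size_t n)
{
    struct batch b = { 0 };

    lock_acquires = 0;
    for (size_t i = 0; i < n; i++)
        batch_add(&b, &maps[i]);
    batch_process(&b); /* flush whatever is still pending */
    return lock_acquires;
}
```

In this model, five consecutive mappings of one file cost one simulated acquire instead of five, which corresponds to the [5, 6) bucket in the histogram above.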

Diffstat (limited to 'mm/mm_init.c')

0 files changed, 0 insertions, 0 deletions

