diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-02-25 10:29:09 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-02-25 10:29:09 -0800 |
commit | 7b46588f364f4f40c25f43ceabb6f705d20793e2 (patch) | |
tree | 29e80019ee791abe58176161f3ae2b766749b808 /Documentation | |
parent | 915f3e3f76c05b2da93c4cc278eebc2d9219d9f4 (diff) | |
parent | 95330473636e5e4546f94874c957c3be66bb2140 (diff) |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
- almost all of the rest of MM
- misc bits
- KASAN updates
- procfs
- lib/ updates
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (124 commits)
checkpatch: remove false unbalanced braces warning
checkpatch: notice unbalanced else braces in a patch
checkpatch: add another old address for the FSF
checkpatch: update $logFunctions
checkpatch: warn on logging continuations
checkpatch: warn on embedded function names
lib/lz4: remove back-compat wrappers
fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version
crypto: change LZ4 modules to work with new LZ4 module version
lib/decompress_unlz4: change module to work with new LZ4 module version
lib: update LZ4 compressor module
lib/test_sort.c: make it explicitly non-modular
lib: add CONFIG_TEST_SORT to enable self-test of sort()
rbtree: use designated initializers
linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
lib/find_bit.c: micro-optimise find_next_*_bit
lib: add module support to atomic64 tests
lib: add module support to glob tests
lib: add module support to crc32 tests
kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/blockdev/zram.txt | 6 | ||||
-rw-r--r-- | Documentation/sysctl/vm.txt | 4 | ||||
-rw-r--r-- | Documentation/vm/ksm.txt | 18 | ||||
-rw-r--r-- | Documentation/vm/userfaultfd.txt | 89 |
4 files changed, 112 insertions, 5 deletions
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 1c0c08d9206b..4fced8a21307 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -201,8 +201,8 @@ File /sys/block/zram<id>/mm_stat The stat file represents device's mm statistics. It consists of a single line of text and contains the following stats separated by whitespace: orig_data_size uncompressed size of data stored in this disk. - This excludes zero-filled pages (zero_pages) since no - memory is allocated for them. + This excludes same-element-filled pages (same_pages) since + no memory is allocated for them. Unit: bytes compr_data_size compressed size of data stored in this disk mem_used_total the amount of memory allocated for this disk. This @@ -214,7 +214,7 @@ line of text and contains the following stats separated by whitespace: the compressed data mem_used_max the maximum amount of memory zram have consumed to store the data - zero_pages the number of zero filled pages written to this disk. + same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 95ccbe6d79ce..b4ad97f10b8e 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -376,8 +376,8 @@ max_map_count: This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling -malloc, directly by mmap and mprotect, and also when loading shared -libraries. +malloc, directly by mmap, mprotect, and madvise, and also when loading +shared libraries. While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index f34a8ee6f860..6b0ca7feb135 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -38,6 +38,10 @@ the range for whenever the KSM daemon is started; even if the range cannot contain any pages which KSM could actually merge; even if MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE. +If a region of memory must be split into at least one new MADV_MERGEABLE +or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process +will exceed vm.max_map_count (see Documentation/sysctl/vm.txt). + Like other madvise calls, they are intended for use on mapped areas of the user address space: they will report ENOMEM if the specified range includes unmapped gaps (though working on the intervening mapped areas), @@ -80,6 +84,20 @@ run - set 0 to stop ksmd from running but keep merged pages, Default: 0 (must be changed to 1 to activate KSM, except if CONFIG_SYSFS is disabled) +use_zero_pages - specifies whether empty pages (i.e. allocated pages + that only contain zeroes) should be treated specially. + When set to 1, empty pages are merged with the kernel + zero page(s) instead of with each other as it would + happen normally. This can improve the performance on + architectures with coloured zero pages, depending on + the workload. Care should be taken when enabling this + setting, as it can potentially degrade the performance + of KSM for some workloads, for example if the checksums + of pages candidate for merging match the checksum of + an empty page. This setting can be changed at any time, + it is only effective for pages merged after the change. + Default: 0 (normal KSM behaviour as in earlier releases) + The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: pages_shared - how many shared pages are being used diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt index 70a3c94d1941..fe51a5aa8963 100644 --- a/Documentation/vm/userfaultfd.txt +++ b/Documentation/vm/userfaultfd.txt @@ -54,6 +54,26 @@ uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of respectively all the available features of the read(2) protocol and the generic ioctl available. +The uffdio_api.features bitmask returned by the UFFDIO_API ioctl +defines what memory types are supported by the userfaultfd and what +events, except page fault notifications, may be generated. + +If the kernel supports registering userfaultfd ranges on hugetlbfs +virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in +uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be +set if the kernel supports registering userfaultfd ranges on shared +memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero +MAP_SHARED, memfd_create, etc). + +The userland application that wants to use userfaultfd with hugetlbfs +or shared memory need to set the corresponding flag in +uffdio_api.features to enable those features. + +If the userland desires to receive notifications for events other than +page faults, it has to verify that uffdio_api.features has appropriate +UFFD_FEATURE_EVENT_* bits set. These events are described in more +detail below in "Non-cooperative userfaultfd" section. + Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should be invoked (if present in the returned uffdio_api.ioctls bitmask) to register a memory range in the userfaultfd by setting the @@ -142,3 +162,72 @@ course the bitmap is updated accordingly. It's also useful to avoid sending the same page twice (in case the userfault is read by the postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration thread). + +== Non-cooperative userfaultfd == + +When the userfaultfd is monitored by an external manager, the manager +must be able to track changes in the process virtual memory +layout. Userfaultfd can notify the manager about such changes using +the same read(2) protocol as for the page fault notifications. The +manager has to explicitly enable these events by setting appropriate +bits in uffdio_api.features passed to UFFDIO_API ioctl: + +UFFD_FEATURE_EVENT_EXIT - enable notification about exit() of the +non-cooperative process. When the monitored process exits, the uffd +manager will get UFFD_EVENT_EXIT. + +UFFD_FEATURE_EVENT_FORK - enable userfaultfd hooks for fork(). When +this feature is enabled, the userfaultfd context of the parent process +is duplicated into the newly created process. The manager receives +UFFD_EVENT_FORK with file descriptor of the new userfaultfd context in +the uffd_msg.fork. + +UFFD_FEATURE_EVENT_REMAP - enable notifications about mremap() +calls. When the non-cooperative process moves a virtual memory area to +a different location, the manager will receive UFFD_EVENT_REMAP. The +uffd_msg.remap will contain the old and new addresses of the area and +its original length. + +UFFD_FEATURE_EVENT_REMOVE - enable notifications about +madvise(MADV_REMOVE) and madvise(MADV_DONTNEED) calls. The event +UFFD_EVENT_REMOVE will be generated upon these calls to madvise. The +uffd_msg.remove will contain start and end addresses of the removed +area. + +UFFD_FEATURE_EVENT_UNMAP - enable notifications about memory +unmapping. The manager will get UFFD_EVENT_UNMAP with uffd_msg.remove +containing start and end addresses of the unmapped area. + +Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP +are pretty similar, they quite differ in the action expected from the +userfaultfd manager. In the former case, the virtual memory is +removed, but the area is not, the area remains monitored by the +userfaultfd, and if a page fault occurs in that area it will be +delivered to the manager. The proper resolution for such page fault is +to zeromap the faulting address. However, in the latter case, when an +area is unmapped, either explicitly (with munmap() system call), or +implicitly (e.g. during mremap()), the area is removed and in turn the +userfaultfd context for such area disappears too and the manager will +not get further userland page faults from the removed area. Still, the +notification is required in order to prevent manager from using +UFFDIO_COPY on the unmapped area. + +Unlike userland page faults which have to be synchronous and require +explicit or implicit wakeup, all the events are delivered +asynchronously and the non-cooperative process resumes execution as +soon as manager executes read(). The userfaultfd manager should +carefully synchronize calls to UFFDIO_COPY with the events +processing. To aid the synchronization, the UFFDIO_COPY ioctl will +return -ENOSPC when the monitored process exits at the time of +UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed +its virtual memory layout simultaneously with outstanding UFFDIO_COPY +operation. + +The current asynchronous model of the event delivery is optimal for +single threaded non-cooperative userfaultfd manager implementations. A +synchronous event delivery model can be added later as a new +userfaultfd feature to facilitate multithreading enhancements of the +non cooperative manager, for example to allow UFFDIO_COPY ioctls to +run in parallel to the event reception. Single threaded +implementations should continue to use the current async event +delivery model instead. |