Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/accounting/psi.txt                   | 107
-rw-r--r-- Documentation/admin-guide/kernel-parameters.txt    |  17
-rw-r--r-- Documentation/core-api/kernel-api.rst              |   4
-rw-r--r-- Documentation/dev-tools/gcov.rst                   |  18
-rw-r--r-- Documentation/devicetree/bindings/pps/pps-gpio.txt |   7
-rw-r--r-- Documentation/filesystems/autofs-mount-control.txt |   6
-rw-r--r-- Documentation/filesystems/autofs.txt               |  66
7 files changed, 198 insertions, 27 deletions
diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
index 7e71c9c1d8e9..5cbe5659e3b7 100644
--- a/Documentation/accounting/psi.txt
+++ b/Documentation/accounting/psi.txt
@@ -63,6 +63,110 @@ as well as medium and long term trends. The total absolute stall time
 spikes which wouldn't necessarily make a dent in the time averages,
 or to average trends over custom time frames.
 
+Monitoring for pressure thresholds
+==================================
+
+Users can register triggers and use poll() to be woken up when resource
+pressure exceeds certain thresholds.
+
+A trigger describes the maximum cumulative stall time over a specific
+time window, e.g. 100ms of total stall time within any 500ms window to
+generate a wakeup event.
+
+To register a trigger user has to open psi interface file under
+/proc/pressure/ representing the resource to be monitored and write the
+desired threshold and time window. The open file descriptor should be
+used to wait for trigger events using select(), poll() or epoll().
+The following format is used:
+
+<some|full> <stall amount in us> <time window in us>
+
+For example writing "some 150000 1000000" into /proc/pressure/memory
+would add 150ms threshold for partial memory stall measured within
+1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
+would add 50ms threshold for full io stall measured within 1sec time window.
+
+Triggers can be set on more than one psi metric and more than one trigger
+for the same psi metric can be specified. However for each trigger a separate
+file descriptor is required to be able to poll it separately from others,
+therefore for each trigger a separate open() syscall should be made even
+when opening the same psi interface file.
+
+Monitors activate only when system enters stall state for the monitored
+psi metric and deactivates upon exit from the stall state. While system is
+in the stall state psi signal growth is monitored at a rate of 10 times per
+tracking window.
+
+The kernel accepts window sizes ranging from 500ms to 10s, therefore min
+monitoring update interval is 50ms and max is 1s. Min limit is set to
+prevent overly frequent polling. Max limit is chosen as a high enough number
+after which monitors are most likely not needed and psi averages can be used
+instead.
+
+When activated, psi monitor stays active for at least the duration of one
+tracking window to avoid repeated activations/deactivations when system is
+bouncing in and out of the stall state.
+
+Notifications to the userspace are rate-limited to one per tracking window.
+
+The trigger will de-register when the file descriptor used to define the
+trigger is closed.
+
+Userspace monitor usage example
+===============================
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <poll.h>
+#include <string.h>
+#include <unistd.h>
+
+/*
+ * Monitor memory partial stall with 1s tracking window size
+ * and 150ms threshold.
+ */
+int main() {
+	const char trig[] = "some 150000 1000000";
+	struct pollfd fds;
+	int n;
+
+	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
+	if (fds.fd < 0) {
+		printf("/proc/pressure/memory open error: %s\n",
+			strerror(errno));
+		return 1;
+	}
+	fds.events = POLLPRI;
+
+	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
+		printf("/proc/pressure/memory write error: %s\n",
+			strerror(errno));
+		return 1;
+	}
+
+	printf("waiting for events...\n");
+	while (1) {
+		n = poll(&fds, 1, -1);
+		if (n < 0) {
+			printf("poll error: %s\n", strerror(errno));
+			return 1;
+		}
+		if (fds.revents & POLLERR) {
+			printf("got POLLERR, event source is gone\n");
+			return 0;
+		}
+		if (fds.revents & POLLPRI) {
+			printf("event triggered!\n");
+		} else {
+			printf("unknown event received: 0x%x\n", fds.revents);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 
 Cgroup2 interface
 =================
@@ -71,3 +175,6 @@ mounted, pressure stall information is also tracked for tasks grouped
 into cgroups. Each subdirectory in the cgroupfs mountpoint contains
 cpu.pressure, memory.pressure, and io.pressure files; the format is
 the same as the /proc/pressure/ files.
+
+Per-cgroup psi monitors can be specified and used the same way as
+system-wide ones.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43176340c73d..d1d1da911085 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1830,6 +1830,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32,768 to 16,777,216.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
@@ -3174,6 +3177,16 @@
 			This will also cause panics on machine check exceptions.
 			Useful together with panic=30 to trigger a reboot.
 
+	page_alloc.shuffle=
+			[KNL] Boolean flag to control whether the page allocator
+			should randomize its free lists. The randomization may
+			be automatically enabled if the kernel detects it is
+			running on a platform with a direct-mapped memory-side
+			cache, and this parameter can be used to
+			override/disable that behavior. The state of the flag
+			can be read from sysfs at:
+			/sys/module/page_alloc/parameters/shuffle.
+
 	page_owner=	[KNL] Boot-time page_owner enabling option.
 			Storage of the information about who allocated
 			each page is disabled in default. With this switch,
@@ -4054,7 +4067,9 @@
 				[[,]s[mp]#### \
 				[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
 				[[,]f[orce]
-			Where reboot_mode is one of warm (soft) or cold (hard) or gpio,
+			Where reboot_mode is one of warm (soft) or cold (hard) or gpio
+			(prefix with 'panic_' to set mode for panic
+			reboot only),
 			reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
 			reboot_force is either force or not specified,
 			reboot_cpu is s[mp]#### with #### being the processor
diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index 71f5d2fe39b7..a29c99d13331 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -147,10 +147,10 @@ Division Functions
 .. kernel-doc:: include/linux/math64.h
    :internal:
 
-.. kernel-doc:: lib/div64.c
+.. kernel-doc:: lib/math/div64.c
    :functions: div_s64_rem div64_u64_rem div64_u64 div64_s64
 
-.. kernel-doc:: lib/gcd.c
+.. kernel-doc:: lib/math/gcd.c
    :export:
 
 UUID/GUID
diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst
index 69a7d90c320a..46aae52a41d0 100644
--- a/Documentation/dev-tools/gcov.rst
+++ b/Documentation/dev-tools/gcov.rst
@@ -34,10 +34,6 @@ Configure the kernel with::
         CONFIG_DEBUG_FS=y
         CONFIG_GCOV_KERNEL=y
 
-select the gcc's gcov format, default is autodetect based on gcc version::
-
-        CONFIG_GCOV_FORMAT_AUTODETECT=y
-
 and to get coverage data for the entire kernel::
 
         CONFIG_GCOV_PROFILE_ALL=y
@@ -169,6 +165,20 @@ b) gcov is run on the BUILD machine
 
     [user@build] gcov -o /tmp/coverage/tmp/out/init main.c
 
+Note on compilers
+-----------------
+
+GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with
+GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang.
+
+.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
+.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html
+
+Build differences between GCC and Clang gcov are handled by Kconfig. It
+automatically selects the appropriate gcov format depending on the detected
+toolchain.
+
+
 Troubleshooting
 ---------------
 
diff --git a/Documentation/devicetree/bindings/pps/pps-gpio.txt b/Documentation/devicetree/bindings/pps/pps-gpio.txt
index 3683874832ae..9012a2a02e14 100644
--- a/Documentation/devicetree/bindings/pps/pps-gpio.txt
+++ b/Documentation/devicetree/bindings/pps/pps-gpio.txt
@@ -7,6 +7,10 @@ Required properties:
 - compatible: should be "pps-gpio"
 - gpios: one PPS GPIO in the format described by ../gpio/gpio.txt
 
+Additional required properties for the PPS ECHO functionality:
+- echo-gpios: one PPS ECHO GPIO in the format described by ../gpio/gpio.txt
+- echo-active-ms: duration in ms of the active portion of the echo pulse
+
 Optional properties:
 - assert-falling-edge: when present, assert is indicated by a falling edge
                        (instead of by a rising edge)
@@ -19,5 +23,8 @@ Example:
 		gpios = <&gpio1 26 GPIO_ACTIVE_HIGH>;
 		assert-falling-edge;
 
+		echo-gpios = <&gpio1 27 GPIO_ACTIVE_HIGH>;
+		echo-active-ms = <100>;
+
 		compatible = "pps-gpio";
 	};
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
index 45edad6933cc..acc02fc57993 100644
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ b/Documentation/filesystems/autofs-mount-control.txt
@@ -354,8 +354,10 @@ this ioctl is called until no further expire candidates are found.
 
 The call requires an initialized struct autofs_dev_ioctl with the
 ioctlfd field set to the descriptor obtained from the open call. In
-addition an immediate expire, independent of the mount timeout, can be
-requested by setting the how field of struct args_expire to 1. If no
+addition an immediate expire that's independent of the mount timeout,
+and a forced expire that's independent of whether the mount is busy,
+can be requested by setting the how field of struct args_expire to
+AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively . If no
 expire candidates can be found the ioctl returns -1 with errno set to
 EAGAIN.
 
diff --git a/Documentation/filesystems/autofs.txt b/Documentation/filesystems/autofs.txt
index 373ad25852d3..3af38c7fd26d 100644
--- a/Documentation/filesystems/autofs.txt
+++ b/Documentation/filesystems/autofs.txt
@@ -116,7 +116,7 @@ that purpose there is another flag.
 **DCACHE_MANAGE_TRANSIT**
 
 If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
-related behaviors are invoked, both using the `d_op->d_manage()`
+related behaviours are invoked, both using the `d_op->d_manage()`
 dentry operation.
 
 Firstly, before checking to see if any filesystem is mounted on the
@@ -193,8 +193,8 @@ VFS remain in RCU-walk mode, but can only tell it to get out of
 RCU-walk mode by returning `-ECHILD`.
 
 So `d_manage()`, when called with `rcu_walk` set, should either return
--ECHILD if there is any reason to believe it is unsafe to end the
-mounted filesystem, and otherwise should return 0.
+-ECHILD if there is any reason to believe it is unsafe to enter the
+mounted filesystem, otherwise it should return 0.
 
 autofs will return `-ECHILD` if an expiry of the filesystem has been
 initiated or is being considered, otherwise it returns 0.
@@ -210,7 +210,7 @@ mounts that were created by `d_automount()` returning a filesystem to be
 mounted. As autofs doesn't return such a filesystem but leaves the
 mounting to the automount daemon, it must involve the automount daemon
 in unmounting as well. This also means that autofs has more control
-of expiry.
+over expiry.
 
 The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
 the `umount` system call. Unmounting with MNT_EXPIRE will fail unless
@@ -225,7 +225,7 @@ unmount any filesystems mounted on the autofs filesystem or remove any
 symbolic links or empty directories any time it likes. If the unmount
 or removal is successful the filesystem will be returned to the state
 it was before the mount or creation, so that any access of the name
-will trigger normal auto-mount processing. In particlar, `rmdir` and
In particlar, `rmdir` and +will trigger normal auto-mount processing. In particular, `rmdir` and `unlink` do not leave negative entries in the dcache as a normal filesystem would, so an attempt to access a recently-removed object is passed to autofs for handling. @@ -240,11 +240,18 @@ Normally the daemon only wants to remove entries which haven't been used for a while. For this purpose autofs maintains a "`last_used`" time stamp on each directory or symlink. For symlinks it genuinely does record the last time the symlink was "used" or followed to find -out where it points to. For directories the field is a slight -misnomer. It actually records the last time that autofs checked if -the directory or one of its descendents was busy and found that it -was. This is just as useful and doesn't require updating the field so -often. +out where it points to. For directories the field is used slightly +differently. The field is updated at mount time and during expire +checks if it is found to be in use (ie. open file descriptor or +process working directory) and during path walks. The update done +during path walks prevents frequent expire and immediate mount of +frequently accessed automounts. But in the case where a GUI continually +access or an application frequently scans an autofs directory tree +there can be an accumulation of mounts that aren't actually being +used. To cater for this case the "`strictexpire`" autofs mount option +can be used to avoid the "`last_used`" update on path walk thereby +preventing this apparent inability to expire mounts that aren't +really in use. The daemon is able to ask autofs if anything is due to be expired, using an `ioctl` as discussed later. For a *direct* mount, autofs @@ -255,8 +262,12 @@ up. There is an option with indirect mounts to consider each of the leaves that has been mounted on instead of considering the top-level names. -This is intended for compatability with version 4 of autofs and should -be considered as deprecated. 
+This was originally intended for compatibility with version 4 of autofs
+and should be considered as deprecated for Sun Format automount maps.
+However, it may be used again for amd format mount maps (which are
+generally indirect maps) because the amd automounter allows for the
+setting of an expire timeout for individual mounts. But there are
+some difficulties in making the needed changes for this.
 
 When autofs considers a directory it checks the `last_used` time and
 compares it with the "timeout" value set when the filesystem was
@@ -273,7 +284,7 @@ mounts. If it finds something in the root directory to expire it will
 return the name of that thing. Once a name has been returned the
 automount daemon needs to unmount any filesystems mounted below the
 name normally. As described above, this is unsafe for non-toplevel
-mounts in a version-5 autofs. For this reason the current `automountd`
+mounts in a version-5 autofs. For this reason the current `automount(8)`
 does not use this ioctl.
 
 The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
@@ -345,7 +356,7 @@ The `wait_queue_token` is a unique number which can identify a
 particular request to be acknowledged. When a message is sent over
 the pipe the affected dentry is marked as either "active" or
 "expiring" and other accesses to it block until the message is
-acknowledged using one of the ioctls below and the relevant
+acknowledged using one of the ioctls below with the relevant
 `wait_queue_token`.
 
 Communicating with autofs: root directory ioctls
@@ -367,15 +378,14 @@ The available ioctl commands are:
     This mode is also entered if a write to the pipe fails.
 - **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use.
 - **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
-    is really a version number for the implementation. It is
-    currently 2.
+    is really a version number for the implementation.
 - **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned
     long. The value is used to set the timeout for expiry, and
     the current timeout value is stored back through the pointer.
 - **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if
     the filesystem could be unmounted. This is only a hint as
     the situation could change at any instant. This call can be
-    use to avoid a more expensive full unmount attempt.
+    used to avoid a more expensive full unmount attempt.
 - **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
     anything suitable to expire. A pointer to a packet:
 
@@ -400,6 +410,11 @@ The available ioctl commands are:
     **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
     and objects are expired if the are not in use.
 
+    **AUTOFS_EXP_FORCED** causes the in use status to be ignored
+    and objects are expired ieven if they are in use. This assumes
+    that the daemon has requested this because it is capable of
+    performing the umount.
+
     **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
     name to expire. This is only safe when *maxproto* is 4.
 
@@ -415,7 +430,7 @@ which can be used to communicate directly with the autofs filesystem.
 It requires CAP_SYS_ADMIN for access.
 
 The `ioctl`s that can be used on this device are described in a separate
-document `autofs-mount-control.txt`, and are summarized briefly here.
+document `autofs-mount-control.txt`, and are summarised briefly here.
 Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
 
         struct autofs_dev_ioctl {
@@ -511,6 +526,21 @@ directories.
 Catatonic mode can only be left via the
 **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`.
 
+The "ignore" mount option
+-------------------------
+
+The "ignore" mount option can be used to provide a generic indicator
+to applications that the mount entry should be ignored when displaying
+mount information.
+
+In other OSes that provide autofs and that provide a mount list to user
+space based on the kernel mount list a no-op mount option ("ignore" is
+the one use on the most common OSes) is allowed so that autofs file
+system users can optionally use it.
+
+This is intended to be used by user space programs to exclude autofs
+mounts from consideration when reading the mounts list.
+
 autofs, name spaces, and shared mounts
 --------------------------------------
 
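The trigger format and window limits added to Documentation/accounting/psi.txt in this patch can be captured in a small validation helper. The following is a minimal sketch, not part of the patch or of any kernel API: `psi_format_trigger()` and `psi_update_interval_us()` are hypothetical names, and rejecting a zero stall amount or one larger than the window is an assumption that the document itself does not state. The 500ms to 10s range and the "10 times per tracking window" rate come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Window limits stated in psi.txt: 500ms to 10s, expressed in microseconds. */
#define PSI_WINDOW_MIN_US 500000UL
#define PSI_WINDOW_MAX_US 10000000UL
/* psi.txt: signal growth is monitored 10 times per tracking window. */
#define PSI_UPDATES_PER_WINDOW 10UL

/*
 * Hypothetical helper: write "<some|full> <stall us> <window us>" into buf.
 * Returns the string length, or -1 if the window is outside the documented
 * range, the stall amount is zero or larger than the window (an assumed
 * sanity rule, not stated in psi.txt), or buf is too small.
 */
int psi_format_trigger(char *buf, size_t len, bool full,
		       unsigned long stall_us, unsigned long window_us)
{
	int n;

	if (window_us < PSI_WINDOW_MIN_US || window_us > PSI_WINDOW_MAX_US)
		return -1;
	if (stall_us == 0 || stall_us > window_us)
		return -1;

	n = snprintf(buf, len, "%s %lu %lu", full ? "full" : "some",
		     stall_us, window_us);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* encoding error or truncated output */
	return n;
}

/* Monitoring update interval while in a stall state: one tenth of the window. */
unsigned long psi_update_interval_us(unsigned long window_us)
{
	/* Callers are expected to pass a window that has already been validated. */
	assert(window_us >= PSI_WINDOW_MIN_US && window_us <= PSI_WINDOW_MAX_US);
	return window_us / PSI_UPDATES_PER_WINDOW;
}
```

With the documented example, `psi_format_trigger(buf, sizeof(buf), false, 150000, 1000000)` produces "some 150000 1000000", which could then be written to /proc/pressure/memory exactly as the patch's example program does. For that 1s window the update interval works out to 100ms, and the 500ms and 10s extremes give the 50ms and 1s bounds quoted in the text.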
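The psi.txt text in this patch says each trigger needs its own file descriptor, even when several triggers target the same file, and that select(), poll() or epoll() may be used to wait. The patch's example uses poll() with a single trigger. Below is a sketch of the epoll variant with several triggers. It assumes Linux, and it relies on epoll reporting the POLLPRI notification as EPOLLPRI and the POLLERR case as EPOLLERR. `psi_open_trigger()`, `psi_watch()` and `psi_wait_once()` are hypothetical helpers, not kernel or libc APIs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Open a psi interface file and register one trigger on the new descriptor.
 * Call once per trigger, even for the same file. Returns the fd, or -1 with
 * errno set.
 */
int psi_open_trigger(const char *path, const char *trig)
{
	int fd = open(path, O_RDWR | O_NONBLOCK);

	if (fd < 0)
		return -1;
	/* As in the psi.txt example, the terminating NUL is written too. */
	if (write(fd, trig, strlen(trig) + 1) < 0) {
		int saved = errno;

		close(fd);	/* no trigger was registered; release the fd */
		errno = saved;
		return -1;
	}
	return fd;
}

/* Watch a trigger fd for EPOLLPRI, tagging it so events can be told apart. */
int psi_watch(int epfd, int fd, unsigned int tag)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLPRI;
	ev.data.u32 = tag;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Block until one watched trigger reports. Returns 1 with *tag set when a
 * trigger fired, 0 with *tag set when its event source is gone (EPOLLERR,
 * the epoll counterpart of POLLERR in the psi.txt example), -1 otherwise.
 */
int psi_wait_once(int epfd, unsigned int *tag)
{
	struct epoll_event ev;
	int n;

	assert(tag != NULL);
	n = epoll_wait(epfd, &ev, 1, -1);
	if (n <= 0)
		return -1;
	*tag = ev.data.u32;
	if (ev.events & EPOLLERR)
		return 0;
	return (ev.events & EPOLLPRI) ? 1 : -1;
}
```

For instance, two triggers on /proc/pressure/memory ("some 150000 1000000" and "full 50000 1000000") would take two psi_open_trigger() calls. Both descriptors would then be registered with tags 0 and 1 on a single epoll_create1(0) descriptor, and psi_wait_once() reports which trigger fired. Per the document, closing a descriptor de-registers its trigger.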
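The autofs.txt and autofs-mount-control.txt hunks describe how `last_used`, the `how` flags (AUTOFS_EXP_IMMEDIATE, AUTOFS_EXP_FORCED) and the new "strictexpire" mount option interact. The userspace model below is a deliberately simplified sketch of those rules, not the kernel's expire code. `struct model_dir`, `model_should_expire()` and `model_path_walk()` are hypothetical. The `EXP_*` values are assumed to mirror the flags in `<linux/auto_fs.h>`, and real code should take them from that header. Treating "not used for long enough" as a strict greater-than comparison against the timeout is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Assumed to mirror AUTOFS_EXP_IMMEDIATE and AUTOFS_EXP_FORCED from
 * <linux/auto_fs.h>; real code should use the header's definitions.
 */
#define EXP_IMMEDIATE 0x01
#define EXP_FORCED    0x04

/* Hypothetical, simplified state of one autofs-managed directory. */
struct model_dir {
	time_t last_used;	/* the "last_used" time stamp */
	bool busy;		/* e.g. an open fd or a process cwd below it */
};

/*
 * Is the directory an expire candidate?
 *  - by default it must be unused for longer than the timeout and not busy;
 *  - EXP_IMMEDIATE ignores the last_used time;
 *  - EXP_FORCED ignores the in-use status.
 */
bool model_should_expire(const struct model_dir *d, time_t now,
			 time_t timeout, unsigned int how)
{
	bool timed_out, idle;

	assert(d != NULL);
	timed_out = (how & EXP_IMMEDIATE) || now - d->last_used > timeout;
	idle = (how & EXP_FORCED) || !d->busy;
	return timed_out && idle;
}

/*
 * A path walk refreshes last_used unless "strictexpire" is in effect, in
 * which case repeatedly scanning the tree no longer keeps mounts alive.
 */
void model_path_walk(struct model_dir *d, time_t now, bool strictexpire)
{
	assert(d != NULL);
	if (!strictexpire)
		d->last_used = now;
}
```

With a 60-second timeout, a directory last used at t=100 is not a candidate at t=150 unless EXP_IMMEDIATE is set. If the directory is busy, only EXP_FORCED makes it a candidate, which autofs.txt reserves for a daemon that is capable of performing the umount.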