diff options
Diffstat (limited to 'Documentation/cgroup-v2.txt')
-rw-r--r-- | Documentation/cgroup-v2.txt | 77 |
1 files changed, 68 insertions, 9 deletions
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index 2cddab7efb20..74cdeaed9f7a 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -53,10 +53,14 @@ v1 is available under Documentation/cgroup-v1/. 5-3-2. Writeback 5-4. PID 5-4-1. PID Interface Files - 5-5. RDMA - 5-5-1. RDMA Interface Files - 5-6. Misc - 5-6-1. perf_event + 5-5. Device + 5-6. RDMA + 5-6-1. RDMA Interface Files + 5-7. Misc + 5-7-1. perf_event + 5-N. Non-normative information + 5-N-1. CPU controller root cgroup process behaviour + 5-N-2. IO controller root cgroup process behaviour 6. Namespace 6-1. Basics 6-2. The Root and Views @@ -279,7 +283,7 @@ thread mode, the following conditions must be met. exempt from this requirement. Topology-wise, a cgroup can be in an invalid state. Please consider -the following toplogy:: +the following topology:: A (threaded domain) - B (threaded) - C (domain, just created) @@ -420,7 +424,9 @@ The root cgroup is exempt from this restriction. Root contains processes and anonymous resource consumption which can't be associated with any other cgroups and requires special treatment from most controllers. How resource consumption in the root cgroup is governed -is up to each controller. +is up to each controller (for more information on this topic please +refer to the Non-normative information section in the Controllers +chapter). Note that the restriction doesn't get in the way if there is no enabled controller in the cgroup's "cgroup.subtree_control". This is @@ -1063,10 +1069,10 @@ PAGE_SIZE multiple when read back. reached the limit and allocation was about to fail. Depending on context result could be invocation of OOM - killer and retrying allocation or failing alloction. + killer and retrying allocation or failing allocation. Failed allocation in its turn could be returned into - userspace as -ENOMEM or siletly ignored in cases like + userspace as -ENOMEM or silently ignored in cases like disk readahead. For now OOM in memory cgroup kills tasks iff shortage has happened inside page fault. @@ -1191,7 +1197,7 @@ PAGE_SIZE multiple when read back. cgroups. The default is "max". Swap usage hard limit. If a cgroup's swap usage reaches this - limit, anonymous meomry of the cgroup will not be swapped out. + limit, anonymous memory of the cgroup will not be swapped out. Usage Guidelines @@ -1429,6 +1435,30 @@ through fork() or clone(). These will return -EAGAIN if the creation of a new process would cause a cgroup policy to be violated. +Device controller +----------------- + +Device controller manages access to device files. It includes both +creation of new device files (using mknod), and access to the +existing device files. + +Cgroup v2 device controller has no interface files and is implemented +on top of cgroup BPF. To control access to device files, a user may +create bpf programs of the BPF_CGROUP_DEVICE type and attach them +to cgroups. On an attempt to access a device file, corresponding +BPF programs will be executed, and depending on the return value +the attempt will succeed or fail with -EPERM. + +A BPF_CGROUP_DEVICE program takes a pointer to the bpf_cgroup_dev_ctx +structure, which describes the device access attempt: access type +(mknod/read/write) and device (type, major and minor numbers). +If the program returns 0, the attempt fails with -EPERM, otherwise +it succeeds. + +An example of BPF_CGROUP_DEVICE program may be found in the kernel +source tree in the tools/testing/selftests/bpf/dev_cgroup.c file. + + RDMA ---- @@ -1481,6 +1511,35 @@ always be filtered by cgroup v2 path. The controller can still be moved to a legacy hierarchy after v2 hierarchy is populated. +Non-normative information +------------------------- + +This section contains information that isn't considered to be a part of +the stable kernel API and so is subject to change. + + +CPU controller root cgroup process behaviour +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When distributing CPU cycles in the root cgroup each thread in this +cgroup is treated as if it was hosted in a separate child cgroup of the +root cgroup. This child cgroup weight is dependent on its thread nice +level. + +For details of this mapping see sched_prio_to_weight array in +kernel/sched/core.c file (values from this array should be scaled +appropriately so the neutral - nice 0 - value is 100 instead of 1024). + + +IO controller root cgroup process behaviour +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Root cgroup processes are hosted in an implicit leaf child node. +When distributing IO resources this implicit child node is taken into +account as if it was a normal child cgroup of the root cgroup with a +weight value of 200. + + Namespace ========= |