summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/blockdev/zram.rst66
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst32
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst43
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt24
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst4
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst8
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst64
7 files changed, 196 insertions, 45 deletions
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 091e8bb38887..678d70d6e1c3 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -102,17 +102,41 @@ Examples::
#select lzo compression algorithm
echo lzo > /sys/block/zram0/comp_algorithm
-For the time being, the `comp_algorithm` content does not necessarily
-show every compression algorithm supported by the kernel. We keep this
-list primarily to simplify device configuration and one can configure
-a new device with a compression algorithm that is not listed in
-`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
-and, if some of the algorithms were built as modules, it's impossible
-to list all of them using, for instance, /proc/crypto or any other
-method. This, however, has an advantage of permitting the usage of
-custom crypto compression modules (implementing S/W or H/W compression).
-
-4) Set Disksize
+For the time being, the `comp_algorithm` content shows only compression
+algorithms that are supported by zram.
+
+4) Set compression algorithm parameters: Optional
+=================================================
+
+Compression algorithms may support specific parameters which can be
+tweaked for particular dataset. ZRAM has an `algorithm_params` device
+attribute which provides a per-algorithm params configuration.
+
+For example, several compression algorithms support `level` parameter.
+In addition, certain compression algorithms support pre-trained dictionaries,
+which significantly change algorithms' characteristics. In order to configure
+compression algorithm to use external pre-trained dictionary, pass full
+path to the `dict` along with other parameters::
+
+ #pass path to pre-trained zstd dictionary
+ echo "algo=zstd dict=/etc/dictioary" > /sys/block/zram0/algorithm_params
+
+ #same, but using algorithm priority
+ echo "priority=1 dict=/etc/dictioary" > \
+ /sys/block/zram0/algorithm_params
+
+ #pass path to pre-trained zstd dictionary and compression level
+ echo "algo=zstd level=8 dict=/etc/dictioary" > \
+ /sys/block/zram0/algorithm_params
+
+Parameters are algorithm specific: not all algorithms support pre-trained
+dictionaries, not all algorithms support `level`. Furthermore, for certain
+algorithms `level` controls the compression level (the higher the value the
+better the compression ratio, it even can take negatives values for some
+algorithms), for other algorithms `level` is acceleration level (the higher
+the value the lower the compression ratio).
+
+5) Set Disksize
===============
Set disk size by writing the value to sysfs node 'disksize'.
@@ -132,7 +156,7 @@ There is little point creating a zram of greater than twice the size of memory
since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.
-5) Set memory limit: Optional
+6) Set memory limit: Optional
=============================
Set memory limit by writing the value to sysfs node 'mem_limit'.
@@ -151,7 +175,7 @@ Examples::
# To disable memory limit
echo 0 > /sys/block/zram0/mem_limit
-6) Activate
+7) Activate
===========
::
@@ -162,7 +186,7 @@ Examples::
mkfs.ext4 /dev/zram1
mount /dev/zram1 /tmp
-7) Add/remove zram devices
+8) Add/remove zram devices
==========================
zram provides a control interface, which enables dynamic (on-demand) device
@@ -182,7 +206,7 @@ execute::
echo X > /sys/class/zram-control/hot_remove
-8) Stats
+9) Stats
========
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
@@ -205,6 +229,7 @@ writeback_limit_enable RW show and set writeback_limit feature
max_comp_streams RW the number of possible concurrent compress
operations
comp_algorithm RW show and change the compression algorithm
+algorithm_params WO setup compression algorithm parameters
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
@@ -283,15 +308,15 @@ a single line of text and contains the following stats separated by whitespace:
Unit: 4K bytes
============== =============================================================
-9) Deactivate
-=============
+10) Deactivate
+==============
::
swapoff /dev/zram0
umount /dev/zram1
-10) Reset
+11) Reset
=========
Write any positive value to 'reset' sysfs node::
@@ -487,11 +512,14 @@ registered compression algorithms, increases our chances of finding the
algorithm that successfully compresses a particular page. Sometimes, however,
it is convenient (and sometimes even necessary) to limit recompression to
only one particular algorithm so that it will not try any other algorithms.
-This can be achieved by providing a algo=NAME parameter:::
+This can be achieved by providing a `algo` or `priority` parameter:::
#use zstd algorithm only (if registered)
echo "type=huge algo=zstd" > /sys/block/zramX/recompress
+ #use zstd algorithm only (if zstd was registered under priority 1)
+ echo "type=huge priority=1" > /sys/block/zramX/recompress
+
memory tracking
===============
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 9cde26d33843..270501db9f4e 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -78,18 +78,24 @@ Brief summary of control files.
memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded
memory.soft_limit_in_bytes set/show soft limit of memory usage
This knob is not available on CONFIG_PREEMPT_RT systems.
+ This knob is deprecated and shouldn't be
+ used.
memory.stat show various statistics
memory.use_hierarchy set/show hierarchical account enabled
This knob is deprecated and shouldn't be
used.
memory.force_empty trigger forced page reclaim
memory.pressure_level set memory pressure notifications
+ This knob is deprecated and shouldn't be
+ used.
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate set/show controls of moving charges
This knob is deprecated and shouldn't be
used.
memory.oom_control set/show oom controls.
+ This knob is deprecated and shouldn't be
+ used.
memory.numa_stat show the number of memory usage per numa
node
memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
@@ -105,10 +111,18 @@ Brief summary of control files.
memory.kmem.max_usage_in_bytes show max kernel memory usage recorded
memory.kmem.tcp.limit_in_bytes set/show hard limit for tcp buf memory
+ This knob is deprecated and shouldn't be
+ used.
memory.kmem.tcp.usage_in_bytes show current tcp buf memory allocation
+ This knob is deprecated and shouldn't be
+ used.
memory.kmem.tcp.failcnt show the number of tcp buf memory usage
hits limits
+ This knob is deprecated and shouldn't be
+ used.
memory.kmem.tcp.max_usage_in_bytes show max tcp buf memory usage recorded
+ This knob is deprecated and shouldn't be
+ used.
==================================== ==========================================
1. History
@@ -693,8 +707,10 @@ For compatibility reasons writing 1 to memory.use_hierarchy will always pass::
# echo 1 > memory.use_hierarchy
-7. Soft limits
-==============
+7. Soft limits (DEPRECATED)
+===========================
+
+THIS IS DEPRECATED!
Soft limits allow for greater sharing of memory. The idea behind soft limits
is to allow control groups to use as much of the memory as needed, provided
@@ -834,8 +850,10 @@ It's applicable for root and non-root cgroup.
.. _cgroup-v1-memory-oom-control:
-10. OOM Control
-===============
+10. OOM Control (DEPRECATED)
+============================
+
+THIS IS DEPRECATED!
memory.oom_control file is for OOM notification and other controls.
@@ -882,8 +900,10 @@ At reading, current status of OOM is shown.
The number of processes belonging to this cgroup killed by any
kind of OOM killer.
-11. Memory Pressure
-===================
+11. Memory Pressure (DEPRECATED)
+================================
+
+THIS IS DEPRECATED!
The pressure level notifications can be used to monitor the memory
allocation cost; based on the pressure, applications can implement
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c65756afd372..69af2173555f 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1343,11 +1343,14 @@ The following nested keys are defined.
all the existing limitations and potential future extensions.
memory.peak
- A read-only single value file which exists on non-root
- cgroups.
+ A read-write single value file which exists on non-root cgroups.
+
+ The max memory usage recorded for the cgroup and its descendants since
+ either the creation of the cgroup or the most recent reset for that FD.
- The max memory usage recorded for the cgroup and its
- descendants since the creation of the cgroup.
+ A write of any non-empty string to this file resets it to the
+ current memory usage for subsequent reads through the same
+ file descriptor.
memory.oom.group
A read-write single value file which exists on non-root
@@ -1624,6 +1627,25 @@ The following nested keys are defined.
Usually because failed to allocate some continuous swap space
for the huge page.
+ numa_pages_migrated (npn)
+ Number of pages migrated by NUMA balancing.
+
+ numa_pte_updates (npn)
+ Number of pages whose page table entries are modified by
+ NUMA balancing to produce NUMA hinting faults on access.
+
+ numa_hint_faults (npn)
+ Number of NUMA hinting faults.
+
+ pgdemote_kswapd
+ Number of pages demoted by kswapd.
+
+ pgdemote_direct
+ Number of pages demoted directly.
+
+ pgdemote_khugepaged
+ Number of pages demoted by khugepaged.
+
memory.numa_stat
A read-only nested-keyed file which exists on non-root cgroups.
@@ -1673,11 +1695,14 @@ The following nested keys are defined.
Healthy workloads are not expected to reach this limit.
memory.swap.peak
- A read-only single value file which exists on non-root
- cgroups.
+ A read-write single value file which exists on non-root cgroups.
+
+ The max swap usage recorded for the cgroup and its descendants since
+ the creation of the cgroup or the most recent reset for that FD.
- The max swap usage recorded for the cgroup and its
- descendants since the creation of the cgroup.
+ A write of any non-empty string to this file resets it to the
+ current memory usage for subsequent reads through the same
+ file descriptor.
memory.swap.max
A read-write single value file which exists on non-root
@@ -1741,6 +1766,8 @@ The following nested keys are defined.
Note that this is subtly different from setting memory.swap.max to
0, as it still allows for pages to be written to the zswap pool.
+ This setting has no effect if zswap is disabled, and swapping
+ is allowed unless memory.swap.max is set to 0.
memory.pressure
A read-only nested-keyed file.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8337d0fed311..19b71ff1168e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4152,6 +4152,21 @@
Disable NUMA, Only set up a single NUMA node
spanning all memory.
+ numa=fake=<size>[MG]
+ [KNL, ARM64, RISCV, X86, EARLY]
+ If given as a memory unit, fills all system RAM with
+ nodes of size interleaved over physical nodes.
+
+ numa=fake=<N>
+ [KNL, ARM64, RISCV, X86, EARLY]
+ If given as an integer, fills all system RAM with N
+ fake nodes interleaved over physical nodes.
+
+ numa=fake=<N>U
+ [KNL, ARM64, RISCV, X86, EARLY]
+ If given as an integer followed by 'U', it will
+ divide each physical node into N emulated nodes.
+
numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
NUMA balancing.
Allowed values are enable and disable
@@ -6655,6 +6670,15 @@
<deci-seconds>: poll all this frequency
0: no polling (default)
+ thp_anon= [KNL]
+ Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
+ state is one of "always", "madvise", "never" or "inherit".
+ Control the default behavior of the system with respect
+ to anonymous transparent hugepages.
+ Can be used multiple times for multiple anon THP sizes.
+ See Documentation/admin-guide/mm/transhuge.rst for more
+ details.
+
threadirqs [KNL,EARLY]
Force threading of all interrupt handlers except those
marked explicitly IRQF_NO_THREAD.
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index 054010a7f3d8..c4dddf6733cd 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -7,7 +7,7 @@ Getting Started
This document briefly describes how you can use DAMON by demonstrating its
default user space tool. Please note that this document describes only a part
of its features for brevity. Please refer to the usage `doc
-<https://github.com/awslabs/damo/blob/next/USAGE.md>`_ of the tool for more
+<https://github.com/damonitor/damo/blob/next/USAGE.md>`_ of the tool for more
details.
@@ -26,7 +26,7 @@ User Space Tool
For the demonstration, we will use the default user space tool for DAMON,
called DAMON Operator (DAMO). It is available at
-https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
+https://github.com/damonitor/damo. The examples below assume that ``damo`` is on
your ``$PATH``. It's not mandatory, though.
Because DAMO is using the sysfs interface (refer to :doc:`usage` for the
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 26df6cfa4441..d9be9f7caa7d 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -7,19 +7,19 @@ Detailed Usages
DAMON provides below interfaces for different users.
- *DAMON user space tool.*
- `This <https://github.com/awslabs/damo>`_ is for privileged people such as
+ `This <https://github.com/damonitor/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface.
Using this, users can use the DAMON’s major features in a human-friendly way.
It may not be highly tuned for special cases, though. For more detail,
please refer to its `usage document
- <https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
+ <https://github.com/damonitor/damo/blob/next/USAGE.md>`_.
- *sysfs interface.*
:ref:`This <sysfs_interface>` is for privileged user space programmers who
want more optimized use of DAMON. Using this, users can use DAMON’s major
features by reading from and writing to special sysfs files. Therefore,
you can write and use your personalized DAMON sysfs wrapper programs that
reads/writes the sysfs files instead of you. The `DAMON user space tool
- <https://github.com/awslabs/damo>`_ is one example of such programs.
+ <https://github.com/damonitor/damo>`_ is one example of such programs.
- *Kernel Space Programming Interface.*
:doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by
@@ -543,7 +543,7 @@ memory rate becomes larger than 60%, or lower than 30%". ::
# echo 300 > watermarks/low
Please note that it's highly recommended to use user space tools like `damo
-<https://github.com/awslabs/damo>`_ rather than manually reading and writing
+<https://github.com/damonitor/damo>`_ rather than manually reading and writing
the files as above. Above is only for an example.
.. _tracepoint:
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..cfdd16a52e39 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -202,6 +202,16 @@ PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
+All THPs at fault and collapse time will be added to _deferred_list,
+and will therefore be split under memory presure if they are considered
+"underused". A THP is underused if the number of zero-filled pages in
+the THP is above max_ptes_none (see below). It is possible to disable
+this behaviour by writing 0 to shrink_underused, and enable it by writing
+1 to it::
+
+ echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+ echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+
khugepaged will be automatically started when PMD-sized THP is enabled
(either of the per-size anon control or the top-level control are set
to "always" or "madvise"), and it'll be automatically shutdown when
@@ -284,13 +294,37 @@ that THP is shared. Exceeding the number would block the collapse::
A higher value may increase memory footprint for some workloads.
-Boot parameter
-==============
+Boot parameters
+===============
+
+You can change the sysfs boot time default for the top-level "enabled"
+control by passing the parameter ``transparent_hugepage=always`` or
+``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
+kernel command line.
+
+Alternatively, each supported anonymous THP size can be controlled by
+passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
+where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
+supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
+``never`` or ``inherit``.
+
+For example, the following will set 16K, 32K, 64K THP to ``always``,
+set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
+to ``never``::
-You can change the sysfs boot time defaults of Transparent Hugepage
-Support by passing the parameter ``transparent_hugepage=always`` or
-``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
-to the kernel command line.
+ thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
+
+``thp_anon=`` may be specified multiple times to configure all THP sizes as
+required. If ``thp_anon=`` is specified at least once, any anon THP sizes
+not explicitly configured on the command line are implicitly set to
+``never``.
+
+``transparent_hugepage`` setting only affects the global toggle. If
+``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
+However, if a valid ``thp_anon`` setting is provided by the user, the
+PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
+is not defined within a valid ``thp_anon``, its policy will default to
+``never``.
Hugepages in tmpfs/shmem
========================
@@ -447,6 +481,12 @@ thp_deferred_split_page
splitting it would free up some memory. Pages on split queue are
going to be split under memory pressure.
+thp_underused_split_page
+ is incremented when a huge page on the split queue was split
+ because it was underused. A THP is underused if the number of
+ zero pages in the THP is above a certain threshold
+ (/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none).
+
thp_split_pmd
is incremented every time a PMD split into table of PTEs.
This can happen, for instance, when application calls mprotect() or
@@ -527,6 +567,18 @@ split_deferred
it would free up some memory. Pages on split queue are going to
be split under memory pressure, if splitting is possible.
+nr_anon
+ the number of anonymous THP we have in the whole system. These THPs
+ might be currently entirely mapped or have partially unmapped/unused
+ subpages.
+
+nr_anon_partially_mapped
+ the number of anonymous THP which are likely partially mapped, possibly
+ wasting memory, and have been queued for deferred memory reclamation.
+ Note that in corner some cases (e.g., failed migration), we might detect
+ an anonymous THP as "partially mapped" and count it here, even though it
+ is not actually partially mapped anymore.
+
As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help