From 760df93ecdd59fc1c213a491b5adee79f53606da Mon Sep 17 00:00:00 2001
From: Shen Feng
Date: Thu, 2 Apr 2009 16:57:20 -0700
Subject: documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls

Now /proc/sys is described in many places and much information is
redundant. This patch updates proc.txt and moves the /proc/sys
description out to the files in Documentation/sysctl.

Details are:

merge
  - 2.1 /proc/sys/fs - File system data
  - 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
  - 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctl/fs.txt.

remove
  - 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's no better than Documentation/binfmt_misc.txt.

merge
  - 2.3 /proc/sys/kernel - general kernel parameters
with Documentation/sysctl/kernel.txt.

remove
  - 2.5 /proc/sys/dev - Device specific parameters
since it's obsolete now that sysfs is used.

remove
  - 2.6 /proc/sys/sunrpc - Remote procedure calls
since it's no better than Documentation/sysctl/sunrpc.txt.

move
  - 2.7 /proc/sys/net - Networking stuff
  - 2.9 Appletalk
  - 2.10 IPX
to the newly created Documentation/sysctl/net.txt.

remove
  - 2.8 /proc/sys/net/ipv4 - IPV4 settings
since it's no better than Documentation/networking/ip-sysctl.txt.

add
  - Chapter 3 Per-Process Parameters
to describe /proc//xxx parameters.

Signed-off-by: Shen Feng
Cc: Randy Dunlap
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 Documentation/sysctl/00-INDEX   |   2 +
 Documentation/sysctl/fs.txt     |  74 ++++++++++++++++-
 Documentation/sysctl/kernel.txt |  53 ++++++++++++
 Documentation/sysctl/net.txt    | 174 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 301 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/sysctl/net.txt

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/00-INDEX b/Documentation/sysctl/00-INDEX
index a20a9066dc4c..1286f455992f 100644
--- a/Documentation/sysctl/00-INDEX
+++ b/Documentation/sysctl/00-INDEX
@@ -10,6 +10,8 @@ fs.txt
 	- documentation for /proc/sys/fs/*.
 kernel.txt
 	- documentation for /proc/sys/kernel/*.
+net.txt
+	- documentation for /proc/sys/net/*.
 sunrpc.txt
 	- documentation for /proc/sys/sunrpc/*.
 vm.txt
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index f99254327ae5..1458448436cc 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -1,5 +1,6 @@
 Documentation for /proc/sys/fs/* kernel version 2.2.10
 	(c) 1998, 1999, Rik van Riel
+	(c) 2009, Shen Feng
 
 For general info and legal blurb, please look in README.
 
@@ -14,7 +15,12 @@ kernel. Since some of the files _can_ be used to screw up your
 system, it is advisable to read both documentation and source
 before actually making adjustments.
 
+1. /proc/sys/fs
+----------------------------------------------------------
+
 Currently, these files are in /proc/sys/fs:
+- aio-max-nr
+- aio-nr
 - dentry-state
 - dquot-max
 - dquot-nr
@@ -30,8 +36,15 @@ Currently, these files are in /proc/sys/fs:
 - super-max
 - super-nr
 
-Documentation for the files in /proc/sys/fs/binfmt_misc is
-in Documentation/binfmt_misc.txt.
+==============================================================
+
+aio-nr & aio-max-nr:
+
+aio-nr is the running total of the number of events specified on the
+io_setup system call for all currently active aio contexts. If aio-nr
+reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
+raising aio-max-nr does not result in the pre-allocation or re-sizing
+of any kernel data structures.
 
 ==============================================================
 
@@ -178,3 +191,60 @@ requests. aio-max-nr allows you to change the maximum value
 aio-nr can grow to.
 
 ==============================================================
+
+
+2. /proc/sys/fs/binfmt_misc
+----------------------------------------------------------
+
+Documentation for the files in /proc/sys/fs/binfmt_misc is
+in Documentation/binfmt_misc.txt.
+
+
+3. /proc/sys/fs/mqueue - POSIX message queues filesystem
+----------------------------------------------------------
+
+The "mqueue" filesystem provides the necessary kernel features to enable the
+creation of a user space library that implements the POSIX message queues
+API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
+Interfaces specification.)
+
+The "mqueue" filesystem contains values for determining/setting the amount of
+resources used by the file system.
+
+/proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
+maximum number of message queues allowed on the system.
+
+/proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
+maximum number of messages in a queue value. In fact it is the limiting value
+for another (user) limit which is set in mq_open invocation. This attribute of
+a queue must be less or equal then msg_max.
+
+/proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
+maximum message size value (it is every message queue's attribute set during
+its creation).
+
+
+4. /proc/sys/fs/epoll - Configuration options for the epoll interface
+--------------------------------------------------------
+
+This directory contains configuration options for the epoll(7) interface.
+
+max_user_instances
+------------------
+
+This is the maximum number of epoll file descriptors that a single user can
+have open at a given time. The default value is 128, and should be enough
+for normal users.
+
+max_user_watches
+----------------
+
+Every epoll file descriptor can store a number of files to be monitored
+for event readiness. Each one of these monitored files constitutes a "watch".
+This configuration option sets the maximum number of "watches" that are
+allowed for each user.
+Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
+on a 64bit one.
+The current default value for max_user_watches is the 1/32 of the available
+low memory, divided for the "watch" cost in bytes.
+
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a4ccdd1981cf..f11ca7979fa6 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -1,5 +1,6 @@
 Documentation for /proc/sys/kernel/* kernel version 2.2.10
 	(c) 1998, 1999, Rik van Riel
+	(c) 2009, Shen Feng
 
 For general info and legal blurb, please look in README.
 
@@ -18,6 +19,7 @@ Currently, these files might (depending on your configuration)
 show up in /proc/sys/kernel:
 - acpi_video_flags
 - acct
+- auto_msgmni
 - core_pattern
 - core_uses_pid
 - ctrl-alt-del
@@ -33,6 +35,7 @@ show up in /proc/sys/kernel:
 - msgmax
 - msgmnb
 - msgmni
+- nmi_watchdog
 - osrelease
 - ostype
 - overflowgid
@@ -40,6 +43,7 @@ show up in /proc/sys/kernel:
 - panic
 - pid_max
 - powersave-nap [ PPC only ]
+- panic_on_unrecovered_nmi
 - printk
 - randomize_va_space
 - real-root-dev ==> Documentation/initrd.txt
@@ -55,6 +59,7 @@ show up in /proc/sys/kernel:
 - sysrq ==> Documentation/sysrq.txt
 - tainted
 - threads-max
+- unknown_nmi_panic
 - version
 
 ==============================================================
@@ -381,3 +386,51 @@ can be ORed together:
 512 - A kernel warning has occurred.
 1024 - A module from drivers/staging was loaded.
+==============================================================
+
+auto_msgmni:
+
+Enables/Disables automatic recomputing of msgmni upon memory add/remove or
+upon ipc namespace creation/removal (see the msgmni description above).
+Echoing "1" into this file enables msgmni automatic recomputing.
+Echoing "0" turns it off.
+auto_msgmni default value is 1.
+
+==============================================================
+
+nmi_watchdog:
+
+Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero
+the NMI watchdog is enabled and will continuously test all online cpus to
+determine whether or not they are still functioning properly. Currently,
+passing "nmi_watchdog=" parameter at boot time is required for this function
+to work.
+
+If LAPIC NMI watchdog method is in use (nmi_watchdog=2 kernel parameter), the
+NMI watchdog shares registers with oprofile. By disabling the NMI watchdog,
+oprofile may have more registers to utilize.
+
+==============================================================
+
+unknown_nmi_panic:
+
+The value in this file affects behavior of handling NMI. When the value is
+non-zero, unknown NMI is trapped and then panic occurs. At that time, kernel
+debugging information is displayed on console.
+
+NMI switch that most IA32 servers have fires unknown NMI up, for example.
+If a system hangs up, try pressing the NMI switch.
+
+==============================================================
+
+panic_on_unrecovered_nmi:
+
+The default Linux behaviour on an NMI of either memory or unknown is to continue
+operation. For many environments such as scientific computing it is preferable
+that the box is taken out and the error dealt with than an uncorrected
+parity/ECC error get propogated.
+
+A small number of systems do generate NMI's for bizarre random reasons such as
+power management so the default is off. That sysctl works like the existing
+panic controls already in that directory.
+
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
new file mode 100644
index 000000000000..995c2257b3e7
--- /dev/null
+++ b/Documentation/sysctl/net.txt
@@ -0,0 +1,174 @@
+Documentation for /proc/sys/net/* kernel version 2.4.0-test11-pre4
+	(c) 1999 Terrehon Bowden
+	         Bodo Bauer
+	(c) 2000 Jorge Nerin
+	(c) 2009 Shen Feng
+
+For general info and legal blurb, please look in README.
+
+==============================================================
+
+This file contains the documentation for the sysctl files in
+/proc/sys/net and is valid for Linux kernel version 2.4.0-test11-pre4.
+
+The interface to the networking parts of the kernel is located in
+/proc/sys/net. The following table shows all possible subdirectories.You may
+see only some of them, depending on your kernel's configuration.
+
+
+Table : Subdirectories in /proc/sys/net
+..............................................................................
+ Directory Content             Directory  Content
+ core      General parameter   appletalk  Appletalk protocol
+ unix      Unix domain sockets netrom     NET/ROM
+ 802       E802 protocol       ax25       AX25
+ ethernet  Ethernet protocol   rose       X.25 PLP layer
+ ipv4      IP version 4        x25        X.25 protocol
+ ipx       IPX                 token-ring IBM token ring
+ bridge    Bridging            decnet     DEC net
+ ipv6      IP version 6
+..............................................................................
+
+1. /proc/sys/net/core - Network core options
+-------------------------------------------------------
+
+rmem_default
+------------
+
+The default setting of the socket receive buffer in bytes.
+
+rmem_max
+--------
+
+The maximum receive socket buffer size in bytes.
+
+wmem_default
+------------
+
+The default setting (in bytes) of the socket send buffer.
+
+wmem_max
+--------
+
+The maximum send socket buffer size in bytes.
+
+message_burst and message_cost
+------------------------------
+
+These parameters are used to limit the warning messages written to the kernel
+log from the networking code. They enforce a rate limit to make a
+denial-of-service attack impossible. A higher message_cost factor, results in
+fewer messages that will be written. Message_burst controls when messages will
+be dropped. The default settings limit warning messages to one every five
+seconds.
+
+warnings
+--------
+
+This controls console messages from the networking stack that can occur because
+of problems on the network like duplicate address or bad checksums. Normally,
+this should be enabled, but if the problem persists the messages can be
+disabled.
+
+netdev_budget
+-------------
+
+Maximum number of packets taken from all interfaces in one polling cycle (NAPI
+poll). In one polling cycle interfaces which are registered to polling are
+probed in a round-robin manner. The limit of packets in one such probe can be
+set per-device via sysfs class/net//weight .
+
+netdev_max_backlog
+------------------
+
+Maximum number of packets, queued on the INPUT side, when the interface
+receives packets faster than kernel can process them.
+
+optmem_max
+----------
+
+Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
+of struct cmsghdr structures with appended data.
+
+2. /proc/sys/net/unix - Parameters for Unix domain sockets
+-------------------------------------------------------
+
+There are only two files in this subdirectory. They control the delays for
+deleting and destroying socket descriptors.
+
+
+3. /proc/sys/net/ipv4 - IPV4 settings
+-------------------------------------------------------
+Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
+descriptions of these entries.
+
+
+4. Appletalk
+-------------------------------------------------------
+
+The /proc/sys/net/appletalk directory holds the Appletalk configuration data
+when Appletalk is loaded. The configurable parameters are:
+
+aarp-expiry-time
+----------------
+
+The amount of time we keep an ARP entry before expiring it. Used to age out
+old hosts.
+
+aarp-resolve-time
+-----------------
+
+The amount of time we will spend trying to resolve an Appletalk address.
+
+aarp-retransmit-limit
+---------------------
+
+The number of times we will retransmit a query before giving up.
+
+aarp-tick-time
+--------------
+
+Controls the rate at which expires are checked.
+
+The directory /proc/net/appletalk holds the list of active Appletalk sockets
+on a machine.
+
+The fields indicate the DDP type, the local address (in network:node format)
+the remote address, the size of the transmit pending queue, the size of the
+received queue (bytes waiting for applications to read) the state and the uid
+owning the socket.
+
+/proc/net/atalk_iface lists all the interfaces configured for appletalk.It
+shows the name of the interface, its Appletalk address, the network range on
+that address (or network number for phase 1 networks), and the status of the
+interface.
+
+/proc/net/atalk_route lists each known network route. It lists the target
+(network) that the route leads to, the router (may be directly connected), the
+route flags, and the device the route is using.
+
+
+5. IPX
+-------------------------------------------------------
+
+The IPX protocol has no tunable values in proc/sys/net.
+
+The IPX protocol does, however, provide proc/net/ipx. This lists each IPX
+socket giving the local and remote addresses in Novell format (that is
+network:node:port). In accordance with the strange Novell tradition,
+everything but the port is in hex. Not_Connected is displayed for sockets that
+are not tied to a specific remote address. The Tx and Rx queue sizes indicate
+the number of bytes pending for transmission and reception. The state
+indicates the state the socket is in and the uid is the owning uid of the
+socket.
+
+The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
+it gives the network number, the node number, and indicates if the network is
+the primary network. It also indicates which device it is bound to (or
+Internal for internal networks) and the Frame Type if appropriate. Linux
+supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for
+IPX.
+
+The /proc/net/ipx_route table holds a list of IPX routes. For each route it
+gives the destination network, the router node (or Directly) and the network
+address of the router (or Connected) for internal networks.
--
cgit v1.2.3


From 45dad7bd9d9b65a30d6e790b111f6f2d8f746d22 Mon Sep 17 00:00:00 2001
From: Li Xiaodong
Date: Thu, 2 Apr 2009 16:57:21 -0700
Subject: documentation: fix unix_dgram_qlen description

The previous description of the parameters in /proc/sys/net/unix/ was
wrong (or missing). Simply add a new description of unix_dgram_qlen
according to the latest kernel.

Signed-off-by: Li Xiaodong
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 Documentation/sysctl/net.txt |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 995c2257b3e7..a34d55b65441 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -93,8 +93,9 @@ of struct cmsghdr structures with appended data.
 2. /proc/sys/net/unix - Parameters for Unix domain sockets
 -------------------------------------------------------
 
-There are only two files in this subdirectory. They control the delays for
-deleting and destroying socket descriptors.
+There is only one file in this directory.
+unix_dgram_qlen limits the max number of datagrams queued in Unix domain
+socket's buffer. It will not take effect unless PF_UNIX flag is spicified.
 
 
 3. /proc/sys/net/ipv4 - IPV4 settings
--
cgit v1.2.3


From fafd688e4c0c34da0f3de909881117d374e4c7af Mon Sep 17 00:00:00 2001
From: Peter W Morreale
Date: Mon, 6 Apr 2009 19:00:29 -0700
Subject: mm: add /proc controls for pdflush threads

Add /proc entries to give the admin the ability to control the minimum
and maximum number of pdflush threads. This allows finer control of
pdflush on both large and small machines.

The rationale is simply that one size does not fit all. Admins on large
and/or small systems may want to tune the min/max pdflush thread count
to best suit their needs. Right now the min/max is hardcoded to 2/8.
While probably a fair estimate for smaller machines, large machines with
large numbers of CPUs and large numbers of filesystems/block devices may
benefit from larger numbers of threads working on different block
devices.

Even if the background flushing algorithm is radically changed, it is
still likely that multiple threads will be involved, and admins would
still want finer control over the min/max rather than having to
recompile the kernel.

The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
'/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

The minimum value for nr_pdflush_threads_min is 1 and the maximum value
is the current value of nr_pdflush_threads_max. This minimum is required
since additional thread creation is performed in a pdflush thread
itself.

The minimum value for nr_pdflush_threads_max is the current value of
nr_pdflush_threads_min and the maximum value can be 1000.

Documentation/sysctl/vm.txt is also updated.
[akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
Signed-off-by: Peter W Morreale
Reviewed-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 Documentation/sysctl/vm.txt |   28 ++++++++++++++++++++++++++++
 include/linux/writeback.h   |    2 ++
 kernel/sysctl.c             |   23 +++++++++++++++++++++++
 mm/pdflush.c                |   31 +++++++++++++++++++------------
 4 files changed, 72 insertions(+), 12 deletions(-)

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 3197fc83bc51..97c4b3284329 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -39,6 +39,8 @@ Currently, these files are in /proc/sys/vm:
 - nr_hugepages
 - nr_overcommit_hugepages
 - nr_pdflush_threads
+- nr_pdflush_threads_min
+- nr_pdflush_threads_max
 - nr_trim_pages (only if CONFIG_MMU=n)
 - numa_zonelist_order
 - oom_dump_tasks
@@ -463,6 +465,32 @@ The default value is 0.
 
 ==============================================================
 
+nr_pdflush_threads_min
+
+This value controls the minimum number of pdflush threads.
+
+At boot time, the kernel will create and maintain 'nr_pdflush_threads_min'
+threads for the kernel's lifetime.
+
+The default value is 2. The minimum value you can specify is 1, and
+the maximum value is the current setting of 'nr_pdflush_threads_max'.
+
+See 'nr_pdflush_threads_max' below for more information.
+
+==============================================================
+
+nr_pdflush_threads_max
+
+This value controls the maximum number of pdflush threads that can be
+created. The pdflush algorithm will create a new pdflush thread (up to
+this maximum) if no pdflush threads have been available for >= 1 second.
+
+The default value is 8. The minimum value you can specify is the
+current value of 'nr_pdflush_threads_min' and the
+maximum is 1000.
+
+==============================================================
 overcommit_memory:
 
 This value contains a flag that enables memory overcommitment.
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 93445477f86a..9c1ed1fb6ddb 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -168,6 +168,8 @@ void writeback_set_ratelimit(void);
 
 /* pdflush.c */
 extern int nr_pdflush_threads;	/* Global so it can be exported to sysctl
 				   read-only. */
+extern int nr_pdflush_threads_max; /* Global so it can be exported to sysctl */
+extern int nr_pdflush_threads_min; /* Global so it can be exported to sysctl */
 
 #endif /* WRITEBACK_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b125e3387568..72eb1a41dcab 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -101,6 +101,7 @@ static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
 static unsigned long one_ul = 1;
 static int one_hundred = 100;
+static int one_thousand = 1000;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -1026,6 +1027,28 @@ static struct ctl_table vm_table[] = {
 		.mode = 0444 /* read-only*/,
 		.proc_handler = &proc_dointvec,
 	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "nr_pdflush_threads_min",
+		.data = &nr_pdflush_threads_min,
+		.maxlen = sizeof nr_pdflush_threads_min,
+		.mode = 0644 /* read-write */,
+		.proc_handler = &proc_dointvec_minmax,
+		.strategy = &sysctl_intvec,
+		.extra1 = &one,
+		.extra2 = &nr_pdflush_threads_max,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "nr_pdflush_threads_max",
+		.data = &nr_pdflush_threads_max,
+		.maxlen = sizeof nr_pdflush_threads_max,
+		.mode = 0644 /* read-write */,
+		.proc_handler = &proc_dointvec_minmax,
+		.strategy = &sysctl_intvec,
+		.extra1 = &nr_pdflush_threads_min,
+		.extra2 = &one_thousand,
+	},
 	{
 		.ctl_name = VM_SWAPPINESS,
 		.procname = "swappiness",
diff --git a/mm/pdflush.c b/mm/pdflush.c
index 235ac440c44e..f2caf96993f8 100644
--- a/mm/pdflush.c
+++ b/mm/pdflush.c
@@ -57,6 +57,14 @@ static DEFINE_SPINLOCK(pdflush_lock);
  */
 int nr_pdflush_threads = 0;
 
+/*
+ * The max/min number of pdflush threads. R/W by sysctl at
+ * /proc/sys/vm/nr_pdflush_threads_max/min
+ */
+int nr_pdflush_threads_max __read_mostly = MAX_PDFLUSH_THREADS;
+int nr_pdflush_threads_min __read_mostly = MIN_PDFLUSH_THREADS;
+
+
 /*
  * The time at which the pdflush thread pool last went empty
  */
@@ -68,7 +76,7 @@ static unsigned long last_empty_jifs;
  * Thread pool management algorithm:
  *
  * - The minimum and maximum number of pdflush instances are bound
- *   by MIN_PDFLUSH_THREADS and MAX_PDFLUSH_THREADS.
+ *   by nr_pdflush_threads_min and nr_pdflush_threads_max.
  *
  * - If there have been no idle pdflush instances for 1 second, create
  *   a new one.
@@ -134,14 +142,13 @@ static int __pdflush(struct pdflush_work *my_work)
 		 * To throttle creation, we reset last_empty_jifs.
 		 */
 		if (time_after(jiffies, last_empty_jifs + 1 * HZ)) {
-			if (list_empty(&pdflush_list)) {
-				if (nr_pdflush_threads < MAX_PDFLUSH_THREADS) {
-					last_empty_jifs = jiffies;
-					nr_pdflush_threads++;
-					spin_unlock_irq(&pdflush_lock);
-					start_one_pdflush_thread();
-					spin_lock_irq(&pdflush_lock);
-				}
+			if (list_empty(&pdflush_list) &&
+			    nr_pdflush_threads < nr_pdflush_threads_max) {
+				last_empty_jifs = jiffies;
+				nr_pdflush_threads++;
+				spin_unlock_irq(&pdflush_lock);
+				start_one_pdflush_thread();
+				spin_lock_irq(&pdflush_lock);
 			}
 		}
 
@@ -153,7 +160,7 @@ static int __pdflush(struct pdflush_work *my_work)
 		 */
 		if (list_empty(&pdflush_list))
 			continue;
-		if (nr_pdflush_threads <= MIN_PDFLUSH_THREADS)
+		if (nr_pdflush_threads <= nr_pdflush_threads_min)
 			continue;
 		pdf = list_entry(pdflush_list.prev, struct pdflush_work, list);
 		if (time_after(jiffies, pdf->when_i_went_to_sleep + 1 * HZ)) {
@@ -259,9 +266,9 @@ static int __init pdflush_init(void)
 	 * Pre-set nr_pdflush_threads... If we fail to create,
 	 * the count will be decremented.
 	 */
-	nr_pdflush_threads = MIN_PDFLUSH_THREADS;
+	nr_pdflush_threads = nr_pdflush_threads_min;
 
-	for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
+	for (i = 0; i < nr_pdflush_threads_min; i++)
 		start_one_pdflush_thread();
 	return 0;
 }
--
cgit v1.2.3


From ca8b9950298c84ca528a5943409a727c04ec88f8 Mon Sep 17 00:00:00 2001
From: Li Zefan
Date: Mon, 13 Apr 2009 14:39:36 -0700
Subject: Documentation/sysctl/net.txt: fix a typo

s/spicified/specified

Signed-off-by: Li Zefan
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 Documentation/sysctl/net.txt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index a34d55b65441..df38ef046f8d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -95,7 +95,7 @@ of struct cmsghdr structures with appended data.
 
 There is only one file in this directory.
 unix_dgram_qlen limits the max number of datagrams queued in Unix domain
-socket's buffer. It will not take effect unless PF_UNIX flag is spicified.
+socket's buffer. It will not take effect unless PF_UNIX flag is specified.
 
 
 3. /proc/sys/net/ipv4 - IPV4 settings
--
cgit v1.2.3


From 9e4a5bda89034502fb144331e71a0efdfd5fae97 Mon Sep 17 00:00:00 2001
From: Andrea Righi
Date: Thu, 30 Apr 2009 15:08:57 -0700
Subject: mm: prevent divide error for small values of vm_dirty_bytes

Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
avoid potential division by 0 (like the following) in get_dirty_limits().
[ 49.951610] divide error: 0000 [#1] PREEMPT SMP
[ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[ 49.952195] CPU 1
[ 49.952195] Modules linked in: pcspkr
[ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[ 49.952195] RIP: 0010:[] [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202
[ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[ 49.952195] Stack:
[ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[ 49.952195] Call Trace:
[ 49.952195] [] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[ 49.952195] [] ? ext3_writeback_write_end+0x9e/0x120
[ 49.952195] [] generic_file_buffered_write+0x12f/0x330
[ 49.952195] [] __generic_file_aio_write_nolock+0x26d/0x460
[ 49.952195] [] ? generic_file_aio_write+0x52/0xd0
[ 49.952195] [] generic_file_aio_write+0x69/0xd0
[ 49.952195] [] ext3_file_write+0x26/0xc0
[ 49.952195] [] do_sync_write+0xf1/0x140
[ 49.952195] [] ? get_lock_stats+0x2a/0x60
[ 49.952195] [] ? autoremove_wake_function+0x0/0x40
[ 49.952195] [] vfs_write+0xcb/0x190
[ 49.952195] [] sys_write+0x50/0x90
[ 49.952195] [] system_call_fastpath+0x16/0x1b
[ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[ 49.952195] RIP [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP
[ 50.096523] ---[ end trace 008d7aa02f244d7b ]---

Signed-off-by: Andrea Righi
Cc: Peter Zijlstra
Cc: David Rientjes
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 Documentation/sysctl/vm.txt |    4 ++++
 kernel/sysctl.c             |    5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 97c4b3284329..b716d33912d8 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -90,6 +90,10 @@ will itself start writeback.
 If dirty_bytes is written, dirty_ratio becomes a function of its value
 (dirty_bytes / the amount of dirtyable system memory).
 
+Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
+value lower than this limit will be ignored and the old configuration will be
+retained.
+
+==============================================================
 dirty_expire_centisecs
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e3d2c7dd59b9..ea78fa101ad6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -103,6 +103,9 @@ static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
 
+/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
+static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -1006,7 +1009,7 @@ static struct ctl_table vm_table[] = {
 		.mode = 0644,
 		.proc_handler = &dirty_bytes_handler,
 		.strategy = &sysctl_intvec,
-		.extra1 = &one_ul,
+		.extra1 = &dirty_bytes_min,
 	},
 	{
 		.procname = "dirty_writeback_centisecs",
--
cgit v1.2.3