summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2005-10-30[PATCH] semaphore: Remove __MUTEX_INITIALIZER()Arthur Othieno23-67/+0
__MUTEX_INITIALIZER() has no users, and equates to the more commonly used DECLARE_MUTEX(), thus making it pretty much redundant. Remove it for good. Signed-off-by: Arthur Othieno <a.othieno@bluewin.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] rocketport: make it work when statically linked into kernelBjorn Helgaas1-8/+2
The driver had incorrectly wrapped module_init(rp_init) in #ifdef MODULE, so it worked only when compiled as a module. Tested by Wolfgang Denk with this device: 00:0e.0 Communication controller: Comtrol Corporation RocketPort 8 port w/RJ11 connectors (rev 04) Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 11 Region 0: I/O ports at 7000 [size=64] Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] posix-cpu-timers: fix overrun reportingRoland McGrath1-6/+10
This change corrects an omission in posix_cpu_timer_schedule, so that it correctly propagates the overrun calculation to where it will get reported to the user. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] RCU torture-testing kernel modulePaul E. McKenney7-1/+648
This patch is a rewrite of the one submitted on October 1st, using modules (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2). This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an intense torture test of the RCU infratructure. This is needed due to the continued changes to the RCU infrastructure to accommodate dynamic ticks, CPU hotplug, realtime, and so on. Most of the code is in a separate file that is compiled only if the CONFIG variable is set. Documentation on how to run the test and interpret the output is also included. This code has been tested on i386 and ppc64, and an earlier version of the code has received extensive testing on a number of architectures as part of the PREEMPT_RT patchset. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fs/attr.c: remove BUG()Alexey Dobriyan1-3/+0
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] include/linux/kernel.h:BUILD_BUG_ON(): fix a commentNikita Danilov1-1/+1
Fix comment describing BUILD_BUG_ON: BUG_ON is not an assertion (unfortunately). Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] watchdog: update .owner field of struct pci_driverLaurent Riffard2-0/+2
This updates .owner field of struct pci_driver. This allows SYSFS to create the symlink from the driver to the module which provides it. Signed-off-by: Laurent Riffard <laurent.riffard@free.fr> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] SyncLink adapters: updates .owner field of struct pci_driverLaurent Riffard2-0/+2
This updates .owner field of struct pci_driver. This allows SYSFS to create the symlink from the driver to the module which provides it. Signed-off-by: Laurent Riffard <laurent.riffard@free.fr> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] epca: update .owner field of struct pci_driverLaurent Riffard1-0/+1
This updates .owner field of struct pci_driver. This allows SYSFS to create the symlink from the driver to the module which provides it. Signed-off-by: Laurent Riffard <laurent.riffard@free.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fix vgacon blankingPozsar Balazs1-0/+2
This patch fixes a long-standing vgacon bug: characters with the bright bit set were left on the screen and not blacked out. All I did was that I lookuped up some examples on the net about setting the vga palette, and added the call missing from the linux kernel, but included in all other ones. It works for me. You can test this by writing something with the bright set to the console, for example: echo -e "\e[1;31mhello there\e[0m" and then wait for the console to blank itself (by default, after 10 mins of inactivity), maybe making it faster using setterm -blank 1 so you only have to wait 1 minute. Signed-off-by: Pozsar Balazs <pozsy@uhulinux.hu> Cc: James Simmons <jsimmons@infradead.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Test for sb_getblk return valueGlauber de Oliveira Costa3-1/+22
This patch adds tests for the return value of sb_getblk() in the ext2/3 filesystems. In fs/buffer.c it is stated that the getblk() function never fails. However, it does can return NULL in some situations due to I/O errors, which may lead us to NULL pointer dereferences Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fix nr_unused accounting, and avoid recursing in iput with ↵Andrea Arcangeli2-12/+17
I_WILL_FREE set list_move(&inode->i_list, &inode_in_use); } else { list_move(&inode->i_list, &inode_unused); + inodes_stat.nr_unused++; } } wake_up_inode(inode); Are you sure the above diff is correct? It was added somewhere between 2.6.5 and 2.6.8. I think it's wrong. The only way I can imagine the i_count to be zero in the above path, is that I_WILL_FREE is set. And if I_WILL_FREE is set, then we must not increase nr_unused. So I believe the above change is buggy and it will definitely overstate the number of unused inodes and it should be backed out. Note that __writeback_single_inode before calling __sync_single_inode, can drop the spinlock and we can have both the dirty and locked bitflags clear here: spin_unlock(&inode_lock); __wait_on_inode(inode); iput(inode); XXXXXXX spin_lock(&inode_lock); } use inode again here a construct like the above makes zero sense from a reference counting standpoint. Either we don't ever use the inode again after the iput, or the inode_lock should be taken _before_ executing the iput (i.e. a __iput would be required). Taking the inode_lock after iput means the iget was useless if we keep using the inode after the iput. So the only chance the 2.6 was safe to call __writeback_single_inode with the i_count == 0, is that I_WILL_FREE is set (I_WILL_FREE will prevent the VM to free the inode in XXXXX). Potentially calling the above iput with I_WILL_FREE was also wrong because it would recurse in iput_final (the second mainline bug). The below (untested) patch fixes the nr_unused accounting, avoids recursing in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we don't corrupt memory and that all holders that don't set I_WILL_FREE, keeps a reference on the inode! Signed-off-by: Andrea Arcangeli <andrea@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] modules: fix sparse warning for every MODULE_PARMPavel Roskin1-1/+3
sparse complains about every MODULE_PARM used in a module: warning: symbol '__parm_foo' was not declared. Should it be static? The fix is to split declaration and initialization. While MODULE_PARM is obsolete, it's not something sparse should report. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] extable: remove needless declarationNicolas Pitre1-3/+0
They aren't used anywhere in that file. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] setkeys needs rootAndrew Morton1-0/+3
Because people can play games reprogramming keys and leaving traps for the next user of the console. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] firmware: fix all kernel-doc warningsRandy Dunlap1-28/+41
Convert existing function docs to kernel-doc format. Eliminate all kernel-doc warnings. Fix some doc typos and a little whitespace cleanup. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] jiffies_64 cleanupThomas Gleixner25-93/+4
Define jiffies_64 in kernel/timer.c rather than having 24 duplicated defines in each architecture. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Remove orphaned TIOCGDEV compat ioctlBrian Gerst1-29/+0
This ioctl doesn't exist for native i386. Signed-off-by: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] ext3: sparse fixesBen Dooks8-6/+29
Fix warnings from sparse due to un-declared functions that should either have a header file or have been declared static fs/ext2/bitmap.c:14:15: warning: symbol 'ext2_count_free' was not declared. Should it be static? fs/ext2/namei.c:92:15: warning: symbol 'ext2_get_parent' was not declared. Should it be static? fs/ext3/bitmap.c:15:15: warning: symbol 'ext3_count_free' was not declared. Should it be static? fs/ext3/namei.c:1013:15: warning: symbol 'ext3_get_parent' was not declared. Should it be static? fs/ext3/xattr.c:214:1: warning: symbol 'ext3_xattr_block_get' was not declared. Should it be static? fs/ext3/xattr.c:358:1: warning: symbol 'ext3_xattr_block_list' was not declared. Should it be static? fs/ext3/xattr.c:630:1: warning: symbol 'ext3_xattr_block_find' was not declared. Should it be static? fs/ext3/xattr.c:863:1: warning: symbol 'ext3_xattr_ibody_find' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Telecom Clock Driver for MPCBL0010 ATCA computer bladeMark Gross4-0/+914
Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fix de_thread() vs do_coredump() deadlockOleg Nesterov1-3/+13
de_thread() sends SIGKILL to all sub-threads and waits them to die in 'D' state. It is possible that one of the threads already dequeued coredump signal. When de_thread() unlocks ->sighand->lock that thread can enter do_coredump()->coredump_wait() and cause a deadlock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Added a Receive_Abort to the Marvell serial driverCarlos Sanchez1-0/+2
Added a Receive_Abort to the Marvell serial driver Fix occasional input overrun errors on Marvell serial driver - If the Marvell serial driver is repeatedly started and then stopped it will occasionally report an input overrun error when started. - Added a Receive_Abort to the Marvell serial driver to abort previously received receive errors when re-starting the receive Acked-by: Mark A. Greer <mgreer@mvista.com> Signed-off-by: Carlos Sanchez <csanchez@mvista.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fuse: remove unused defineMiklos Szeredi1-1/+0
Setting ctime is implicit in all setattr cases, so the FATTR_CTIME definition is unnecessary. It is used by neither the kernel nor by userspace. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fuse: spelling fixesMiklos Szeredi2-9/+9
Correct some typos and inconsistent use of "initialise" vs "initialize" in comments. Reported by Ioannis Barkas. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Kconfig help text correction for CONFIG_FRAME_POINTERJesper Juhl1-2/+2
Fix-up the CONFIG_FRAME_POINTER help text language a bit. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] wait4 PTRACE_ATTACH race fixRoland McGrath1-0/+9
Back about a year ago when I last fiddled heavily with the do_wait code, I was thinking too hard about the wrong thing and I now think I introduced a bug whose inverse thought I was fixing. Apparently noone was looking too hard over much shoulder, so as to cite my bogus reasoning at the time. In the race condition when PTRACE_ATTACH is about to steal a child and then the child hits a tracing event (what my_ptrace_child checks for), the real parent does need to set its flag noting it has some eligible live children. Otherwise a spurious ECHILD error is possible, since the child in question is not yet on the ptrace_children list. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] ioc4 serial support - mostly cleanupPat Gefre1-36/+44
Various small mods for the Altix ioc4 serial driver - mostly cleanup: - remove UIF_INITIALIZED usage - use the 'lock' from uart_port - better multiple card support Signed-off-by: Patrick Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Locking problems while EXT3FS_DEBUG onGlauber de Oliveira Costa3-6/+3
I noticed some problems while running ext3 with the debug flag set on. More precisely, I was unable to umount the filesystem. Some investigation took me to the patch that follows. At a first glance , the lock/unlock I've taken out seems really not necessary, as the main code (outside debug) does not lock the super. The only additional danger operations that debug code introduces seems to be related to bitmap, but bitmap operations tends to be all atomic anyway. I also took the opportunity to fix 2 spelling errors. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] coredump_wait() cleanupOleg Nesterov1-8/+5
This patch deletes pointless code from coredump_wait(). 1. It does useless mm->core_waiters inc/dec under mm->mmap_sem, but any changes to ->core_waiters have no effect until we drop ->mmap_sem. 2. It calls yield() for absolutely unknown reason. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] PF_DEAD cleanupCoywolf Qi Hunt1-5/+5
The PF_DEAD setting doesn't belong to exit_notify(), move it to a proper place. Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cleanup for kernel/printk.cJesper Juhl1-36/+42
- Removes some trailing whitespace - Breaks long lines and make other small changes to conform to CodingStyle - Add explicit printk loglevels in two places. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] ide-cd mini cleanup of castsJesper Juhl1-25/+29
Remove some unneeded casts. Avoid an assignment in the case of kmalloc failure. Break a few instances of if (foo) whatever; into two lines. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Keys: Get rid of warning in kmod.c if keys disabledDavid Howells1-3/+3
The attached patch gets rid of a "statement without effect" warning when CONFIG_KEYS is disabled by making use of the return value of key_get(). The compiler will optimise all of this away when keys are disabled. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Keys: Add LSM hooks for key management [try #3]David Howells10-49/+191
The attached patch adds LSM hooks for key management facilities. The notable changes are: (1) The key struct now supports a security pointer for the use of security modules. This will permit key labelling and restrictions on which programs may access a key. (2) Security modules get a chance to note (or abort) the allocation of a key. (3) The key permission checking can now be enhanced by the security modules; the permissions check consults LSM if all other checks bear out. (4) The key permissions checking functions now return an error code rather than a boolean value. (5) An extra permission has been added to govern the modification of attributes (UID, GID, permissions). Note that there isn't an LSM hook specifically for each keyctl() operation, but rather the permissions hook allows control of individual operations based on the permission request bits. Key management access control through LSM is enabled by automatically if both CONFIG_KEYS and CONFIG_SECURITY are enabled. This should be applied on top of the patch ensubjected: [PATCH] Keys: Possessor permissions should be additive Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Keys: Export user-defined keyring operationsDavid Howells2-25/+71
Export user-defined key operations so that those who wish to define their own key type based on the user-defined key operations may do so (as has been requested). The header file created has been placed into include/keys/user-type.h, thus creating a directory where other key types may also be placed. Any objections to doing this? Signed-Off-By: David Howells <dhowells@redhat.com> Signed-Off-By: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] vm: remove unused/broken page_pte[_prot] macrosTejun Heo13-28/+0
This patch removes page_pte_prot and page_pte macros from all architectures. Some architectures define both, some only page_pte (broken) and others none. These macros are not used anywhere. page_pte_prot(page, prot) is identical to mk_pte(page, prot) and page_pte(page) is identical to page_pte_prot(page, __pgprot(0)). * The following architectures define both page_pte_prot and page_pte arm, arm26, ia64, sh64, sparc, sparc64 * The following architectures define only page_pte (broken) frv, i386, m32r, mips, sh, x86-64 * All other architectures define neither Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] vm: remove redundant assignment from __pagevec_release_nonlru()Tejun Heo1-1/+0
This patch removes redundant assignment from __pagevec_release_nonlru(). pages_to_free.cold is set to pvec->cold by pagevec_init() call right above the assignment. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fs: error case fix in __generic_file_aio_readTejun Heo1-2/+2
When __generic_file_aio_read() hits an error during reading, it reports the error iff nothing has successfully been read yet. This is condition - when an error occurs, if nothing has been read/written, report the error code; otherwise, report the amount of bytes successfully transferred upto that point. This corner case can be exposed by performing readv(2) with the following iov. iov[0] = len0 @ ptr0 iov[1] = len1 @ NULL (or any other invalid pointer) iov[2] = len2 @ ptr2 When file size is enough, performing above readv(2) results in len0 bytes from file_pos @ ptr0 len2 bytes from file_pos + len0 @ ptr2 And the return value is len0 + len2. Test program is attached to this mail. This patch makes __generic_file_aio_read()'s error handling identical to other functions. #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <sys/uio.h> #include <errno.h> #include <string.h> int main(int argc, char **argv) { const char *path; struct stat stbuf; size_t len0, len1; void *buf0, *buf1; struct iovec iov[3]; int fd, i; ssize_t ret; if (argc < 2) { fprintf(stderr, "Usage: testreadv path (better be a " "small text file)\n"); return 1; } path = argv[1]; if (stat(path, &stbuf) < 0) { perror("stat"); return 1; } len0 = stbuf.st_size / 2; len1 = stbuf.st_size - len0; if (!len0 || !len1) { fprintf(stderr, "Dude, file is too small\n"); return 1; } if ((fd = open(path, O_RDONLY)) < 0) { perror("open"); return 1; } if (!(buf0 = malloc(len0)) || !(buf1 = malloc(len1))) { perror("malloc"); return 1; } memset(buf0, 0, len0); memset(buf1, 0, len1); iov[0].iov_base = buf0; iov[0].iov_len = len0; iov[1].iov_base = NULL; iov[1].iov_len = len1; iov[2].iov_base = buf1; iov[2].iov_len = len1; printf("vector "); for (i = 0; i < 3; i++) printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len); printf("\n"); ret = readv(fd, iov, 3); if (ret < 0) perror("readv"); printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n", ret, (char *)buf0, (char *)buf1); return 0; } Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] ptrace/coredump/exit_group deadlockAndrea Arcangeli2-5/+8
I could seldom reproduce a deadlock with a task not killable in T state (TASK_STOPPED, not TASK_TRACED) by attaching a NPTL threaded program to gdb, by segfaulting the task and triggering a core dump while some other task is executing exit_group and while one task is in ptrace_attached TASK_STOPPED state (not TASK_TRACED yet). This originated from a gdb bugreport (the fact gdb was segfaulting the task wasn't a kernel bug), but I just incidentally noticed the gdb bug triggered a real kernel bug as well. Most threads hangs in exit_mm because the core_dumping is still going, the core dumping hangs because the stopped task doesn't exit, the stopped task can't wakeup because it has SIGNAL_GROUP_EXIT set, hence the deadlock. To me it seems that the problem is that the force_sig_specific(SIGKILL) in zap_threads is a noop if the task has PF_PTRACED set (like in this case because gdb is attached). The __ptrace_unlink does nothing because the signal->flags is set to SIGNAL_GROUP_EXIT|SIGNAL_STOP_DEQUEUED (verified). The above info also shows that the stopped task hit a race and got the stop signal (presumably by the ptrace_attach, only the attach, state is still TASK_STOPPED and gdb hangs waiting the core before it can set it to TASK_TRACED) after one of the thread invoked the core dump (it's the core dump that sets signal->flags to SIGNAL_GROUP_EXIT). So beside the fact nobody would wakeup the task in __ptrace_unlink (the state is _not_ TASK_TRACED), there's a secondary problem in the signal handling code, where a task should ignore the ptrace-sigstops as long as SIGNAL_GROUP_EXIT is set (or the wakeup in __ptrace_unlink path wouldn't be enough). So I attempted to make this patch that seems to fix the problem. There were various ways to fix it, perhaps you prefer a different one, I just opted to the one that looked safer to me. I also removed the clearing of the stopped bits from the zap_other_threads (zap_other_threads was safe unlike zap_threads). I don't like useless code, this whole NPTL signal/ptrace thing is already unreadable enough and full of corner cases without confusing useless code into it to make it even less readable. And if this code is really needed, then you may want to explain why it's not being done in the other paths that sets SIGNAL_GROUP_EXIT at least. Even after this patch I still wonder who serializes the read of p->ptrace in zap_threads. Patch is called ptrace-core_dump-exit_group-deadlock-1. This was the trace I've got: test T ffff81003e8118c0 0 14305 1 14311 14309 (NOTLB) ffff810058ccdde8 0000000000000082 000001f4000037e1 ffff810000000013 00000000000000f8 ffff81003e811b00 ffff81003e8118c0 ffff810011362100 0000000000000012 ffff810017ca4180 Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff80141677>{finish_stop+87} <ffffffff8014367f>{get_signal_to_deliver+1359} <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80111575>{sys_ptrace+2293} <ffffffff80131810>{default_wake_function+0} <ffffffff80196399>{sys_ioctl+73} <ffffffff8010dd27>{sysret_signal+28} <ffffffff8010e00f>{ptregscall_common+103} test D ffff810011362100 0 14309 1 14305 14312 (NOTLB) ffff810053c81cf8 0000000000000082 0000000000000286 0000000000000001 0000000000000195 ffff810011362340 ffff810011362100 ffff81002e338040 ffff810001e0ca80 0000000000000001 Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149} <ffffffff801381af>{do_exit+479} <ffffffff80138d0c>{do_group_exit+252} <ffffffff801436db>{get_signal_to_deliver+1451} <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2 <ffffffff8014208a>{force_sig_info+186} <ffffffff804479a0>{do_int3+112} <ffffffff8010e308>{retint_signal+61} test D ffff81002e338040 0 14311 1 14716 14305 (NOTLB) ffff81005ca8dcf8 0000000000000082 0000000000000286 0000000000000001 0000000000000120 ffff81002e338280 ffff81002e338040 ffff8100481cb740 ffff810001e0ca80 0000000000000001 Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149} <ffffffff801381af>{do_exit+479} <ffffffff80142d0e>{__dequeue_signal+558} <ffffffff80138d0c>{do_group_exit+252} <ffffffff801436db>{get_signal_to_deliver+1451} <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+208} <ffffffff8014208a>{force_sig_info+186} <ffffffff804479a0>{do_int3+112} <ffffffff8010e308>{retint_signal+61} test D ffff810017ca4180 0 14312 1 14309 13882 (NOTLB) ffff81005d15fcb8 0000000000000082 ffff81005d15fc58 ffffffff80130816 0000000000000897 ffff810017ca43c0 ffff810017ca4180 ffff81003e8118c0 0000000000000082 ffffffff801317ed Call Trace:<ffffffff80130816>{activate_task+150} <ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0} <ffffffff8018cdc3>{do_coredump+819} <ffffffff80445f52>{thread_return+82} <ffffffff801436d4>{get_signal_to_deliver+1444} <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2 <ffffffff804472e5>{_spin_unlock_irqrestore+5} <ffffffff8014208a>{force_sig_info+186} <ffffffff804476ff>{do_general_protection+159} <ffffffff8010e308>{retint_signal+61} Signed-off-by: Andrea Arcangeli <andrea@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: automatic numa mempolicy rebindingPaul Jackson3-0/+74
This patch automatically updates a tasks NUMA mempolicy when its cpuset memory placement changes. It does so within the context of the task, without any need to support low level external mempolicy manipulation. If a system is not using cpusets, or if running on a system with just the root (all-encompassing) cpuset, then this remap is a no-op. Only when a task is moved between cpusets, or a cpusets memory placement is changed does the following apply. Otherwise, the main routine below, rebind_policy() is not even called. When mixing cpusets, scheduler affinity, and NUMA mempolicies, the essential role of cpusets is to place jobs (several related tasks) on a set of CPUs and Memory Nodes, the essential role of sched_setaffinity is to manage a jobs processor placement within its allowed cpuset, and the essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs memory placement within its allowed cpuset. However, CPU affinity and NUMA memory placement are managed within the kernel using absolute system wide numbering, not cpuset relative numbering. This is ok until a job is migrated to a different cpuset, or what's the same, a jobs cpuset is moved to different CPUs and Memory Nodes. Then the CPU affinity and NUMA memory placement of the tasks in the job need to be updated, to preserve their cpuset-relative position. This can be done for CPU affinity using sched_setaffinity() from user code, as one task can modify anothers CPU affinity. This cannot be done from an external task for NUMA memory placement, as that can only be modified in the context of the task using it. However, it easy enough to remap a tasks NUMA mempolicy automatically when a task is migrated, using the existing cpuset mechanism to trigger a refresh of a tasks memory placement after its cpuset has changed. All that is needed is the old and new nodemask, and notice to the task that it needs to rebind its mempolicy. The tasks mems_allowed has the old mask, the tasks cpuset has the new mask, and the existing cpuset_update_current_mems_allowed() mechanism provides the notice. The bitmap/cpumask/nodemask remap operators provide the cpuset relative calculations. This patch leaves open a couple of issues: 1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies: These mempolicies may reference nodes outside of those allowed to the current task by its cpuset. Tasks are migrated as part of jobs, which reside on what might be several cpusets in a subtree. When such a job is migrated, all NUMA memory policy references to nodes within that cpuset subtree should be translated, and references to any nodes outside that subtree should be left untouched. A future patch will provide the cpuset mechanism needed to mark such subtrees. With that patch, we will be able to correctly migrate these other memory policies across a job migration. 2) Updating cpuset, affinity and memory policies in user space: This is harder. Any placement state stored in user space using system-wide numbering will be invalidated across a migration. More work will be required to provide user code with a migration-safe means to manage its cpuset relative placement, while preserving the current API's that pass system wide numbers, not cpuset relative numbers across the kernel-user boundary. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: bitmap and mask remap operatorsPaul Jackson4-0/+212
In the forthcoming task migration support, a key calculation will be mapping cpu and node numbers from the old set to the new set while preserving cpuset-relative offset. For example, if a task and its pages on nodes 8-11 are being migrated to nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be moved to node 25 (the 2nd node in the new set.) As with other bitmap operations, the proper way to code this is to provide the underlying calculation in lib/bitmap.c, and then to provide the usual cpumask and nodemask wrappers. This patch provides that. These operations are termed 'remap' operations. Both remapping a single bit and a set of bits is supported. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: confine pdflush to its cpusetPaul Jackson1-0/+13
This patch keeps pdflush daemons on the same cpuset as their parent, the kthread daemon. Some large NUMA configurations put as much as they can of kernel threads and other classic Unix load in what's called a bootcpuset, keeping the rest of the system free for dedicated jobs. This effort is thwarted by pdflush, which dynamically destroys and recreates pdflush daemons depending on load. It's easy enough to force the originally created pdflush deamons into the bootcpuset, at system boottime. But the pdflush threads created later were allowed to run freely across the system, due to the necessary line in their startup kthread(): set_cpus_allowed(current, CPU_MASK_ALL); By simply coding pdflush to start its threads with the cpus_allowed restrictions of its cpuset (inherited from kthread, its parent) we can ensure that dynamically created pdflush threads are also kept in the bootcpuset. On systems w/o cpusets, or w/o a bootcpuset implementation, the following will have no affect, leaving pdflush to run on any CPU, as before. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: simple renamePaul Jackson1-0/+16
Add support for renaming cpusets. Only allow simple rename of cpuset directories in place. Don't allow moving cpusets elsewhere in hierarchy or renaming the special cpuset files in each cpuset directory. The usefulness of this simple rename became apparent when developing task migration facilities. It allows building a second cpuset hierarchy using new names and containing new CPUs and Memory Nodes, moving tasks from the old to the new cpusets, removing the old cpusets, and then renaming the new cpusets to be just like the old names, so that any knowledge that the tasks had of their cpuset names will still be valid. Leaf node cpusets can be migrated to other CPUs or Memory Nodes by just updating their 'cpus' and 'mems' files, but because no cpuset can contain CPUs or Nodes not in its parent cpuset, one cannot do this in a cpuset hierarchy without first expanding all the non-leaf cpusets to contain the union of both the old and new CPUs and Nodes, which would obfuscate the one-to-one migration of a task from one cpuset to another required to correctly migrate the physical page frames currently allocated to that task. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: dual semaphore locking overhaulPaul Jackson2-138/+282
Overhaul cpuset locking. Replace single semaphore with two semaphores. The suggestion to use two locks was made by Roman Zippel. Both locks are global. Code that wants to modify cpusets must first acquire the exclusive manage_sem, which allows them read-only access to cpusets, and holds off other would-be modifiers. Before making actual changes, the second semaphore, callback_sem must be acquired as well. Code that needs only to query cpusets must acquire callback_sem, which is also a global exclusive lock. The earlier problems with double tripping are avoided, because it is allowed for holders of manage_sem to nest the second callback_sem lock, and only callback_sem is needed by code called from within __alloc_pages(), where the double tripping had been possible. This is not quite the same as a normal read/write semaphore, because obtaining read-only access with intent to change must hold off other such attempts, while allowing read-only access w/o such intention. Changing cpusets involves several related checks and changes, which must be done while allowing read-only queries (to avoid the double trip), but while ensuring nothing changes (holding off other would be modifiers.) This overhaul of cpuset locking also makes careful use of task_lock() to guard access to the task->cpuset pointer, closing a couple of race conditions noticed while reading this code (thanks, Roman). I've never seen these races fail in any use or test. See further the comments in the code. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpusets: remove depth counted locking hackPaul Jackson1-65/+40
Remove a rather hackish depth counter on cpuset locking. The depth counter was avoiding a possible double trip on the global cpuset_sem semaphore. It worked, but now an improved version of cpuset locking is available, to come in the next patch, using two global semaphores. This patch reverses "cpuset semaphore depth check deadlock fix" The kernel still works, even after this patch, except for some rare and difficult to reproduce race conditions when agressively creating and destroying cpusets marked with the notify_on_release option, on very large systems. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] cpuset cleanupPaul Jackson1-1/+0
Remove one more useless line from cpuset_common_file_read(). Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] proc: fix of error path in proc_get_inode()Kirill Korotaev1-7/+10
This patch fixes incorrect error path in proc_get_inode(), when module can't be get due to being unloaded. When try_module_get() fails, this function puts de(!) and still returns inode with non-getted de. There are still unresolved known bugs in proc yet to be fixed: - proc_dir_entry tree is managed without any serialization - create_proc_entry() doesn't setup de->owner anyhow, so setting it later manually is inatomic. - looks like almost all modules do not care whether it's de->owner is set... Signed-Off-By: Denis Lunev <den@sw.ru> Signed-Off-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] fuse: clean up dead code related to nfs exportingMiklos Szeredi1-2/+3
Remove last remains of NFS exportability support. The code is actually buggy (as reported by Akshat Aranya), since 'alias' will be leaked if it's non-null and alias->d_flags has DCACHE_DISCONNECTED. This is not an active bug, since there will never be any disconnected dentries. But it's better to get rid of the unnecessary complexity anyway. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] add_timer() of a pending timer is illegalAndrew Morton1-1/+2
In the recent timer rework we lost the check for an add_timer() of an already-pending timer. That check was useful for networking, so put it back. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] block cleanups: Fix iosched module refcount leakNate Diller1-1/+3
If the requested I/O scheduler is already in place, elevator_switch simply leaves the queue alone, and returns. However, it forgets to call elevator_put, so 'echo [current_sched] > /sys/block/[dev]/queue/scheduler' will leak a reference, causing the current_sched module to be permanently pinned in memory. Signed-off-by: Nate Diller <nate@namesys.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>