summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2009-12-10compat_ioctl: pass compat pointer directly to handlersArnd Bergmann1-128/+83
Instead of having each handler call compat_ptr, we can now convert the pointer once and pass that to each handler. This saves a little bit of both source and object code size. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10compat_ioctl: simplify lookup tableArnd Bergmann1-46/+44
The compat_ioctl table now only contains entries for COMPATIBLE_IOCTL, so we only need to know if a number is listed in it or now. As an optimization, we hash the table entries with a reversible transformation to get a more uniform distribution over it, sort the table at startup and then guess the position in the table when an ioctl number gets called to do a linear search from there. With the current set of ioctl numbers and the chosen transformation function, we need an average of four steps to find if a number is in the set, all of the accesses within one or two cache lines. This at least as good as the previous hash table approach but saves 8.5 kb of kernel memory. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10compat_ioctl: simplify calling of handlersArnd Bergmann1-47/+39
The compat_ioctl array now contains only entries for ioctl numbers that do not require a separate handler. By special-casing the ULONG_IOCTL case in the do_ioctl_trans function, we can kill the final use of a function pointer in the array. text data bss dec hex filename 7539 13352 2080 22971 59bb before/fs/compat_ioctl.o 7910 8552 2080 18542 486e after/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10compat_ioctl: inline all conversion handlersArnd Bergmann1-54/+81
This makes all ioctl conversion handlers called from a single switch statement, leaving only COMPATIBLE_IOCTL and ULONG_IOCTL statements in the table. This is somewhat more space efficient and also lets us simplify the handling of the lookup table significantly. before: text data bss dec hex filename 7619 14024 2080 23723 5cab obj/fs/compat_ioctl.o after: 7567 13352 2080 22999 59d7 obj/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2009-12-10compat_ioctl: Remove BKLArnd Bergmann1-2/+0
We have always called ioctl conversion handlers under the big kernel lock, although that is generally not necessary. In particular it is not needed for conversion of data structures and for calling sys_ioctl or do_vfs_ioctl, which will get the BKL again if needed. Handlers doing more than those two have been moved out, so we can kill off the BKL from compat_sys_ioctl. This may significantly improve latencies with 32 bit applications, and it avoids a common scenario where a thread acquires the BKL twice. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-12-10compat_ioctl: remove all VT ioctl handlingArnd Bergmann1-187/+1
The VT driver now handles all of these ioctls directly, so we can remove the handlers from common code. These are the only handlers that require the BKL because they directly perform the ioctl action rather than just converting the data structures. Once they are gone, we can remove the BKL from the remaining ioctl conversion handlers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-12-10Merge branch 'for-linus' of ↵Linus Torvalds14-53/+46
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS
2009-12-10Merge branch 'for_linus' of ↵Linus Torvalds19-498/+629
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...
2009-12-10Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds8-446/+1093
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Multi-device mirror support exofs: Move all operations to an io_engine exofs: move osd.c to ios.c exofs: statfs blocks is sectors not FS blocks exofs: Prints on mount and unmout exofs: refactor exofs_i_info initialization into common helper exofs: dbg-print less exofs: More sane debug print trivial: some small fixes in exofs documentation
2009-12-10Merge git://git.infradead.org/ubifs-2.6Linus Torvalds3-18/+17
* git://git.infradead.org/ubifs-2.6: UBIFS: fix return code in check_leaf UBI: flush wl before clearing update marker MAINTAINERS: change e-mail of Artem Bityutskiy UBIFS: remove manual O_SYNC handling UBIFS: support mounting of UBI volume character devices UBI: Add ubi_open_volume_path
2009-12-10NFS: Fix nfs_migrate_page()Trond Myklebust1-2/+3
The call to migrate_page() will cause the page->private field to be cleared. Also fix up the locking around the page->private transfer, so that we ensure that calls to nfs_page_find_request() don't end up racing. Finally, fix up a double free bug: nfs_unlock_request() already calls nfs_release_request() for us... Reported-by: Wu Fengguang <fengguang.wu@intel.com> Tested-by: Andi Kleen <andi@firstfloor.org> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-10ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()Roel Kluin1-1/+1
Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: Fix data / filesystem corruption when write fails to copy dataJan Kara1-4/+14
When ext3_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Reported-by: James Y Knight <foom@fuhm.net> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext4: Support for 64-bit quota formatJan Kara1-6/+24
Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: Support for vfsv1 quota formatJan Kara1-6/+24
We just have to add proper mount options handling. The rest is handled by the generic quota code. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10quota: Implement quota format with 64-bit space and inode limitsJan Kara3-39/+153
So far the maximum quota space limit was 4TB. Apparently this isn't enough for Lustre guys anymore. So implement new quota format which raises block limits to 2^64 bytes. Also store number of inodes and inode limits in 64-bit variables as 2^32 files isn't that insanely high anymore. The first version of the patch has been developed by Andrew Perepechko <Andrew.Perepechko@Sun.COM>. CC: Andrew.Perepechko@Sun.COM Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10quota: Move definition of QFMT_OCFS2 to linux/quota.hJan Kara1-4/+0
Move definition of this constant to linux/quota.h so that it cannot clash with other format IDs. CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext2: fix comment in ext2_find_entry about return valuesJérémy Cochoy1-2/+2
Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: Unify log messages in ext3Alexey Fisher1-205/+227
Make messages produced by ext3 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684964] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended dmesg after patch: [ 873.300792] EXT3-fs (loop0): using internal journaln [ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. [ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192 [ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2? [ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext2: clear uptodate flag on super block I/O errorStephen Hemminger1-0/+16
This fixes a WARN backtrace in mark_buffer_dirty() that occurs during unmount when a USB or floppy device is removed. I reported this a kernel regression, but looks like it might have been there for longer than that. The super block update from a previous operation has marked the buffer as in error, and the flag has to be cleared before doing the update. (Similar code already exists in ext4). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext2: Unify log messages in ext2Alexey Fisher4-80/+101
make messages produced by ext2 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] dmesg after patch: [ 4893.684892] EXT2-fs (loop0): reservations ON [ 4893.684896] EXT2-fs (loop0): xip option not supported [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: make "norecovery" an alias for "noload"Eric Sandeen1-0/+4
Users on the list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext3. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: Don't update the superblock in ext3_statfs()Eric Sandeen1-2/+0
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext3_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. The modifications were originally to keep the sb "more" in sync, so that a readonly fsck of the device didn't flag this as an error (as often), but apparently e2fsprogs deals with this differently now, anyway. Based on Ted's patch for ext4, which was in turn based on my work on that bug and another preliminary patch... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext3: journal all modifications in ext3_xattr_set_handleEric Sandeen1-3/+4
ext3_xattr_set_handle() was zeroing out an inode outside of journaling constraints; this is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Although ext3 doesn't have the crc issue, modifications out of journal control are a Bad Thing. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10quota: Fix WARN_ON in lookup_one_lenJan Kara1-0/+2
We should hold i_mutex when looking up quota files for journaled quotas, otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't hold i_mutex previously probably could not lead to a real bug since the filesystem is just being mounted / remounted read-write and thus the root directory cannot change anyway but it's definitely cleaner with i_mutex. Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10const: struct quota_format_opsAlexey Dobriyan3-3/+3
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ubifs: remove manual O_SYNC handlingChristoph Hellwig1-12/+1
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already covered by ubifs_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10afs: remove manual O_SYNC handlingChristoph Hellwig1-9/+0
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate manual invocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10kill wait_on_page_writeback_rangeChristoph Hellwig2-7/+3
All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10vfs: Implement proper O_SYNC semanticsChristoph Hellwig11-15/+29
While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10zisofs: Implement reading of compressed files when PAGE_CACHE_SIZE > ↵Jan Kara2-250/+286
compress block size Also split and cleanup zisofs_readpage() when we are changing it anyway. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10exofs: Multi-device mirror supportBoaz Harrosh6-28/+361
This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB *version* field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a *device-table*. This object contains the default *data-map* and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Move all operations to an io_engineBoaz Harrosh5-350/+644
In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state** ios); Then calles one of: exofs_sbi_create(struct exofs_io_state *ios); exofs_sbi_remove(struct exofs_io_state *ios); exofs_sbi_write(struct exofs_io_state *ios); exofs_sbi_read(struct exofs_io_state *ios); exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state *ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: move osd.c to ios.cBoaz Harrosh2-1/+1
If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: statfs blocks is sectors not FS blocksBoaz Harrosh1-4/+6
Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Prints on mount and unmoutBoaz Harrosh1-0/+11
It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: refactor exofs_i_info initialization into common helperBoaz Harrosh1-2/+8
There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: dbg-print lessBoaz Harrosh1-5/+7
Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: More sane debug printBoaz Harrosh1-2/+1
debug prints should be somewhat useful without actually reading the source code Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds37-64/+60
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-09ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)Theodore Ts'o1-2/+2
Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09ext4: Do not override ext2 or ext3 if built they are built as modulesTheodore Ts'o2-3/+3
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09jbd2: Export jbd2_log_start_commit to fix ext4 buildTheodore Ts'o1-0/+1
This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09Merge branch 'reiserfs/kill-bkl' of ↵Linus Torvalds16-163/+420
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: (31 commits) kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl() kill-the-bkl/reiserfs: always lock the ioctl path kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency kill-the-bkl/reiserfs: panic in case of lock imbalance kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock kill-the-bkl/reiserfs: acquire the inode mutex safely kill-the-bkl/reiserfs: unlock only when needed in search_by_key kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed() kill-the-BKL/reiserfs: add reiserfs_cond_resched() ...
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt80-1721/+2571
Conflicts: include/linux/kvm.h
2009-12-08nfs41: Invoke RECLAIM_COMPLETE on all new client idsRicardo Labiaga1-3/+4
The NFSv4.1 spec indicates RECLAIM_COMPLETE is to be issued whenever a client establishes a new client id, not only after detecting the server has rebooted. Set the NFS4CLNT_RECLAIM_REBOOT bit after every new client id has been established. This enables us to issue RECLAIM_COMPLETE during the wrap up of the NFS4CLNT_RECLAIM_REBOOT state. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-08Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds12-68/+143
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits) cfq-iosched: Do not access cfqq after freeing it block: include linux/err.h to use ERR_PTR cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit blkio: Allow CFQ group IO scheduling even when CFQ is a module blkio: Implement dynamic io controlling policy registration blkio: Export some symbols from blkio as its user CFQ can be a module block: Fix io_context leak after failure of clone with CLONE_IO block: Fix io_context leak after clone with CLONE_IO cfq-iosched: make nonrot check logic consistent io controller: quick fix for blk-cgroup and modular CFQ cfq-iosched: move IO controller declerations to a header file cfq-iosched: fix compile problem with !CONFIG_CGROUP blkio: Documentation blkio: Wait on sync-noidle queue even if rq_noidle = 1 blkio: Implement group_isolation tunable blkio: Determine async workload length based on total number of queues blkio: Wait for cfq queue to get backlogged if group is empty blkio: Propagate cgroup weight updation to cfq groups blkio: Drop the reference to queue once the task changes cgroup blkio: Provide some isolation between groups ...
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2-749/+4
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds10-136/+60
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
2009-12-08NFSv41: Fix a potential state leakage when restarting nfs4_close_prepareTrond Myklebust1-17/+35
Currently, if the call to nfs4_setup_sequence() in nfs4_close_prepare fails, any later retries will fail to launch an RPC call, due to the fact that the &state->flags will have been cleared. Ditto if nfs4_close_done() triggers a call to the NFSv4.1 version of nfs_restart_rpc(). We therefore move the actual clearing of the state->flags to nfs4_close_done(), when we know that the RPC call was successful. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>