path: root/fs
Age | Commit message | Author | Files | Lines
2012-05-04 | vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces | Linus Torvalds | 2 | -51/+120
The calling conventions for __d_lookup_rcu() and dentry_cmp() are annoying in different ways, and there is actually one single underlying reason for both of the annoyances.

The fundamental reason is that we do the returned dentry sequence number check inside __d_lookup_rcu() instead of doing it in the caller. This results in two annoyances:

 - __d_lookup_rcu() now not only needs to return the dentry and the sequence number that goes along with the lookup, it also needs to return the inode pointer that was validated by that sequence number check.

 - and because we did the sequence number check early (to validate the name pointer and length) we also couldn't just pass the dentry itself to dentry_cmp(), we had to pass the counted string that contained the name.

So that sequence number decision caused two separate ugly calling conventions.

Both of these problems would be solved if we just did the sequence number check in the caller instead. There's only one caller, and that caller already has to do the sequence number check for the parent anyway, so just do that.

That allows us to stop returning the dentry->d_inode in that in-out argument (pointer-to-pointer-to-inode), so we can make the inode argument just a regular input inode pointer. The caller can just load the inode from dentry->d_inode, and then do the sequence number check after that to make sure that it's synchronized with the name we looked up.

And it allows us to just pass in the dentry to dentry_cmp(), which is what all the callers really wanted. Sure, dentry_cmp() has to be a bit careful about the dentry (which is not stable during RCU lookup), but that's actually very simple.

And now that dentry_cmp() can clearly see that the first string argument is a dentry, we can use the direct word access for that, instead of the careful unaligned zero-padding.
The dentry name is always properly aligned, since it is a single path component that is either embedded into the dentry itself, or was allocated with kmalloc() (see __d_alloc).

Finally, this also uninlines the nasty slow-case for dentry comparisons: that one *does* need to do a sequence number check, since it will call into the low-level filesystems, and we want to give those a stable inode pointer and path component length/start arguments. Doing an extra sequence check for that slow case is not a problem, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
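A minimal userspace sketch of the calling convention described above, in C11 with <stdatomic.h>: the lookup helper only returns a candidate plus the sequence count it observed, and the single caller loads the protected field first and then does the one sequence check. The names `struct entry`, `lookup()`, `seq_retry()` and `lookup_inode()` are hypothetical stand-ins, not the kernel's dentry API, and the sketch assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: 'seq' is even when stable and odd
 * while the (single) writer is changing 'name' or 'inode'. In truly
 * concurrent code the protected fields would also need atomic accesses;
 * they are plain here to keep the sketch short. */
struct entry {
    atomic_uint seq;
    const char *name;
    int inode;                      /* stand-in for dentry->d_inode */
};

/* Writer: make seq odd, update the fields, make seq even again. */
static void entry_update(struct entry *e, const char *name, int inode)
{
    unsigned s = atomic_load_explicit(&e->seq, memory_order_relaxed);
    atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    e->name = name;
    e->inode = inode;
    atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Lookup only finds a candidate and reports the sequence count it saw;
 * it returns no inode and does no final validation itself. */
static struct entry *lookup(struct entry *tab, size_t n, const char *name, unsigned *seqp)
{
    for (size_t i = 0; i < n; i++) {
        unsigned seq = atomic_load_explicit(&tab[i].seq, memory_order_acquire);
        if (seq & 1)
            continue;               /* writer active: skip this entry */
        if (tab[i].name && strcmp(tab[i].name, name) == 0) {
            *seqp = seq;
            return &tab[i];
        }
    }
    return NULL;
}

/* True if a writer touched 'e' since 'seq' was sampled. */
static bool seq_retry(struct entry *e, unsigned seq)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&e->seq, memory_order_relaxed) != seq;
}

/* The one caller: load the inode first, then a single sequence check
 * validates the name match and the inode value together. */
static bool lookup_inode(struct entry *tab, size_t n, const char *name, int *inodep)
{
    unsigned seq;
    struct entry *e = lookup(tab, n, name, &seq);
    if (!e)
        return false;
    int inode = e->inode;
    if (seq_retry(e, seq))
        return false;               /* raced with a writer: caller falls back */
    *inodep = inode;
    return true;
}
```

Keeping the check in the caller means `lookup()` no longer needs an in-out inode argument, which mirrors the simplification the commit describes.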
2012-05-03 | vfs: make word-at-a-time accesses handle a non-existing page | Linus Torvalds | 2 | -6/+24
It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that can have holes in the kernel address space: it seems to happen easily with Xen, and it looks like the AMD gart64 code will also punch holes dynamically.

Actually hitting that case is still very unlikely, so just do the access, and take an exception and fix it up for the very unlikely case of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have other issues with unaligned word accesses than the possible missing next page. IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.

Reported-and-tested-by: Jana Saout <jana@saout.de>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
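To make the "shifting for the exception case" concrete, here is a hedged sketch of what a fixup path can return instead of faulting: re-read the enclosing aligned word (which cannot cross into the missing page) and shift away the bytes before the requested address. It assumes a little-endian CPU with 8-byte words, as on x86-64; `load_zeropad_fixup()` is a hypothetical name, and the kernel does this inside an exception-table fixup rather than as a plain C function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: the value a fixup could hand back when an unaligned
 * 8-byte load at 'addr' faults on a missing next page.
 * Assumes little-endian byte order and 8-byte words. */
static uint64_t load_zeropad_fixup(const unsigned char *addr)
{
    uintptr_t a = (uintptr_t)addr;
    unsigned offset = (unsigned)(a & (sizeof(uint64_t) - 1)); /* bytes past alignment */
    uint64_t word;

    /* An aligned load never crosses a page boundary, so if the byte at
     * 'addr' is mapped, this whole word is mapped too. */
    memcpy(&word, (const unsigned char *)(a - offset), sizeof(word));

    /* On little-endian the byte at 'addr' sits 'offset' bytes up in the
     * word: shift it down to the low end. The vacated high bytes become
     * zero, standing in for the bytes of the missing page. */
    return word >> (8 * offset);
}
```

Usage: for an 8-byte aligned buffer holding "abcdefgh", `load_zeropad_fixup(buf + 3)` yields the bytes `d e f g h` followed by three zero bytes.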
2012-05-02 | Merge tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 13 | -139/+267
Pull NFS client bugfixes from Trond Myklebust:
 - Fixes for the NFSv4 security negotiation
 - Use the correct hostname when mounting from a private namespace
 - NFS net namespace bugfixes for the pipefs filesystem
 - NFSv4 GETACL bugfixes
 - IPv6 bugfix for NFSv4 referrals

* tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Use the correct hostname in the client identifier string
  SUNRPC: RPC client must use the current utsname hostname string
  NFS: get module in idmap PipeFS notifier callback
  NFS: Remove unused function nfs_lookup_with_sec()
  NFS: Honor the authflavor set in the clone mount data
  NFS: Fix following referral mount points with different security
  NFS: Do secinfo as part of lookup
  NFS: Handle exceptions coming out of nfs4_proc_fs_locations()
  NFS: Fix SECINFO_NO_NAME
  SUNRPC: traverse clients tree on PipeFS event
  SUNRPC: set per-net PipeFS superblock before notification
  SUNRPC: skip clients with program without PipeFS entries
  SUNRPC: skip dead but not buried clients on PipeFS events
  Avoid beyond bounds copy while caching ACL
  Avoid reading past buffer when calling GETACL
  fix page number calculation bug for block layout decode buffer
  NFSv4.1 fix page number calculation bug for filelayout decode buffers
  pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
  nfs4: fix referrals on mounts that use IPv6 addrs
2012-04-30 | nfsd: fix nfs4recover.c printk format warning | Randy Dunlap | 1 | -1/+1
Fix printk format warnings -- both items are size_t, so use %zu to print them.

fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
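A small sketch of matching printf-style conversions to argument types, using a hypothetical helper: `%zu` is the standard conversion for `size_t` and `%u` for `unsigned int`, while `%lu` only matches `unsigned long`, which happens to share a width with `size_t` on some platforms but not all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: each conversion matches its argument's type. */
static int format_lengths(char *out, size_t outlen, size_t have, unsigned int want)
{
    /* %zu for size_t, %u for unsigned int */
    return snprintf(out, outlen, "have %zu bytes, want %u", have, want);
}
```

The same rule applies to the kernel's printk(), which accepts `%zu` for `size_t` as well.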
2012-04-30 | NFSv4.1: Use the correct hostname in the client identifier string | Trond Myklebust | 1 | -3/+2
We need to use the hostname of the process that created the nfs_client. That hostname is now stored in the rpc_client->cl_nodename. Also remove the utsname()->domainname component. There is no reason to include the NIS/YP domainname in a client identifier string. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-29 | autofs: make the autofsv5 packet file descriptor use a packetized pipe | Linus Torvalds | 3 | -2/+13
The autofs packet has had a very unfortunate size problem on x86: because the alignment of 'u64' differs in 32-bit and 64-bit modes, and because the packet data was not 8-byte aligned, the size of the autofsv5 packet structure differed between 32-bit and 64-bit modes despite looking otherwise identical (300 vs 304 bytes respectively).

We first fixed that up by making the 64-bit compat mode know about this problem in commit a32744d4abae ("autofs: work around unhappy compat problem on x86-64"), and that made a 32-bit 'systemd' work happily on a 64-bit kernel because everything then worked the same way as on a 32-bit kernel.

But it turned out that 'automount' had actually known and worked around this problem in user space, so fixing the kernel to do the proper 32-bit compatibility handling actually *broke* 32-bit automount on a 64-bit kernel, because it knew that the packet sizes were wrong and expected those incorrect sizes.

As a result, we ended up reverting that compatibility mode fix, and thus breaking systemd again, in commit fcbf94b9dedd.

With both automount and systemd doing a single read() system call, and verifying that they get *exactly* the size they expect but using different sizes, fixing one of them inevitably seemed to break the other. At one point, a patch I seriously considered applying from Michael Tokarev did a "strcmp()" to see if it was automount that was doing the operation. Ugly, ugly.

However, a prettier solution exists now thanks to the packetized pipe mode. By marking the communication pipe as being packetized (by simply setting the O_DIRECT flag), we can always just write the bigger packet size, and if user-space does a smaller read, it will just get that partial end result and the extra alignment padding will simply be thrown away.

This makes both automount and systemd happy, since they now get the size they asked for, and the kernel side of autofs simply no longer needs to care - it could pad out the packet arbitrarily.
Of course, if there is some *other* user of autofs (please, please, please tell me it ain't so - and we haven't heard of any) that tries to read the packets with multiple reads, that other user will now be broken - the whole point of the packetized mode is that one system call gets exactly one packet, and you cannot read a packet in pieces.

Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
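The 300-versus-304 difference comes from trailing padding. A hedged illustration with a made-up struct (not the real autofs v5 packet layout): 300 bytes of members get padded to 304 wherever a `uint64_t` member is 8-byte aligned, as on x86-64, but stay at 300 under the i386 ABI, which aligns `uint64_t` struct members to only 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet, NOT the real autofs layout: 300 bytes of members. */
struct demo_packet {
    uint32_t token;
    uint32_t len;
    uint64_t ino;       /* 8-byte aligned on x86-64, 4-byte aligned in i386 structs */
    char name[284];
};

/* Bytes actually covered by members, i.e. sizeof() minus trailing padding. */
#define DEMO_PAYLOAD \
    (offsetof(struct demo_packet, name) + sizeof(((struct demo_packet *)0)->name))
```

On x86-64, `sizeof(struct demo_packet)` is 304 while `DEMO_PAYLOAD` is 300; a 32-bit build for i386 gives 300 for both, which is exactly the kind of mismatch a reader doing one exact-size read() trips over.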
2012-04-29 | pipes: add a "packetized pipe" mode for writing | Linus Torvalds | 1 | -2/+29
The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
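A small Linux-only sketch of the user-space side, using the `pipe2(fd, O_DIRECT)` call named above: two 304-byte packets are written, and a reader with a 300-byte buffer (like one expecting the smaller 32-bit packet size) gets exactly one packet per read(), with the 4 leftover bytes discarded. The helper name is hypothetical, and it assumes a kernel with packetized pipes; older kernels make pipe2() fail with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: write two 304-byte packets, read each with a
 * 300-byte buffer. Returns 0 on success, -1 on error (including pipe2()
 * failing with EINVAL on kernels without packet mode). */
static int packet_pipe_demo(ssize_t got[2], char first[2])
{
    int fds[2];
    char wbuf[304], rbuf[300];

    if (pipe2(fds, O_DIRECT) < 0)
        return -1;

    /* Each write() in packet mode gets its own pipe buffer. */
    memset(wbuf, 'A', sizeof(wbuf));
    if (write(fds[1], wbuf, sizeof(wbuf)) != (ssize_t)sizeof(wbuf))
        goto fail;
    memset(wbuf, 'B', sizeof(wbuf));
    if (write(fds[1], wbuf, sizeof(wbuf)) != (ssize_t)sizeof(wbuf))
        goto fail;

    for (int i = 0; i < 2; i++) {
        /* One read() returns at most one packet; bytes that do not fit in
         * rbuf are dropped rather than left for the next read(). */
        got[i] = read(fds[0], rbuf, sizeof(rbuf));
        first[i] = got[i] > 0 ? rbuf[0] : 0;
    }
    close(fds[0]);
    close(fds[1]);
    return 0;

fail:
    close(fds[0]);
    close(fds[1]);
    return -1;
}
```

Without O_DIRECT, the second read() would instead start with the four leftover 'A' bytes of the first packet.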
2012-04-28 | NFS: get module in idmap PipeFS notifier callback | Stanislav Kinsbursky | 1 | -0/+4
This is a bug fix. The notifier callback is called from the SUNRPC module, so before dereferencing the NFS module we have to make sure that it's alive.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-28 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds | 15 | -139/+148
Pull btrfs fixes from Chris Mason:
 "This has our collection of bug fixes. I missed the last rc because I thought our patches were making NFS crash during my xfs test runs. Turns out it was an NFS client bug fixed by someone else while I tried to bisect it.

  All of these fixes are small, but some are fairly high impact. The biggest are fixes for our mount -o remount handling, a deadlock due to GFP_KERNEL allocations in readdir, and a RAID10 error handling bug.

  This was tested against both 3.3 and Linus' master as of this morning."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (26 commits)
  Btrfs: reduce lock contention during extent insertion
  Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir
  Btrfs: Fix space checking during fs resize
  Btrfs: fix block_rsv and space_info lock ordering
  Btrfs: Prevent root_list corruption
  Btrfs: fix repair code for RAID10
  Btrfs: do not start delalloc inodes during sync
  Btrfs: fix that check_int_data mount option was ignored
  Btrfs: don't count CRC or header errors twice while scrubbing
  Btrfs: fix btrfs_ioctl_dev_info() crash on missing device
  btrfs: don't return EINTR
  Btrfs: double unlock bug in error handling
  Btrfs: always store the mirror we read the eb from
  fs/btrfs/volumes.c: add missing free_fs_devices
  btrfs: fix early abort in 'remount'
  Btrfs: fix max chunk size check in chunk allocator
  Btrfs: add missing read locks in backref.c
  Btrfs: don't call free_extent_buffer twice in iterate_irefs
  Btrfs: Make free_ipath() deal gracefully with NULL pointers
  Btrfs: avoid possible use-after-free in clear_extent_bit()
  ...
2012-04-28 | Revert "autofs: work around unhappy compat problem on x86-64" | Linus Torvalds | 4 | -23/+3
This reverts commit a32744d4abae24572eff7269bc17895c41bd0085.

While that commit was technically the right thing to do, and made the x86-64 compat mode work identically to native 32-bit mode (and thus fixed the problem with a 32-bit systemd install on a 64-bit kernel), it turns out that the automount binaries had workarounds for this compat problem.

Now, the workarounds are disgusting: doing an "uname()" to find out the architecture of the kernel, and then comparing it for the 64-bit cases and fixing up the size of the read() in automount for those. And they were confused: it's not actually a generic 64-bit issue at all, it's very much tied to just x86-64, which has different alignment for a 'u64' in 64-bit mode than in 32-bit mode.

But the end result is that fixing the compat layer actually breaks the case of a 32-bit automount on an x86-64 kernel.

There are various approaches to fix this (including just doing a "strcmp()" on current->comm and comparing it to "automount"), but I think that I will do the one that teaches pipes about a special "packet mode", which will allow user space to not have to care too deeply about the padding at the end of the autofs packet.

That change will make the compat workaround unnecessary, so let's revert it first, and get automount working again in compat mode. The packetized pipes will then fix autofs for systemd.

Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Ian Kent <raven@themaw.net>
Cc: stable@kernel.org # for 3.3
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-27 | Merge git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds | 3 | -11/+16
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Use correct conversion specifiers in cifs_show_options
  CIFS: Show backupuid/gid in /proc/mounts
  cifs: fix offset handling in cifs_iovec_write
2012-04-27 | Btrfs: reduce lock contention during extent insertion | Chris Mason | 1 | -2/+7
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 | Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir | Chris Mason | 1 | -29/+1
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 | NFS: Remove unused function nfs_lookup_with_sec() | Bryan Schumaker | 1 | -62/+0
This fixes a compiler warning. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: Honor the authflavor set in the clone mount data | Bryan Schumaker | 4 | -7/+8
The authflavor is set in an nfs_clone_mount structure and passed to the xdev_mount() functions where it was promptly ignored. Instead, use it to initialize an rpc_clnt for the cloned server. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: Fix following referral mount points with different security | Bryan Schumaker | 5 | -26/+72
I create a new proc_lookup_mountpoint() to use when submounting an NFS v4 share. This function returns an rpc_clnt to use for performing an fs_locations() call on a referral's mountpoint. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: Do secinfo as part of lookup | Bryan Schumaker | 5 | -20/+103
Whenever lookup sees WRONGSEC, do a SECINFO and retry the lookup to find attributes of the file or directory, such as "is this a referral mountpoint?". This also allows me to remove the handling of -NFS4ERR_WRONGSEC as part of getattr xdr decoding.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: Handle exceptions coming out of nfs4_proc_fs_locations() | Bryan Schumaker | 1 | -1/+14
We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could cause the kernel to oops. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: Fix SECINFO_NO_NAME | Bryan Schumaker | 1 | -5/+19
I was using the same decoder function for SECINFO and SECINFO_NO_NAME, so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME header as OP_SECINFO. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
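A hedged sketch of the decoder split: keep one helper for the shared reply body, but pass in the opcode each caller expects, so a SECINFO_NO_NAME reply is no longer checked against OP_SECINFO. The opcode values 33 and 52 come from the NFSv4.0 and NFSv4.1 RFCs; the struct, the function names and the use of -EIO for a mismatch are assumptions for illustration, not the kernel's nfs4xdr.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Opcodes: OP_SECINFO (NFSv4.0) and OP_SECINFO_NO_NAME (NFSv4.1). */
enum { OP_SECINFO = 33, OP_SECINFO_NO_NAME = 52 };

/* Hypothetical, simplified reply header: opcode and status from the server. */
struct op_hdr { uint32_t opnum; uint32_t status; };

/* Reject a reply whose opcode is not the one we sent. */
static int decode_op_hdr(const struct op_hdr *hdr, uint32_t expected)
{
    if (hdr->opnum != expected)
        return -EIO;
    return (int)hdr->status;            /* 0 means NFS4_OK */
}

/* Shared body decoder, parameterized on the expected opcode. */
static int decode_secinfo_common(const struct op_hdr *hdr, uint32_t expected)
{
    int status = decode_op_hdr(hdr, expected);
    if (status)
        return status;
    /* ... decode the list of security flavors here (omitted) ... */
    return 0;
}

static int decode_secinfo(const struct op_hdr *hdr)
{
    return decode_secinfo_common(hdr, OP_SECINFO);
}

static int decode_secinfo_no_name(const struct op_hdr *hdr)
{
    return decode_secinfo_common(hdr, OP_SECINFO_NO_NAME);
}
```

Calling `decode_secinfo()` on a SECINFO_NO_NAME reply reproduces the original failure; the dedicated wrapper accepts it.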
2012-04-27 | Avoid beyond bounds copy while caching ACL | Sachin Prabhu | 2 | -8/+6
When attempting to cache ACLs returned from the server, if the bitmap size + the ACL size is greater than a PAGE_SIZE but the ACL size itself is smaller than a PAGE_SIZE, we can read past the buffer page boundary. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | Btrfs: Fix space checking during fs resize | Daniel J Blueman | 1 | -1/+1
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 | Btrfs: fix block_rsv and space_info lock ordering | Stefan Behrens | 1 | -2/+2
may_commit_transaction() calls
    spin_lock(&space_info->lock);
    spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
    spin_lock(&block_rsv->lock);
    spin_lock(&sinfo->lock);

Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
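A minimal pthreads sketch of the rule the fix enforces: every path that needs both locks takes the space_info lock first and the reservation lock second. The struct and function names are hypothetical stand-ins for the btrfs ones; because both paths use one consistent order, running them concurrently cannot produce the AB-BA deadlock lockdep warned about.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for btrfs_space_info and btrfs_block_rsv. */
struct space_info { pthread_mutex_t lock; long used; };
struct block_rsv  { pthread_mutex_t lock; long reserved; };

/* Lock order on every path: space_info->lock outer, rsv->lock inner. */
static void commit_path(struct space_info *si, struct block_rsv *rsv)
{
    pthread_mutex_lock(&si->lock);
    pthread_mutex_lock(&rsv->lock);
    si->used += 1;
    rsv->reserved += 1;
    pthread_mutex_unlock(&rsv->lock);
    pthread_mutex_unlock(&si->lock);
}

/* Before the fix this path took the reservation lock first; here it
 * follows the same order as commit_path(). */
static void update_rsv_path(struct space_info *si, struct block_rsv *rsv)
{
    pthread_mutex_lock(&si->lock);
    pthread_mutex_lock(&rsv->lock);
    rsv->reserved = si->used;
    pthread_mutex_unlock(&rsv->lock);
    pthread_mutex_unlock(&si->lock);
}

struct pair { struct space_info *si; struct block_rsv *rsv; };

static void *run_commit(void *arg)
{
    struct pair *p = arg;
    for (int i = 0; i < 10000; i++)
        commit_path(p->si, p->rsv);
    return NULL;
}

static void *run_update(void *arg)
{
    struct pair *p = arg;
    for (int i = 0; i < 10000; i++)
        update_rsv_path(p->si, p->rsv);
    return NULL;
}
```

Swapping the two lock calls in `update_rsv_path()` reintroduces the inverted order, and the two threads can then hang waiting on each other.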
2012-04-27 | Btrfs: Prevent root_list corruption | Daniel J Blueman | 1 | -0/+2
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 | Btrfs: fix repair code for RAID10 | Jan Schmidt | 1 | -1/+2
btrfs_map_block sets mirror_num, so that the repair code knows eventually which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
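A hedged sketch of the corrected numbering: compute mirror_num relative to the first stripe of the RAID10 copy group rather than from the absolute stripe index. The function and its parameters are hypothetical, not the btrfs_map_block() code; `sub_stripes` is 2 for RAID10.

```c
#include <assert.h>

/* Hypothetical: 'first' is the index of the first stripe in the copy group
 * holding the data, 'chosen' the stripe actually read from. */
static int raid10_mirror_num(int first, int chosen, int sub_stripes)
{
    assert(chosen >= first && chosen < first + sub_stripes);
    /* 1-based position within the group: 1 or 2 for RAID10. An
     * index-based formula such as chosen + 1 would give 5 or 6 for a
     * group starting at stripe 4. */
    return chosen - first + 1;
}
```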
2012-04-27 | Btrfs: do not start delalloc inodes during sync | Josef Bacik | 1 | -1/+0
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything, so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it; it's not needed anyway, since sync will start writeback on all inodes, and all we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 | Avoid reading past buffer when calling GETACL | Sachin Prabhu | 2 | -13/+21
Bug noticed in commit bf118a342f10dafe44b14451a1392c3254629a1f

When calling GETACL, if the combined size of the bitmap array, the length attribute and the ACL returned by the server is greater than the allocated buffer (args.acl_len), we can Oops with a General Protection fault at _copy_from_pages() when we attempt to read past the pages allocated.

This patch allocates an extra PAGE for the bitmap and checks to see that the bitmap + attribute_length + ACLs don't exceed the buffer space allocated to it.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
[Trond: Fixed a size_t vs unsigned int printk() warning]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
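A minimal sketch of the "bitmap + attribute_length + ACLs fit in the buffer" check, written with subtraction so that large server-supplied lengths cannot wrap the sum. The function name and its length parameters are hypothetical, not the NFS client's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: do bitmap, attribute-length word and ACL data all fit in
 * 'buflen' bytes? Each step subtracts from the remaining space instead of
 * adding lengths, so no addition can overflow. */
static bool acl_reply_fits(size_t buflen, size_t bitmap_len,
                           size_t attrlen_len, size_t acl_len)
{
    if (bitmap_len > buflen)
        return false;
    buflen -= bitmap_len;
    if (attrlen_len > buflen)
        return false;
    buflen -= attrlen_len;
    return acl_len <= buflen;
}
```

For example, with one page plus an extra page for the bitmap (8192 bytes), a 12-byte bitmap, a 4-byte length word and a 4096-byte ACL fit; with a single 4096-byte page, a 4090-byte ACL does not.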
2012-04-26 | Merge branch 'akpm' (Andrew's patch-bomb) | Linus Torvalds | 4 | -5/+4
Merge fixes from Andrew Morton:
 "13 fixes. The acerhdf patches aren't (really) fixes. But they've been stuck in my tree for up to two years, sent to Matthew multiple times and the developers are unhappy."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (13 patches)
  mm: fix NULL ptr dereference in move_pages
  mm: fix NULL ptr dereference in migrate_pages
  revert "proc: clear_refs: do not clear reserved pages"
  drivers/rtc/rtc-ds1307.c: fix BUG shown with lock debugging enabled
  arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file
  hugetlbfs: lockdep annotate root inode properly
  acerhdf: lowered default temp fanon/fanoff values
  acerhdf: add support for new hardware
  acerhdf: add support for Aspire 1410 BIOS v1.3314
  fs/buffer.c: remove BUG() in possible but rare condition
  mm: fix up the vmscan stat in vmstat
  epoll: clear the tfile_check_list on -ELOOP
  mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
2012-04-26 | fix page number calculation bug for block layout decode buffer | Jim Rees | 1 | -1/+3
Signed-off-by: Jim Rees <rees@umich.edu> Suggested-by: Andy Adamson <andros@netapp.com> Suggested-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 | NFSv4.1 fix page number calculation bug for filelayout decode buffers | Andy Adamson | 2 | -2/+2
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 | pnfs-obj: Remove unused variable from objlayout_get_deviceinfo() | Sachin Bhamare | 1 | -2/+0
Local variable 'sb' was not being used in objlayout_get_deviceinfo(). Signed-off-by: Sachin Bhamare <sbhamare@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 | nfs4: fix referrals on mounts that use IPv6 addrs | Weston Andros Adamson | 1 | -3/+27
All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of IPv6 addresses, because the validation code uses a path that is parsed from the dev_name ("<server>:<path>") by splitting on the first colon, and IPv6 addresses themselves contain colons.

This patch ignores colons within IPv6 addresses that are escaped by '[' and ']'.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
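A hedged sketch of the separator search described above: skip a leading bracketed IPv6 literal before looking for the colon that splits `<server>:<path>`. `find_path_sep()` is a hypothetical helper, not the function in the patch, and it only handles a bracket at the start of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: return a pointer to the server/path separator colon in
 * dev_name, or NULL if there is none. Colons inside a leading
 * "[...]" IPv6 literal are skipped. */
static const char *find_path_sep(const char *dev_name)
{
    const char *p = dev_name;

    if (*p == '[') {
        p = strchr(p, ']');         /* jump past the bracketed address */
        if (!p)
            return NULL;            /* unterminated bracket */
    }
    return strchr(p, ':');
}
```

Splitting "server:/export" still finds the first colon, while "[fe80::1]:/export" now splits after the closing bracket instead of inside the address.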
2012-04-25 | Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 8 | -31/+73
Pull NFS client bugfixes from Trond Myklebust:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Keep dropped state owners on the LRU list for a while
  NFSv4: Ensure that we don't drop a state owner more than once
  NFSv4: Ensure we do not reuse open owner names
  nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
  NFS: put open context on error in nfs_flush_multi
  NFS: put open context on error in nfs_pagein_multi
  NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
  NFSv4: Ensure that we check lock exclusive/shared type against open modes
  NFSv4: Ensure that the LOCK code sets exception->inode
  NFS: check for req==NULL in nfs_try_to_update_request cleanup
  SUNRPC: register PipeFS file system after pernet sybsystem
2012-04-25 | revert "proc: clear_refs: do not clear reserved pages" | Will Deacon | 1 | -3/+0
Revert commit 85e72aa5384 ("proc: clear_refs: do not clear reserved pages"), which was a quick fix suitable for -stable until ARM had been moved over to the gate_vma mechanism: https://lkml.org/lkml/2012/1/14/55 With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user mapping"), ARM does now use the gate_vma, so the PageReserved check can be removed from the proc code. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25 | hugetlbfs: lockdep annotate root inode properly | Aneesh Kumar K.V | 1 | -0/+1
This fixes the below reported false lockdep warning. e096d0c7e2e4 ("lockdep: Add helper function for dir vs file i_mutex annotation") added a similar annotation for every other inode in hugetlbfs but missed the root inode because it was allocated by a separate function.

For HugeTLB fs we allow taking i_mutex in mmap. HugeTLB fs doesn't support file write and its file read callback is modified in a05b0855fd ("hugetlbfs: avoid taking i_mutex from hugetlbfs_read()") to not take i_mutex. Hence for HugeTLB fs with regular files we really don't take i_mutex with mmap_sem held.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.0-rc1+ #322 Not tainted
 -------------------------------------------------------
 bash/1572 is trying to acquire lock:
  (&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90

 but task is already holding lock:
  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}:
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350
        [<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31
        [<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104
        [<ffffffff810f859a>] mmap_region+0x272/0x47d
        [<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee
        [<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e
        [<ffffffff8103d19e>] sys_mmap+0x1d/0x1f
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 -> #0 (&mm->mmap_sem){++++++}:
        [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff810f1645>] might_fault+0x6d/0x90
        [<ffffffff81125d62>] filldir+0x6a/0xc2
        [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
        [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
        [<ffffffff811260b6>] sys_getdents+0x79/0xc9
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#12);
                                lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#12);
   lock(&mm->mmap_sem);

  *** DEADLOCK ***

 1 lock held by bash/1572:
  #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 stack backtrace:
 Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322
 Call Trace:
  [<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209
  [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
  [<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614
  [<ffffffff8109e622>] ? mark_lock+0x2d/0x258
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350
  [<ffffffff810f1645>] might_fault+0x6d/0x90
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff81125d62>] filldir+0x6a/0xc2
  [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
  [<ffffffff811260b6>] sys_getdents+0x79/0xc9
  [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25 | fs/buffer.c: remove BUG() in possible but rare condition | Glauber Costa | 1 | -1/+0
While stressing the kernel with failing allocations today, I hit the following chain of events:

alloc_page_buffers():

    bh = alloc_buffer_head(GFP_NOFS);
    if (!bh)
        goto no_grow; <= path taken

grow_dev_page():
    bh = alloc_page_buffers(page, size, 0);
    if (!bh)
        goto failed;  <= taken, consequence of the above

and then the failed path BUG()s the kernel.

The failure is inserted a little bit artificially, but even then, I see no reason why it should be deemed impossible in a real box.

Even though this is not a condition that we expect to see around every time, failed allocations are expected to be handled, and BUG() sounds just too much. As a matter of fact, grow_dev_page() can return NULL just fine in other circumstances, so I propose we just remove it, then.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25 | epoll: clear the tfile_check_list on -ELOOP | Jason Baron | 1 | -1/+3
An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent circular epoll dependencies from being created. However, in that case we do not properly clear the 'tfile_check_list'. Thus, add a call to clear_tfile_check_list() for the -ELOOP case. Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Davide Libenzi <davidel@xmailserver.org> Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
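A hedged model of the fix: route the -ELOOP return through the same cleanup as the other exits so the check list is always emptied. The global counter and the function names are hypothetical stand-ins for tfile_check_list and the epoll_ctl() code; the real patch adds a clear_tfile_check_list() call on the -ELOOP path, which this sketch expresses with a shared cleanup label.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for tfile_check_list: number of queued entries. */
static int check_list_len;

static void clear_check_list(void) { check_list_len = 0; }

/* Stand-in for the loop check: queues 'visited' entries while walking,
 * then reports whether adding the watch would create a loop. */
static int walk_for_loops(int visited, int has_loop)
{
    check_list_len += visited;
    return has_loop ? -ELOOP : 0;
}

static int ctl_add(int visited, int has_loop)
{
    int error = walk_for_loops(visited, has_loop);
    if (error)
        goto out;           /* the -ELOOP exit must not skip the cleanup */
    /* ... insert the new watch here (omitted) ... */
out:
    clear_check_list();     /* runs on success and on -ELOOP alike */
    return error;
}
```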
2012-04-24 | Use correct conversion specifiers in cifs_show_options | Sachin Prabhu | 1 | -4/+4
cifs_show_options uses the wrong conversion specifier for uid, gid, rsize & wsize. Correct this to %u to match it to the variable type 'unsigned integer'. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-24 | CIFS: Show backupuid/gid in /proc/mounts | Sachin Prabhu | 2 | -6/+10
Show backupuid/backupgid in /proc/mounts for cifs shares mounted with the backupuid/backupgid feature. Also consolidate the two separate checks for pvolume_info->backupuid_specified into a single if condition in cifs_setup_cifs_sb(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-24 | GFS2: Instruct DLM to avoid queue convert slowdown | Bob Peterson | 1 | -3/+7
This patch instructs DLM to prevent an "in place" conversion, where the lock just stays on the granted queue, and instead forces the conversion to the back of the convert queue. This is done on upward conversions only. It is useful when, for example, a lock is frequently needed in PR on one node, but another node temporarily needs it in EX to update it. This may happen when the rindex is being updated by gfs2_grow: gfs2_grow needs the lock in EX, but the other nodes need to re-read the rindex to retrieve the updates. Because the glock is already granted in PR on the non-growing nodes, this prevents them from continually re-granting the lock in PR, and forces the EX request from gfs2_grow to go through.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
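The "upward conversions only" rule can be illustrated with a small flag-building function. This is a hypothetical sketch, not the GFS2 lock_dlm code: the enum follows DLM's mode ordering from NL up to EX, numeric order is used as a simple proxy for "upward", and FLAG_CONVERT, FLAG_QUECVT and conversion_flags() are invented names, not the real DLM_LKF_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock modes, weakest to strongest, mirroring DLM's NL < CR < CW < PR < PW < EX. */
enum lock_mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

/* Hypothetical flag bits; the values are made up for this sketch. */
#define FLAG_CONVERT 0x1u   /* request converts an already-held lock */
#define FLAG_QUECVT  0x2u   /* queue behind earlier conversions */

/* Only upward conversions ask to go to the back of the convert queue;
 * downward conversions and fresh requests leave the flag clear. */
static unsigned int conversion_flags(bool held, enum lock_mode cur,
				     enum lock_mode want)
{
	unsigned int flags = 0;

	if (held) {
		flags |= FLAG_CONVERT;
		if (want > cur)
			flags |= FLAG_QUECVT;
	}
	return flags;
}
```

In this model a PR holder asking for EX gets the queueing flag, while an EX holder dropping to PR does not.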
2012-04-23 | Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 2 | -2/+4
Pull ext4 bug fixes from Ted Ts'o:
 "These are two low-risk bug fixes for ext4, fixing a compile warning and a potential deadlock."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  super.c: unused variable warning without CONFIG_QUOTA
  jbd2: use GFP_NOFS for blkdev_issue_flush
2012-04-23 | super.c: unused variable warning without CONFIG_QUOTA | Eldad Zack | 1 | -0/+2
The sb info variable (sbi) is only used when quota support is enabled, so without CONFIG_QUOTA gcc warns:

fs/ext4/super.c: In function ‘parse_options’:
fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable]

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
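One common way to silence this kind of warning is to declare the variable under the same preprocessor condition as its only use. The following is a hypothetical sketch of that pattern, not the ext4 source: struct sb_info and parse_options() are stand-ins, and CONFIG_QUOTA is treated as an ordinary macro that may or may not be defined at build time.

```c
#include <assert.h>

/* Hypothetical superblock info with a single quota-related field. */
struct sb_info { int quota_enabled; };

/* sbi exists only when CONFIG_QUOTA is defined, so builds without quota
 * support have no unused variable and -Wunused-variable stays quiet. */
static int parse_options(struct sb_info *sb)
{
#ifdef CONFIG_QUOTA
	struct sb_info *sbi = sb;

	return sbi->quota_enabled;
#else
	(void)sb;               /* parameter unused in this configuration */
	return 0;
#endif
}
```

Wrapping the declaration keeps both configurations warning-free without adding a dummy use of the variable.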
2012-04-23 | jbd2: use GFP_NOFS for blkdev_issue_flush | Shaohua Li | 1 | -2/+2
The flush request is issued in the transaction commit code path, so using GFP_KERNEL to allocate memory for the flush request bio looks like it falls into the classic deadlock issue. I saw btrfs and dm get it right, but ext4, xfs and md are using GFP.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
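The reasoning behind the GFP_NOFS choice can be modelled with a few flag bits. This is a simplified, hypothetical sketch: M_RECLAIM, M_IO, M_FS, M_KERNEL, M_NOFS and safe_in_commit_path() are invented names with made-up values. They only mirror the idea that, in the kernel, GFP_NOFS differs from GFP_KERNEL by lacking __GFP_FS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits; NOT the real __GFP_* constants. */
#define M_RECLAIM 0x1u  /* allocator may reclaim memory */
#define M_IO      0x2u  /* reclaim may start block I/O */
#define M_FS      0x4u  /* reclaim may call back into the filesystem */

#define M_KERNEL (M_RECLAIM | M_IO | M_FS)
#define M_NOFS   (M_RECLAIM | M_IO)

/* If reclaim re-enters the filesystem while a journal commit is running,
 * it may end up waiting on that same commit: the classic deadlock. Code on
 * the commit path should therefore allocate without M_FS. */
static bool safe_in_commit_path(unsigned int flags)
{
	return !(flags & M_FS);
}
```

The only difference that matters in this model is the single filesystem-recursion bit, which is why swapping the flag constant is enough of a fix.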
2012-04-23 | Merge tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm | Linus Torvalds | 1 | -0/+12
Pull dlm fixes from David Teigland:
 "This includes one short patch fixing the behavior of the QUECVT flag, which the gfs2 folks are waiting on."

* tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix QUECVT when convert queue is empty
2012-04-23 | dlm: fix QUECVT when convert queue is empty | David Teigland | 1 | -0/+12
The QUECVT flag should not prevent conversions from being granted immediately when the convert queue is empty.

Signed-off-by: David Teigland <teigland@redhat.com>
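The corrected rule can be expressed as a small predicate. This is a hypothetical, simplified model, not fs/dlm/lock.c: can_grant_now() and its parameters are invented, and it deliberately ignores the other grant rules a real lock manager applies.

```c
#include <assert.h>
#include <stdbool.h>

/* compatible:        requested mode is compatible with other granted locks
 * quecvt:            request asked to be queued behind earlier conversions
 * convert_queue_len: number of conversions already waiting */
static bool can_grant_now(bool compatible, bool quecvt, int convert_queue_len)
{
	if (!compatible)
		return false;
	/* QUECVT means "don't jump ahead of waiting conversions"; with an
	 * empty convert queue there is nothing to wait behind. */
	if (quecvt && convert_queue_len > 0)
		return false;
	return true;
}
```

Under this model, the bug described in the commit corresponds to dropping the `convert_queue_len > 0` test, which would make QUECVT requests wait even when nobody is ahead of them.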
2012-04-21 | NFSv4: Keep dropped state owners on the LRU list for a while | Trond Myklebust | 1 | -9/+11
This ensures that we don't reuse their identifiers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-21 | NFSv4: Ensure that we don't drop a state owner more than once | Trond Myklebust | 1 | -3/+7
Retest the RB_EMPTY_NODE() condition under the spin lock to ensure that we don't call rb_erase() more than once on the same state owner.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
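The "re-test under the lock" pattern can be sketched with C11 atomics. This is a hypothetical model, not fs/nfs/nfs4state.c: struct owner, drop_owner() and the atomic_flag spinlock are invented, the on_tree field stands in for !RB_EMPTY_NODE(), and incrementing erase_count stands in for rb_erase().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct owner {
	atomic_bool on_tree;    /* stands in for !RB_EMPTY_NODE(&node) */
	int erase_count;        /* how many times the node was "erased" */
};

/* Minimal spinlock standing in for spin_lock()/spin_unlock(). */
static atomic_flag tree_lock = ATOMIC_FLAG_INIT;

static void tree_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&tree_lock))
		;               /* spin until the holder clears the flag */
}

static void tree_lock_release(void)
{
	atomic_flag_clear(&tree_lock);
}

/* The unlocked check is only a cheap hint; the re-test under the lock is
 * authoritative, so two racing callers cannot both erase the same node. */
static void drop_owner(struct owner *o)
{
	if (!atomic_load(&o->on_tree))
		return;
	tree_lock_acquire();
	if (atomic_load(&o->on_tree)) {
		atomic_store(&o->on_tree, false);
		o->erase_count++;
	}
	tree_lock_release();
}
```

Without the second check, two callers that both pass the unlocked test would each perform the erase once they got the lock.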
2012-04-21 | kill mm argument of vm_munmap() | Al Viro | 1 | -1/+1
It's always current->mm.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-21 | aio: don't bother with unmapping when aio_free_ring() is coming from exit_aio() | Al Viro | 1 | -1/+14
... since exit_mmap() is coming and it will munmap() everything anyway. In all other cases aio_free_ring() has ctx->mm == current->mm; moreover, all other callers of vm_munmap() have mm == current->mm, so this will allow us to get rid of the mm argument of vm_munmap().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-20 | NFSv4: Ensure we do not reuse open owner names | Trond Myklebust | 4 | -7/+10
The NFSv4 spec is ambiguous about whether or not it is permissible to reuse open owner names, so play it safe. This patch adds a timestamp to the state_owner structure and combines it with the IDA-based uniquifier. This fixes a regression whereby the Linux server returns NFS4ERR_BAD_SEQID.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
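Combining a creation timestamp with a recyclable id could look like the sketch below. It is hypothetical: struct owner_ident holds only the two ingredients the commit mentions, owner_name() and its "owner-<time>-<id>" format are invented, and no claim is made about how the NFS client actually encodes owner names on the wire.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical identity: the id alone may be handed out again once freed
 * (the role an IDA plays), so the creation time is paired with it. */
struct owner_ident {
	uint64_t create_time;   /* e.g. a tick count taken at creation */
	unsigned int id;        /* recyclable small-integer uniquifier */
};

/* Builds a name from both fields; a recycled id with a later timestamp
 * therefore yields a different name. */
static void owner_name(char *buf, size_t len, const struct owner_ident *oi)
{
	snprintf(buf, len, "owner-%llu-%u",
		 (unsigned long long)oi->create_time, oi->id);
}
```

The timestamp does the disambiguation only if it differs between the old and new owner, which is one reason keeping dropped owners around for a while helps as well.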
2012-04-20 | VM: add "vm_mmap()" helper function | Linus Torvalds | 5 | -47/+15
This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking.

This uninlines do_mmap() (and rewrites it to be clearer), which sadly means duplicating it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function.

Some day we hopefully won't have to export do_mmap() either, if all modular users can switch to the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810 and a couple of stragglers in binfmt_elf.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
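The split between an unlocked worker and a self-locking wrapper can be sketched as follows. This is a hypothetical user-space model: do_map(), map_locked(), mm_lock and lock_held are invented names, the atomic_flag spinlock stands in for the mm's mmap_sem, and the worker just returns the requested address rather than creating a mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_flag mm_lock = ATOMIC_FLAG_INIT;
static int lock_held;   /* set while mm_lock is held; lets do_map() check it */

/* Worker in the role of do_mmap(): the caller must already hold mm_lock. */
static long do_map(long addr, long len)
{
	assert(lock_held);          /* enforce the "caller holds the lock" rule */
	if (len <= 0)
		return -EINVAL;
	return addr;                /* placeholder: pretend we mapped at addr */
}

/* Wrapper in the role of vm_mmap(): same arguments, but it takes and
 * releases the lock itself, so callers cannot forget to. */
static long map_locked(long addr, long len)
{
	long ret;

	while (atomic_flag_test_and_set(&mm_lock))
		;
	lock_held = 1;
	ret = do_map(addr, len);
	lock_held = 0;
	atomic_flag_clear(&mm_lock);
	return ret;
}
```

Exporting only the wrapper, as the commit hopes to do eventually, means outside callers never see the worker's locking precondition at all.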