Age | Commit message (Collapse) | Author | Files | Lines |
|
If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible. Just quit without returning an error.
Also, note that the client must not assume that the NFSv4 lease has been
renewed when it sees an error on DELEGRETURN.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
|
|
When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
draining is off, but DELEGRETURN can still get a session error.
The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
the DELEGRETURN done then restarts the RPC task in the prepare state.
With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
session draining off, these DELEGRETURNs will cycle with errors filling up the
session slots.
This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
as seen in this kernel thread dump:
kernel: 4.12.32.53-ma D 0000000000000000 0 3393 2 0x00000000
kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140
kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001
kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058
kernel: Call Trace:
kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs]
kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs]
kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs]
kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs]
kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs]
kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs]
kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30
kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs]
kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs]
kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs]
kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
The state manager can not therefore process the DELEGRETURN session errors.
Change the async handler to wait for recovery on session errors.
Signed-off-by: Andy Adamson <andros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Fix the following warning:
linux-nfs/fs/nfs/inode.c:315:1: warning: ‘inline’ is not at
beginning of declaration [-Wold-style-declaration]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled,
instead of only the nfs client. This patch moves a small amount of code
into the client directory to avoid unnecessary recompiles.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Pull NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix pnfs Kconfig defaults
NFS: correctly report misuse of "migration" mount option.
nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
SUNRPC: Avoid deep recursion in rpc_release_client
SUNRPC: Fix a data corruption issue when retransmitting RPC calls
|
|
Defaulting to m seem to prevent building the pnfs layout modules into the
kernel. Default to the value of CONFIG_NFS_V4 make sure they are
built in for built-in NFSv4 support and modular for a modular NFSv4.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The current test on valid use of the "migration" mount option can never
report an error as it will only do so if
mnt->version !=4 && mnt->minor_version != 0
(and some other condition), but if that test would succeed, then the previous
test has already gone-to out_minorversion_mismatch.
So change the && to an || to get correct semantics.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
There is no guarantee that doing so will work however, so we can end up
retrying the call in an infinite loop.
Worse yet, we create the new client using rpc_clone_client_set_auth,
which creates the new client as a child of the old one. Thus, we can end
up with a *very* long lineage of rpc_clnts. When we go to put all of the
references to them, we can end up with a long call chain that can smash
the stack as each rpc_free_client() call can recurse back into itself.
This patch fixes this by simply ensuring that the SETCLIENTID call will
only be retried in this situation if the last attempt did not use
RPC_AUTH_UNIX.
Note too that with this change, we don't need the (i > 2) check in the
-EACCES case since we now have a more reliable test as to whether we
should reattempt.
Cc: stable@vger.kernel.org # v3.10+
Cc: Chuck Lever <chuck.lever@oracle.com>
Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
|
|
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL)
in nfs4_label_alloc()
We check the minor version in _nfs4_server_capabilities before setting
NFS_CAP_SECURITY_LABEL.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We don't want to be setting capabilities and/or requesting attributes
that are not appropriate for the NFSv4 minor version.
- Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate
- Ensure that we limit the attribute bitmasks to the mounted_on_fileid
attribute and less for NFSv4.0
- Ensure that we limit the attribute bitmasks to suppattr_exclcreat and
less for NFSv4.1
- Ensure that we limit it to change_sec_label or less for NFSv4.2
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then
our on-the-wire READDIR request ends up asking for the label information,
which is then ignored unless we're doing readdirplus.
This patch ensures that READDIR doesn't ask the server for label information
at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL
attribute, and the readdir->plus flag is set.
While we're at it, optimise away the 3rd bitmap field if it is zero.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently, we fetch the security label when revalidating an inode's
attributes, but don't apply it. This is in contrast to the readdir()
codepath where we do apply label changes.
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Ensure that _nfs4_do_get_security_label() also initialises the
SEQUENCE call correctly, by having it call into nfs4_call_sync().
Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Chao reported the following oops when testing labeled NFS:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
PGD 277bbd067 PUD 2777ea067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1
Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784 , BIOS -[D6E150CUS-1.11]- 02/08/2011
task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000
RIP: 0010:[<ffffffffa0568703>] [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP: 0018:ffff8801595d3888 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c
RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00
RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0
R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c
FS: 00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780
ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c
Call Trace:
[<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc]
[<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc]
[<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc]
[<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc]
[<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4]
[<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4]
[<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125
[<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5
[<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4]
[<ffffffff811bc772>] generic_setxattr+0x62/0x80
[<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0
[<ffffffff811bd2fe>] setxattr+0x12e/0x1c0
[<ffffffff811a4d22>] ? final_putname+0x22/0x50
[<ffffffff811a4f2b>] ? putname+0x2b/0x40
[<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100
[<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0
[<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b
Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00
RIP [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP <ffff8801595d3888>
CR2: 0000000000000000
The problem is that _nfs4_do_set_security_label calls rpc_call_sync()
directly which fails to do any setup of the SEQUENCE call. Have it use
nfs4_call_sync() instead which does the right thing. While we're at it
change the name of "args" to "arg" to better match the pattern in
_nfs4_do_setattr.
Reported-by: Chao Ye <cye@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next
Pull fs-cache fixes from David Howells:
Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading. This can be made to crash by opening it for writing again if you're
quick enough.
The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).
The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.
Could you send these on to Linus?
This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream. AFS and Ceph are probably safe.
* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
FS-Cache: Provide the ability to enable/disable cookies
FS-Cache: Add use/unuse/wake cookie wrappers
|
|
This check was added by Al Viro with
d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f "nfs d_revalidate() is too
trigger-happy with d_drop()", with the explanation that we don't want to
remove the root of a disconnected tree, which will still be included on
the s_anon list.
But DCACHE_DISCONNECTED does *not* actually identify dentries that are
disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the
way to do that.
Also add a comment from Al's commit to remind us why this check is
there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
the 'error' variable was been assigned twice in vain.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This patch adds support for multiple security options which can be
specified using a colon-delimited list of security flavors (the same
syntax as nfsd's exports file).
This is useful, for instance, when NFSv4.x mounts cross SECINFO
boundaries. With this patch a user can use "sec=krb5i,krb5p"
to mount a remote filesystem using krb5i, but can still cross
into krb5p-only exports.
New mounts will try all security options before failing. NFSv4.x
SECINFO results will be compared against the sec= flavors to
find the first flavor in both lists or if no match is found will
return -EPERM.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Since the parsed sec= flavor is now stored in nfs_server->auth_info,
we no longer need an nfs_server flag to determine if a sec= option was
used.
This flag has not been completely removed because it is still needed for
the (old but still supported) non-text parsed mount options ABI
compatability.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Cache the auth_info structure in nfs_server and pass these values to submounts.
This lays the groundwork for supporting multiple sec= options.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When filling parsed_mount_data, store the parsed sec= mount option in
the new struct nfs_auth_info and the chosen flavor in selected_flavor.
This patch lays the groundwork for supporting multiple sec= options.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
It's not used outside of nfs4namespace.c anymore.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on. The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.
The behavior at that point is non-deterministic. If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected. If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.
This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Broadly speaking, v4.1 migration is untested. There are no servers
in the wild that support NFSv4.1 migration. However, as server
implementations become available, we do want to enable testing by
developers, while leaving it disabled for environments for which
broken migration support would be an unpleasant surprise.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.
NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved. This error
causes the whole compound operation to fail. Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.
Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED. This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.
Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.
After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery. For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.
If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
With NFSv4 minor version 0, the asynchronous lease RENEW
heartbeat can return NFS4ERR_LEASE_MOVED. Error recovery logic for
async RENEW is a separate code path from the generic NFS proc paths,
so it must be updated to handle NFS4ERR_LEASE_MOVED as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently the Linux NFS client ignores the operation status code for
the RELEASE_LOCKOWNER operation. Like NFSv3's UMNT operation,
RELEASE_LOCKOWNER is a courtesy to help servers manage their
resources, and the outcome is not consequential for the client.
During a migration, a server may report NFS4ERR_LEASE_MOVED, in
which case the client really should retry, since typically
LEASE_MOVED has nothing to do with the current operation, but does
prevent it from going forward.
Also, it's important for a client to respond as soon as possible to
a moved lease condition, since the client's lease could expire on
the destination without further action by the client.
NFS4ERR_DELAY is not included in the list of valid status codes for
RELEASE_LOCKOWNER in RFC 3530bis. However, rfc3530-migration-update
does permit migration-capable servers to return DELAY to clients,
but only in the context of an ongoing migration. In this case the
server has frozen lock state in preparation for migration, and a
client retry would help the destination server purge unneeded state
once migration recovery is complete.
Interestly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even
though lock owners can be migrated with Transparent State Migration.
Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the
list of operations that renew a client's lease on the server if they
succeed. Now that our client pays attention to the operation's
status code, we can note that renewal appropriately.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Trigger lease-moved recovery when a request returns
NFS4ERR_LEASE_MOVED.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.
"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation. It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise. This is so that the
client's lease does not expire unnoticed on the destination server.
In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.
To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present. Invoke nfs4_try_migration() if the
FSID is no longer present on the server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Introduce a mechanism for probing a server to determine if an FSID
is present or absent.
The on-the-wire compound is different between minor version 0 and 1.
Minor version 0 appends a RENEW operation to identify which client
ID is probing. Minor version 1 has a SEQUENCE operation in the
compound which effectively carries the same information.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When a server returns NFS4ERR_MOVED during a delegation recall,
trigger the new migration recovery logic in the state manager.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
When a server returns NFS4ERR_MOVED, trigger the new migration
recovery logic in the state manager.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
I'm going to use this exit label also for migration recovery
failures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
NFS_SB() returns the pointer to an nfs_server struct, given a
pointer to a super_block. But we have no way to go back the other
way.
Add a super_block backpointer field so that, given an nfs_server
struct, it is easy to get to the filesystem's root dentry.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral. It
performs a LOOKUP in the compound, so the client needs to know the
parent's file handle a priori.
Unfortunately this function is not adequate for handling migration
recovery. We need to probe fs_locations information on an FSID, but
there's no parent directory available for many operations that
can return NFS4ERR_MOVED.
Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
of walking over a list of known FSIDs that reside on the server, and
probing whether they have migrated. Once the server has detected
that the client has probed all migrated file systems, it stops
returning NFS4ERR_LEASE_MOVED.
A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED. This flag is
set per client ID and per FSID. However, the client ID is not an
argument of either the PUTFH or GETATTR operations. Later minor
versions have client ID information embedded in the compound's
SEQUENCE operation.
Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW's one argument is a clientid4. This allows a minor version
zero server to identify correctly the client that is probing for a
migration.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Allow code in nfsv4.ko to use _nfs_display_fhandle().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The differences between minor version 0 and minor version 1
migration will be abstracted by the addition of a set of migration
recovery ops.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Introduce functions that can walk through an array of returned
fs_locations information and connect a transport to one of the
destination servers listed therein.
Note that NFS minor version 1 introduces "fs_locations_info" which
extends the locations array sorting criteria available to clients.
This is not supported yet.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
New function nfs4_update_server() moves an nfs_server to a different
nfs_client. This is done as part of migration recovery.
Though it may be appealing to think of them as the same thing,
migration recovery is not the same as following a referral.
For a referral, the client has not descended into the file system
yet: it has no nfs_server, no super block, no inodes or open state.
It is enough to simply instantiate the nfs_server and super block,
and perform a referral mount.
For a migration, however, we have all of those things already, and
they have to be moved to a different nfs_client. No local namespace
changes are needed here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
and can safely skip being reprocessed, but must still call update_open_stateid
to make sure that all active fmodes are recovered.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently, if the call to nfs_refresh_inode fails, then we end up leaking
a reference count, due to the call to nfs4_get_open_state.
While we're at it, replace nfs4_get_open_state with a simple call to
atomic_inc(); there is no need to do a full lookup of the struct nfs_state
since it is passed as an argument in the struct nfs4_opendata, and
is already assigned to the variable 'state'.
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This is an unneeded check that could cause the client to fail to recover
opens.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
open CLAIM_PREVIOUS, but this can happen. An example is when there are
RDWR openers and RDONLY openers on a delegation stateid. The recovery
path will first try an open CLAIM_PREVIOUS for the RDWR openers, this
marks the delegation as not needing RECLAIM anymore, so the open
CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.
The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
returning PTR_ERR(rpc_status) when !rpc_done. When the open is
cached, rpc_done == 0 and rpc_status == 0, thus
_nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
by callers of nfs4_opendata_to_nfs4_state().
This can be reproduced easily by opening the same file two times on an
NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
sleeping for a long time. While the files are held open, kick off state
recovery and this NULL dereference will be hit every time.
An example OOPS:
[ 65.003602] BUG: unable to handle kernel NULL pointer dereference at 00000000
00000030
[ 65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0
[ 65.008075] Oops: 0000 [#1] SMP
[ 65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache
snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd
_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pc
lmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio
_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw
_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi dr
m mptscsih mptbase i2c_core
[ 65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19
.x86_64 #1
[ 65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 07/31/2013
[ 65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000
[ 65.023414] RIP: 0010:[<ffffffffa037d6ee>] [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.025079] RSP: 0018:ffff88007b907d10 EFLAGS: 00010246
[ 65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000
[ 65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000
[ 65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001
[ 65.032527] FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
[ 65.033981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0
[ 65.036568] Stack:
[ 65.037011] 0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
[ 65.038472] ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
[ 65.039935] ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
[ 65.041468] Call Trace:
[ 65.042050] [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4]
[ 65.043209] [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
[ 65.044529] [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4]
[ 65.045730] [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4]
[ 65.046905] [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
[ 65.048071] [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
[ 65.049436] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.050686] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[ 65.051943] [<ffffffff81088640>] kthread+0xc0/0xd0
[ 65.052831] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[ 65.054697] [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0
[ 65.056396] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[ 65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
[ 65.065225] RIP [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[ 65.067175] RSP <ffff88007b907d10>
[ 65.068570] CR2: 0000000000000030
[ 65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]---
Cc: <stable@vger.kernel.org> #3.7+
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|