summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-05-02pNFS: Fix a typo in pnfs_generic_alloc_ds_commitsTrond Myklebust1-1/+1
If the layout segment is invalid, we want to just resend the remaining writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02pNFS: Fix a deadlock when coalescing writes and returning the layoutTrond Myklebust1-2/+0
Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02pNFS: Don't clear the layout return info if there are segments to returnTrond Myklebust1-1/+7
In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS: Ensure we commit the layout if it has been invalidatedTrond Myklebust3-0/+5
If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS: Don't send COMMITs to the DSes if the server invalidated our layoutTrond Myklebust1-0/+7
If the layout was invalidated, then assume we should requeue all the pending writes for the DS in question. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust2-4/+21
If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28pNFS: Ensure we check layout validity before marking it for returnTrond Myklebust1-0/+4
pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28NFS4.1 handle interrupted slot reuse from ERR_DELAYOlga Kornievskaia1-1/+2
If the RPC slot was interrupted and server replied to the next operation on the "reused" slot with ERR_DELAY, don't clear out the "interrupted" flag until we properly recover. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28NFSv4: check return value of xdr_inline_decodePan Bian1-0/+2
Function xdr_inline_decode() will return a NULL pointer if the input buffer does not have long enough buffer to decode nbytes of data. However, in function decode_op_map(), the return value of xdr_inline_decode() is not validated before it is used. This patch adds a check to the return value of xdr_inline_decode(). Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()Artem Savkov1-3/+3
Calling pnfs_put_lset on an IS_ERR pointer results in a NULL pointer dereference like the one below. At the same time the check of retvalue of filelayout_check_deviceid() sets lseg to error, but does not free it before that. [ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c [ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.637420] PGD 4f23b067 [ 3000.637421] PUD 4a0f4067 [ 3000.637679] PMD 0 [ 3000.637937] [ 3000.638287] Oops: 0000 [#1] SMP [ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw [ 3000.645245] i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_u32] [ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 4.11.0-rc7.1.el7.test.x86_64 #1 [ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 3000.647638] task: ffff8800415ada00 task.stack: ffffc90000ff0000 [ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.648696] RSP: 0018:ffffc90000ff39b8 EFLAGS: 00010246 [ 3000.649193] RAX: 0000000000000000 RBX: fffffffffffffff4 RCX: 00000000000d43be [ 3000.649859] RDX: 00000000000d43bd RSI: 0000000000000000 RDI: fffffffffffffff4 [ 3000.650530] RBP: ffffc90000ff39d8 R08: 000000000001e320 R09: ffffffffa05c35ce [ 3000.651203] R10: ffff88007fd1e320 R11: ffffea0001283d80 R12: 0000000001400040 [ 3000.651875] R13: ffff88004f77d9f0 R14: ffffc90000ff3cd8 R15: ffff8800417ade00 [ 3000.652546] FS: 00007fac4d5cd740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 3000.653304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3000.653849] CR2: 000000000000003c CR3: 000000004f080000 CR4: 00000000000406e0 [ 3000.654527] Call Trace: [ 3000.654771] fl_pnfs_update_layout.constprop.20+0x10c/0x150 [nfs_layout_nfsv41_files] [ 3000.655505] filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files] [ 3000.656195] __nfs_pageio_add_request+0x11c/0x490 [nfs] [ 3000.656698] nfs_pageio_add_request+0xac/0x260 [nfs] [ 3000.657180] nfs_do_writepage+0x109/0x2e0 [nfs] [ 3000.657616] nfs_writepages_callback+0x16/0x30 [nfs] [ 3000.658096] write_cache_pages+0x26f/0x510 [ 3000.658495] ? nfs_do_writepage+0x2e0/0x2e0 [nfs] [ 3000.658946] ? _raw_spin_unlock_bh+0x1e/0x20 [ 3000.659357] ? wb_wakeup_delayed+0x5f/0x70 [ 3000.659748] ? __mark_inode_dirty+0x2eb/0x360 [ 3000.660170] nfs_writepages+0x84/0xd0 [nfs] [ 3000.660575] ? nfs_updatepage+0x571/0xb70 [nfs] [ 3000.661012] do_writepages+0x1e/0x30 [ 3000.661358] __filemap_fdatawrite_range+0xc6/0x100 [ 3000.661819] filemap_write_and_wait_range+0x41/0x90 [ 3000.662292] nfs_file_fsync+0x34/0x1f0 [nfs] [ 3000.662704] vfs_fsync_range+0x3d/0xb0 [ 3000.663065] vfs_fsync+0x1c/0x20 [ 3000.663385] nfs4_file_flush+0x57/0x80 [nfsv4] [ 3000.663813] filp_close+0x2f/0x70 [ 3000.664132] __close_fd+0x9a/0xc0 [ 3000.664453] SyS_close+0x23/0x50 [ 3000.664785] do_syscall_64+0x67/0x180 [ 3000.665162] entry_SYSCALL64_slow_path+0x25/0x25 [ 3000.665600] RIP: 0033:0x7fac4d0e1e90 [ 3000.665946] RSP: 002b:00007ffd54e90c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 3000.666679] RAX: ffffffffffffffda RBX: 00007fac4d3b5400 RCX: 00007fac4d0e1e90 [ 3000.667349] RDX: 0000000000000000 RSI: 00007fac4d5d9000 RDI: 0000000000000001 [ 3000.668031] RBP: 0000000000000000 R08: 00007fac4d3b6a00 R09: 00007fac4d5cd740 [ 3000.668709] R10: 00007ffd54e909e0 R11: 0000000000000246 R12: 0000000000000000 [ 3000.669385] R13: 00007fac4d3b5e80 R14: 0000000000000000 R15: 0000000000000000 [ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00 [ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: ffffc90000ff39b8 [ 3000.672462] CR2: 000000000000003c Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-26NFSv4: Don't special case "launder"Trond Myklebust3-30/+13
If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-26NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()Trond Myklebust1-0/+4
EACCES, EDQUOT, EFBIG and ESTALE are all fatal errors as far as NFS I/O is concerned. They need to be reported back to the application. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25Merge tag 'nfs-rdma-4.12-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust6-122/+295
NFS: NFS over RDMA Client Side Changes New Features: - Break RDMA connections after a connection timeout - Support for unloading the underlying device driver Bugfixes and cleanups: - Mark the receive workqueue as "read-mostly" - Silence warnings caused by ENOBUFS - Update a comment in xdr_init_decode_pages() - Remove rpcrdma_buffer->rb_pool.
2017-04-25NFSv3: nfs3_nlm_alloc_call should be declared staticTrond Myklebust1-3/+3
Fix compiler warnings. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25xprtrdma: Remove rpcrdma_buffer::rb_poolChuck Lever1-1/+0
Since commit 1e465fd4ff47 ("xprtrdma: Replace send and receive arrays"), this field is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25sunrpc: Fix xdr_init_decode_pages() documenting commentChuck Lever1-1/+1
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Squelch ENOBUFS warningsChuck Lever1-3/+5
When ro_map is out of buffers, that's not a permanent error, so don't report a problem. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Annotate receive workqueueChuck Lever1-1/+1
Micro-optimize the receive workqueue by marking it's anchor "read- mostly." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Revert commit d0f36c46deeaChuck Lever1-26/+7
Device removal is now adequately supported. Pinning the underlying device driver to prevent removal while an NFS mount is active is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Restore transport after device removalChuck Lever1-0/+48
After a device removal, enable the transport connect worker to restore normal operation if there is another device with connectivity to the server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Refactor rpcrdma_ep_connectChuck Lever1-46/+63
I'm about to add another arm to if (ep->rep_connected != 0) It will be cleaner to use a switch statement here. We'll be looking for a couple of specific errnos, or "anything else," basically to sort out the difference between a normal reconnect and recovery from device removal. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Support unplugging an HCA from under an NFS mountChuck Lever3-9/+101
The device driver for the underlying physical device associated with an RPC-over-RDMA transport can be removed while RPC-over-RDMA transports are still in use (ie, while NFS filesystems are still mounted and active). The IB core performs a connection event upcall to request that consumers free all RDMA resources associated with a transport. There may be pending RPCs when this occurs. Care must be taken to release associated resources without leaving references that can trigger a subsequent crash if a signal or soft timeout occurs. We rely on the caller of the transport's ->close method to ensure that the previous RPC task has invoked xprt_release but the transport remains write-locked. A DEVICE_REMOVE upcall forces a disconnect then sleeps. When ->close is invoked, it destroys the transport's H/W resources, then wakes the upcall, which completes and allows the core driver unload to continue. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=266 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Use same device when mapping or syncing DMA buffersChuck Lever3-9/+14
When the underlying device driver is reloaded, ia->ri_device will be replaced. All cached copies of that device pointer have to be updated as well. Commit 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and Receive buffers") added the rg_device field to each regbuf. As part of handling a device removal, rpcrdma_dma_unmap_regbuf is invoked on all regbufs for a transport. Simply calling rpcrdma_dma_map_regbuf for each Receive buffer after the driver has been reloaded should reinitialize rg_device correctly for every case except rpcrdma_wc_receive, which still uses rpcrdma_rep::rr_device. Ensure the same device that was used to map a Receive buffer is also used to sync it in rpcrdma_wc_receive by using rg_device there instead of rr_device. This is the only use of rr_device, so it can be removed. The use of regbufs in the send path is also updated, for completeness. Fixes: 54cbd6b0c6b9 ("xprtrdma: Delay DMA mapping Send and ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Refactor rpcrdma_ia_open()Chuck Lever3-27/+32
In order to unload a device driver and reload it, xprtrdma will need to close a transport's interface adapter, and then call rpcrdma_ia_open again, possibly finding a different interface adapter. Make rpcrdma_ia_open safe to call on the same transport multiple times. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Detect unreachable NFS/RDMA servers more reliablyChuck Lever1-0/+22
Current NFS clients rely on connection loss to determine when to retransmit. In particular, for protocols like NFSv4, clients no longer rely on RPC timeouts to drive retransmission: NFSv4 servers are required to terminate a connection when they need a client to retransmit pending RPCs. When a server is no longer reachable, either because it has crashed or because the network path has broken, the server cannot actively terminate a connection. Thus NFS clients depend on transport-level keepalive to determine when a connection must be replaced and pending RPCs retransmitted. However, RDMA RC connections do not have a native keepalive mechanism. If an NFS/RDMA server crashes after a client has sent RPCs successfully (an RC ACK has been received for all OTW RDMA requests), there is no way for the client to know the connection is moribund. In addition, new RDMA requests are subject to the RPC-over-RDMA credit limit. If the client has consumed all granted credits with NFS traffic, it is not allowed to send another RDMA request until the server replies. Thus it has no way to send a true keepalive when the workload has already consumed all credits with pending RPCs. To address this, forcibly disconnect a transport when an RPC times out. This prevents moribund connections from stopping the detection of failover or other configuration changes on the server. Note that even if the connection is still good, retransmitting any RPC will trigger a disconnect thanks to this logic in xprt_rdma_send_request: /* Must suppress retransmit to maintain credits */ if (req->rl_connect_cookie == xprt->connect_cookie) goto drop_connection; req->rl_connect_cookie = xprt->connect_cookie; Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25sunrpc: Export xprt_force_disconnect()Chuck Lever1-0/+1
xprt_force_disconnect() is already invoked from the socket transport. I want to invoke xprt_force_disconnect() from the RPC-over-RDMA transport, which is a separate module from sunrpc.ko. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25xprtrdma: Cancel refresh worker during buffer shutdownChuck Lever1-0/+1
Trying to create MRs while the transport is being torn down can cause a crash. Fixes: e2ac236c0b65 ("xprtrdma: Allocate MRs on demand") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-04-25NFS: Don't write back further requests if there is a pending write errorTrond Myklebust1-4/+21
If the server has already returned a fatal write error that the user has not yet received on this file, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25pNFS: Fix use after free issues in pnfs_do_read()Trond Myklebust1-3/+13
The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust4-0/+18
If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21NFS: Always wait for I/O completion before unlockBenjamin Coddington3-6/+67
NFS attempts to wait for read and write completion before unlocking in order to ensure that the data returned was protected by the lock. When this waiting is interrupted by a signal, the unlock may be skipped, and messages similar to the following are seen in the kernel ring buffer: [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183 [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185 For NFSv3, the missing unlock will cause the server to refuse conflicting locks indefinitely. For NFSv4, the leftover lock will be removed by the server after the lease timeout. This patch fixes this issue by skipping the usual wait in nfs_iocounter_wait if the FL_CLOSE flag is set when signaled. Instead, the wait happens in the unlock RPC task on the NFS UOC rpc_waitqueue. For NFSv3, use lockd's new nlmclnt_operations along with nfs_async_iocounter_wait to defer NLM's unlock task until the lock context's iocounter reaches zero. For NFSv4, call nfs_async_iocounter_wait() directly from unlock's current rpc_call_prepare. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21lockd: Introduce nlmclnt_operationsBenjamin Coddington8-5/+54
NFS would enjoy the ability to modify the behavior of the NLM client's unlock RPC task in order to delay the transmission of the unlock until IO that was submitted under that lock has completed. This ability can ensure that the NLM client will always complete the transmission of an unlock even if the waiting caller has been interrupted with fatal signal. For this purpose, a pointer to a struct nlmclnt_operations can be assigned in a nfs_module's nfs_rpc_ops that will install those nlmclnt_operations on the nlm_host. The struct nlmclnt_operations defines three callback operations that will be used in a following patch: nlmclnt_alloc_call - used to call back after a successful allocation of a struct nlm_rqst in nlmclnt_proc(). nlmclnt_unlock_prepare - used to call back during NLM unlock's rpc_call_prepare. The NLM client defers calling rpc_call_start() until this callback returns false. nlmclnt_release_call - used to call back when the NLM client's struct nlm_rqst is freed. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21NFS: Add an iocounter wait function for async RPC tasksBenjamin Coddington5-1/+37
By sleeping on a new NFS Unlock-On-Close waitqueue, rpc tasks may wait for a lock context's iocounter to reach zero. The rpc waitqueue is only woken when the open_context has the NFS_CONTEXT_UNLOCK flag set in order to mitigate spurious wake-ups for any iocounter reaching zero. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21locks: Set FL_CLOSE when removing flock locks on close()Benjamin Coddington3-2/+4
Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks. NFS will check for this flag to ensure an unlock is sent in a following patch. Fuse handles flock and posix locks differently for FL_CLOSE, and so requires a fixup to retain the existing behavior for flock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21NFS: Move the flock open mode check into nfs_flock()Benjamin Coddington2-16/+16
We only need to check lock exclusive/shared types against open mode when flock() is used on NFS, so move it into the flock-specific path instead of checking it for all locks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-21NFS4: remove a redundant lock range checkBenjamin Coddington1-3/+0
flock64_to_posix_lock() is already doing this check Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS: unexport nfs4_pnfs_v3_ds_connect_unloadTrond Myklebust1-1/+0
It is not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layoutTrond Myklebust1-2/+0
They are not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS: Remove unused layout driver callbacksTrond Myklebust2-18/+4
encode_layoutreturn and encode_layoutcommit are now unused. Let's remove them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20nfs: remove the objlayout driverChristoph Hellwig9-2033/+0
The objlayout code has been in the tree, but it's been unmaintained and no server product for it actually ever shipped. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connectTrond Myklebust1-1/+1
The check in nfs4_ff_layout_prepare_ds() seems to be missing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: a33e4b036d461 ("pNFS: return status from nfs4_pnfs_ds_connect") Cc: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # v4.11
2017-04-20NFSv4: Fix a hang in OPEN related to server rebootTrond Myklebust1-1/+3
If the server fails to return the attributes as part of an OPEN reply, and then reboots, we can end up hanging. The reason is that the client attempts to send a GETATTR in order to pick up the missing OPEN call, but fails to release the slot first, causing reboot recovery to deadlock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 2e80dbe7ac51a ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Cc: stable@vger.kernel.org # v4.8+
2017-04-20NFS: move rw_mode to nfs_pageio_headerBenjamin Coddington6-14/+18
Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: move nfs_pgarray_set() to open codeBenjamin Coddington1-20/+14
Since commit 00bfa30abe86 ("NFS: Create a common pgio_alloc and pgio_release function"), nfs_pgarray_set() has only a single caller. Let's open code it. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: Use GFP_NOIO for two allocations in writebackBenjamin Coddington1-4/+11
Prevent a deadlock that can occur if we wait on allocations that try to write back our pages. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...") Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: Fix use after free in write error pathFred Isaman1-1/+1
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Fixes: 0bcbf039f6b2b ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()Benjamin Coddington1-2/+4
Commit a7d42ddb3099727f58366fa006f850a219cce6c8 ("nfs: add mirroring support to pgio layer") moved pg_cleanup out of the path when there was non-sequental I/O that needed to be flushed. The result is that for layouts that have more than one layout segment per file, the pg_lseg is not cleared, so we can end up hitting the WARN_ON_ONCE(req_start >= seg_end) in pnfs_generic_pg_test since the pg_lseg will be pointing to that previously-flushed layout segment. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20sunrpc: don't check for failure from mempool_alloc()NeilBrown2-13/+0
When mempool_alloc() is allowed to sleep (GFP_NOIO allows sleeping) it cannot fail. So rpc_alloc_task() cannot fail, so rpc_new_task doesn't need to test for failure. Consequently rpc_new_task() cannot fail, so the callers don't need to test. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: fix usage of mempools.NeilBrown3-26/+27
When passed GFP flags that allow sleeping (such as GFP_NOIO), mempool_alloc() will never return NULL, it will wait until memory is available. This means that we don't need to handle failure, but that we do need to ensure one thread doesn't call mempool_alloc() twice on the one pool without queuing or freeing the first allocation. If multiple threads did this during times of high memory pressure, the pool could be exhausted and a deadlock could result. pnfs_generic_alloc_ds_commits() attempts to allocate from the nfs_commit_mempool while already holding an allocation from that pool. This is not safe. So change nfs_commitdata_alloc() to take a flag that indicates whether failure is acceptable. In pnfs_generic_alloc_ds_commits(), accept failure and handle it as we currently do. Else where, do not accept failure, and do not handle it. Even when failure is acceptable, we want to succeed if possible. That means both - using an entry from the pool if there is one - waiting for direct reclaim is there isn't. We call mempool_alloc(GFP_NOWAIT) to achieve the first, then kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the second. Each of these can fail, but together they do the best they can without blocking indefinitely. The objects returned by kmem_cache_alloc() will still be freed by mempool_free(). This is safe as mempool_alloc() uses exactly the same function to allocate objects (since the mempool was created with mempool_create_slab_pool()). The object returned by mempool_alloc() and kmem_cache_alloc() are indistinguishable so mempool_free() will handle both identically, either adding to the pool or calling kmem_cache_free(). Also, don't test for failure when allocating from nfs_wdata_mempool. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20NFS: Clean up nfs4_proc_get_lease_time()Anna Schumaker1-7/+3
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>