Age | Commit message | Author | Files | Lines
2024-09-23 | nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit | Mike Snitzer | 7 | -15/+22
The nfsd_file will be passed, in future commits, by callers that enable LOCALIO support (for both regular NFS and pNFS IO). [Derived from patch authored by Weston Andros Adamson, but switched from passing struct file to struct nfsd_file] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfsd: implement server support for NFS_LOCALIO_PROGRAM | Mike Snitzer | 4 | -1/+110
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" RPC method that allows the Linux NFS client to verify the local Linux NFS server can see the nonce (single-use UUID) the client generated and made available in nfs_common. The server expects this protocol to use the same transport as NFS and NFSACL for its RPCs. This protocol isn't part of an IETF standard, nor does it need to be, considering it is a Linux-to-Linux auxiliary RPC protocol that amounts to an implementation detail. The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR methods are used instead of the less efficient variable sized methods. The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ): Linux Kernel Organization 400122 nfslocalio Signed-off-by: Mike Snitzer <snitzer@kernel.org> [neilb: factored out and simplified single localio protocol] Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
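The size difference between fixed and variable sized opaque encoding can be sketched in user space. This is a minimal Python illustration of the generic XDR rules (RFC 4506), not the kernel's encode_opaque_fixed()/decode_opaque_fixed() helpers; the function names below are hypothetical and only UUID_SIZE = 16 comes from the commit message.

```python
import struct
import uuid

UUID_SIZE = 16  # fixed size named in the commit message


def encode_opaque_fixed(data: bytes, size: int) -> bytes:
    """XDR fixed-length opaque: raw bytes padded to a 4-byte boundary, no length word."""
    if len(data) != size:
        raise ValueError("fixed opaque must be exactly %d bytes" % size)
    pad = (4 - size % 4) % 4
    return data + b"\x00" * pad


def encode_opaque_variable(data: bytes) -> bytes:
    """XDR variable-length opaque: 4-byte big-endian length, then padded bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def decode_opaque_fixed(buf: bytes, size: int) -> bytes:
    """Decode a fixed-length opaque; the receiver already knows the size."""
    padded = size + (4 - size % 4) % 4
    if len(buf) < padded:
        raise ValueError("short buffer")
    return buf[:size]


nonce = uuid.uuid4().bytes           # stand-in for the client generated uuid_t
fixed = encode_opaque_fixed(nonce, UUID_SIZE)
variable = encode_opaque_variable(nonce)
print(len(fixed), len(variable))     # prints "16 20"
```

With a size both sides already agree on, the fixed form skips the 4-byte length word and the receiver has no length to validate.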
2024-09-23 | nfsd: add LOCALIO support | Weston Andros Adamson | 8 | -3/+134
Add server support for bypassing NFS for localhost reads, writes, and commits. This is only useful when both the client and server are running on the same host. If nfsd_open_local_fh() fails then the NFS client will both retry and fall back to normal network-based read, write and commit operations if localio is no longer supported. Care is taken to ensure the same NFS security mechanisms are used (authentication, etc) regardless of whether localio or regular NFS access is used. The auth_domain established as part of the traditional NFS client access to the NFS server is also used for localio. Store auth_domain for localio in nfsd_uuid_t and transfer it to the client if it is local to the server. Relative to containers, localio gives the client access to the network namespace the server has. This is required to allow the client to access the server's per-namespace nfsd_net struct. This commit also introduces the use of NFSD's percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh and other LOCALIO client code. CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO | Mike Snitzer | 5 | -2/+81
The next commit will introduce nfsd_open_local_fh() which returns an nfsd_file structure. This commit exposes LOCALIO's required NFSD symbols to the NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols available to NFS in a global 'nfs_to' nfsd_localio_operations struct (global access suggested by Trond, nfsd_localio_operations suggested by NeilBrown). The next commit will also introduce nfsd_localio_ops_init() that init_nfsd() will call to initialize 'nfs_to'.
- Introduce nfsd_file_file() that provides access to nfsd_file's backing file. Keeps nfsd_file structure opaque to NFS client (as suggested by Jeff Layton).
- Introduce nfsd_file_put_local() that will put the reference to the nfsd_file's associated nn->nfsd_serv and then put the reference to the nfsd_file (as suggested by NeilBrown).
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs_common: add NFS LOCALIO auxiliary protocol enablement | Mike Snitzer | 4 | -0/+178
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client to generate a nonce (single-use UUID) and associated nfs_uuid_t struct, register it with nfs_common for subsequent lookup and verification by the NFS server and if matched the NFS server populates members in the nfs_uuid_t struct. nfs_common's nfs_uuids list is the basis for localio enablement, as such it has members that point to nfsd memory for direct use by the client (e.g. 'net' is the server's network namespace, through it the client can access nn->nfsd_serv). This commit also provides the base nfs_uuid_t interfaces to allow proper net namespace refcounting for the LOCALIO use case. CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client enablement for LOCALIO. If both NFS_FS=m and NFSD=m then NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides nfs_common's LOCALIO support).

  # lsmod | grep nfs_localio
  nfs_localio 12288 2 nfsd,nfs
  sunrpc 745472 35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: replace program list with program array | NeilBrown | 7 | -55/+67
A service created with svc_create_pooled() can be given a linked list of programs and all of these will be served. Using a linked list makes it cumbersome when there are several programs that can be optionally selected with CONFIG settings. After this patch is applied, API consumers must use only svc_create_pooled() when creating an RPC service that listens for more than one RPC program. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: add svcauth_map_clnt_to_svc_cred_local | Weston Andros Adamson | 2 | -0/+33
Add new function svcauth_map_clnt_to_svc_cred_local which maps a generic cred to a svc_cred suitable for use in nfsd. This is needed by the localio code to map nfs client creds to nfs server credentials. Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that ->fsuid and ->fsgid must be used (rather than ->uid and ->gid). In addition, these uid and gid must be translated with from_kuid_munged() so local client uses correct uid and gid when acting as local server. Jeff Layton noted: This is where the magic happens. Since we're working in kuid_t/kgid_t, we don't need to worry about further idmapping. Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal() Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: remove call_allocate() BUG_ONs | Mike Snitzer | 1 | -6/+0
Remove BUG_ON if p_arglen=0 to allow RPC with void arg. Remove BUG_ON if p_replen=0 to allow RPC with void return.

The former was needed for the first revision of the LOCALIO protocol which had an RPC that took a void arg:

  /* raw RFC 9562 UUID */
  typedef u8 uuid_t<UUID_SIZE>;

  program NFS_LOCALIO_PROGRAM {
      version LOCALIO_V1 {
          void
              NULL(void) = 0;

          uuid_t
              GETUUID(void) = 1;
      } = 1;
  } = 400122;

The latter is needed for the final revision of the LOCALIO protocol which has a UUID_IS_LOCAL RPC which returns a void:

  /* raw RFC 9562 UUID */
  typedef u8 uuid_t<UUID_SIZE>;

  program NFS_LOCALIO_PROGRAM {
      version LOCALIO_V1 {
          void
              NULL(void) = 0;

          void
              UUID_IS_LOCAL(uuid_t) = 1;
      } = 1;
  } = 400122;

There is really no value in triggering a BUG_ON in response to either of these previously unsupported conditions.

NeilBrown would like the entire 'if (proc->p_proc != 0)' branch removed (not just the one BUG_ON that must be removed for LOCALIO's immediate needs of returning void).

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfsd: add nfsd_serv_try_get and nfsd_serv_put | Mike Snitzer | 2 | -1/+50
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any caller of nfsd_serv_try_get releases their reference using nfsd_serv_put. A percpu_ref is used to implement the interlock between nfsd_destroy_serv and any caller of nfsd_serv_try_get. This interlock is needed to properly wait for the completion of client initiated localio calls to nfsd (that are _not_ in the context of nfsd). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
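The try_get/put interlock described above can be modelled in user space. The sketch below is only an analogy in Python: the kernel uses a percpu_ref, whereas this hypothetical ServRef class uses a plain counter under a condition variable, and the method names are illustrative rather than the kernel API.

```python
import threading


class ServRef:
    """Hypothetical stand-in for the reference guarding nn->nfsd_serv."""

    def __init__(self):
        self._cond = threading.Condition()
        self._users = 0
        self._dying = False

    def try_get(self) -> bool:
        """Take a reference unless teardown has already started."""
        with self._cond:
            if self._dying:
                return False
            self._users += 1
            return True

    def put(self) -> None:
        """Drop a reference and wake a waiting destroyer when the last one goes."""
        with self._cond:
            self._users -= 1
            if self._users == 0:
                self._cond.notify_all()

    def kill_and_wait(self, timeout=None) -> bool:
        """Refuse new references, then wait until all existing ones are dropped."""
        with self._cond:
            self._dying = True
            return self._cond.wait_for(lambda: self._users == 0, timeout)


ref = ServRef()
got = ref.try_get()                      # a localio caller takes a reference
releaser = threading.Timer(0.05, ref.put)
releaser.start()                         # ...and releases it a little later
drained = ref.kill_and_wait(timeout=2.0) # teardown blocks until that put()
releaser.join()
late = ref.try_get()                     # after teardown, new callers are refused
```

The point of the pattern is that teardown both stops new users and waits for existing ones, so a caller that got a reference can rely on the object until its matching put.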
2024-09-23 | nfsd: add nfsd_file_acquire_local() | NeilBrown | 4 | -7/+92
nfsd_file_acquire_local() can be used to look up a file by filehandle without having a struct svc_rqst. This can be used by NFS LOCALIO to allow the NFS client to bypass the NFS protocol to directly access a file provided by the NFS server which is running in the same kernel. In nfsd_file_do_acquire() care is taken to always use fh_verify() if rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise the non-LOCALIO callers will not supply the correct and required arguments to __fh_verify (e.g. gssclient isn't passed). Introduce fh_verify_local() wrapper around __fh_verify to make it clear that LOCALIO is intended caller. Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC offers performance improvements if/when a file is reopened before launderette cleans it from the filecache's LRU. Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfsd: factor out __fh_verify to allow NULL rqstp to be passed | NeilBrown | 1 | -31/+60
__fh_verify() offers an interface like fh_verify() but doesn't require a struct svc_rqst *; instead it takes the specific parts as explicit required arguments. So it is safe to call __fh_verify() with a NULL rqstp, but the net, cred, and client args must not be NULL. __fh_verify() does not use SVC_NET(), nor do the functions it calls. Rather than using rqstp->rq_client pass the client and gssclient explicitly to __fh_verify and then to nfsd_set_fh_dentry(). Lastly, it should be noted that the previous commit prepared for 4 associated tracepoints to only be used if rqstp is not NULL (this is a stop-gap that should be properly fixed so localio also benefits from the utility these tracepoints provide when debugging fh_verify issues). Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSD: Short-circuit fh_verify tracepoints for LOCALIO | Chuck Lever | 1 | -8/+10
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this case, the existing trace points need to be skipped because they want to dereference the address fields in the passed-in rqstp. Temporarily make these trace points conditional to avoid a seg fault in this case. Putting the "rqstp != NULL" check in the trace points themselves makes the check more efficient. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() | Chuck Lever | 1 | -4/+4
Currently, fh_verify() makes some daring assumptions about which version of file handle the caller wants, based on the things it can find in the passed-in rqstp. The about-to-be-introduced LOCALIO use case sometimes has no svc_rqst context, so this logic won't work in that case. Instead, examine the passed-in file handle. Its .max_size field should carry information to allow nfsd_set_fh_dentry() to initialize the file handle appropriately. The file handle used by lockd and the one created by write_filehandle never need any of the version-specific fields (which affect things like write and getattr requests and pre/post attributes). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSD: Refactor nfsd_setuser_and_check_port() | NeilBrown | 1 | -9/+10
There are several places where __fh_verify unconditionally dereferences rqstp to check that the connection is suitably secure. They look at rqstp->rq_xprt which is not meaningful in the target use case of "localio" NFS in which the client talks directly to the local server. Prepare these to always succeed when rqstp is NULL. Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSD: Handle @rqstp == NULL in check_nfsd_access() | NeilBrown | 1 | -5/+25
LOCALIO-initiated open operations are not running in an nfsd thread and thus do not have an associated svc_rqst context. Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h | Mike Snitzer | 3 | -20/+19
Eliminates duplicate functions in various files to allow for additional callers. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno | Mike Snitzer | 3 | -67/+68
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be used by fs/nfs/localio.c Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs_common: factor out nfs_errtbl and nfs_stat_to_errno | Mike Snitzer | 8 | -160/+109
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and fs/nfs/nfs3xdr.c Will also be used by fs/nfsd/localio.c Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: add 'noalignwrite' option for lock-less 'lost writes' prevention | Dan Aloni | 4 | -0/+15
There are some applications that write to predefined non-overlapping file offsets from multiple clients and therefore don't need to rely on file locking. However, if these applications want non-aligned offsets and sizes they need to either use locks or risk data corruption, as the NFS client defaults to extending writes to whole pages. This commit adds a new mount option `noalignwrite`, which turns that extension off and avoids the need for locking, as long as these applications don't overlap on offsets. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
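Why extending a write to a whole page can lose another client's data is easier to see in a toy model. The Python sketch below is a deliberately simplified illustration, not the NFS client's page cache: the 8-byte "page", the two clients, and the helper names are all hypothetical.

```python
PAGE_SIZE = 8  # tiny page, for illustration only


def write_extended(server: bytearray, cached_page: bytes, offset: int, data: bytes) -> None:
    """Patch a previously cached copy of the page, then send the whole page."""
    page = bytearray(cached_page)
    page[offset:offset + len(data)] = data
    server[0:PAGE_SIZE] = page


def write_exact(server: bytearray, offset: int, data: bytes) -> None:
    """Send only the bytes the application actually wrote."""
    server[offset:offset + len(data)] = data


def run(extended: bool) -> bytes:
    server = bytearray(b"." * PAGE_SIZE)
    # Both clients cache the page before either of them writes.
    cache_a = bytes(server)
    cache_b = bytes(server)
    if extended:
        write_extended(server, cache_a, 0, b"AAAA")  # client A: bytes 0..3
        write_extended(server, cache_b, 4, b"BBBB")  # client B: bytes 4..7
    else:
        write_exact(server, 0, b"AAAA")
        write_exact(server, 4, b"BBBB")
    return bytes(server)


print(run(extended=True))   # b'....BBBB'  (client A's bytes were overwritten)
print(run(extended=False))  # b'AAAABBBB'
```

In the extended case client B sends back its stale view of bytes 0..3 along with its own data, which is the "lost write" the option name refers to; writing only the exact range keeps the non-overlapping writes independent.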
2024-09-23 | nfs: fix the comment of nfs_get_root | Li Lingfeng | 1 | -1/+1
The comment for nfs_get_root() needs to be updated as it would also be used by NFS4 as follows:

  @x[
      nfs_get_root+1
      nfs_get_tree_common+1819
      nfs_get_tree+2594
      vfs_get_tree+73
      fc_mount+23
      do_nfs4_mount+498
      nfs4_try_get_tree+134
      nfs_get_tree+2562
      vfs_get_tree+73
      path_mount+2776
      do_mount+226
      __se_sys_mount+343
      __x64_sys_mount+106
      do_syscall_64+69
      entry_SYSCALL_64_after_hwframe+97
  , mount.nfs4]: 1

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSv4.2: Fix detection of "Proxying of Times" server support | Roi Azarzar | 1 | -2/+14
According to draft-ietf-nfsv4-delstid-07: If a server informs the client via the fattr4_open_arguments attribute that it supports OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid delegation stateid for an OPEN operation which sets the OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the client via a CB_GETATTR for the fattr4_time_deleg_access (see Section 5.2) attribute and fattr4_time_deleg_modify attribute (see Section 5.2). Thus, we should check that the server supports proxying of times via OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS. We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the client to correctly detect "Proxying of Times" support. Signed-off-by: Roi Azarzar <roi.azarzar@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: dcb3c20f7419 ("NFSv4: Add a capability for delegated attributes") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | NFSv4: Fail mounts if the lease setup times out | Trond Myklebust | 1 | -0/+6
If the server is down when the client is trying to mount, so that the calls to exchange_id or create_session fail, then we should allow the mount system call to fail rather than hang and block other mount/umount calls. Reported-by: Oleksandr Tymoshenko <ovt@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private | Zhaoyang Huang | 1 | -4/+2
This patch is inspired by a code review of fs code that looked for extra folio refcounts which could introduce unwanted behaviour when the refcount is judged, such as in [1]. That is, the folio passed to mapping_evict_folio carries the refcounts from find_lock_entries, the page cache, the corresponding PTEs, and the folio's private data if it has any. However, the current code doesn't take the refcount for the folio's private data, which could make mapping_evict_folio mistake a single PTE reference for the private-data reference and call filemap_release_folio wrongly.

  [1]
  long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
  {
  ...
      // current code will misjudge here if there is one PTE on the folio,
      // which is deemed to be the reference for the folio's private data
      if (folio_ref_count(folio) >
              folio_nr_pages(folio) + folio_has_private(folio) + 1)
          return 0;
      if (!filemap_release_folio(folio, 0))
          return 0;

      return remove_mapping(mapping, folio);
  }

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: Remove obsoleted declaration for nfs_read_prepare | Gaosheng Cui | 1 | -1/+0
nfs_read_prepare() has been removed since commit a4cdda59111f ("NFS: Create a common pgio_rpc_prepare function"), so its declaration is now useless; remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | net/sunrpc: make use of the helper macro LIST_HEAD() | Hongbo Li | 1 | -7/+3
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: clnt.c: Remove misleading comment | Siddh Raman Pant | 1 | -5/+0
destroy_wait doesn't store all RPC clients. There was a list named "all_clients" above it, which got moved to struct sunrpc_net in 2012, but the comment was never removed. Fixes: 70abc49b4f4a ("SUNRPC: make SUNPRC clients list per network namespace context") Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: convert RPC_TASK_* constants to enum | Stephen Brennan | 1 | -7/+9
The RPC_TASK_* constants are defined as macros, which means that most kernel builds will not contain their definitions in the debuginfo. However, it's quite useful for debuggers to be able to view the task state constant and interpret it correctly. Conversion to an enum will ensure the constants are present in debuginfo and can be interpreted by debuggers without needing to hard-code them and track their changes. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | SUNRPC: Fix -Wformat-truncation warning | Kunwu Chan | 1 | -1/+1
Increase size of the servername array to avoid truncated output warning.

  net/sunrpc/clnt.c:582:75: error: ‘%s’ directive output may be truncated writing up to 107 bytes into a region of size 48 [-Werror=format-truncation=]
    582 | snprintf(servername, sizeof(servername), "%s",
        | ^~
  net/sunrpc/clnt.c:582:33: note: ‘snprintf’ output between 1 and 108 bytes into a destination of size 48
    582 | snprintf(servername, sizeof(servername), "%s",
        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    583 | sun->sun_path);

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: Remove unnecessary NULL check before kfree() | Thorsten Blum | 1 | -2/+1
Since kfree() already checks if its argument is NULL, an additional check before calling kfree() is unnecessary and can be removed. Remove it and thus also the following Coccinelle/coccicheck warning reported by ifnullfree.cocci: WARNING: NULL check before some freeing functions is not needed Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: Annotate struct nfs_cache_array with __counted_by() | Thorsten Blum | 1 | -3/+3
Add the __counted_by compiler attribute to the flexible array member array to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Increment size before adding a new struct to the array. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | nfs: simplify and guarantee owner uniqueness. | NeilBrown | 6 | -24/+10
I have evidence of a Linux NFS client getting NFS4ERR_BAD_SEQID to a v4.0 LOCK request to a Linux server (which had the RELEASE_LOCKOWNER bug fixed). The LOCK request presented a "new" lock owner so there are two seq ids in the request: that for the open file, and that for the new lock. Given the context I am confident that the new lock owner was reported to have the wrong seqid. As lock owner identifiers are reused, the server must still have a lock owner active which the client thinks is no longer active. I wasn't able to determine a root-cause but the simplest fix seems to be to ensure lock owners are always unique much as open owners are (thanks to a time stamp). The easiest way to ensure uniqueness is with a 64bit counter for each server. That will never cycle (if updated once a nanosecond it would last 584 years. A single NFS server would not handle open/lock requests nearly that fast, and a Linux node is unlikely to have an uptime approaching that). This patch removes the 2 ida and instead uses a per-server atomic64_t to provide uniqueness. Note that the lock owner already encodes the id as 64 bits even though it is a 32bit value. So changing to a 64bit value does not change the encoding of the lock owner. The open owner encoding is now 4 bytes larger. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
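The 584-year figure can be checked with a few lines of arithmetic. The Python sketch below also shows a per-server monotonically increasing id; the Server class and next_owner_id() are hypothetical stand-ins for the per-server atomic64_t the commit describes, not kernel code.

```python
import itertools

NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_wrap(bits: int, updates_per_second: int) -> float:
    """How long a counter of the given width lasts at a constant update rate."""
    return (2**bits) / updates_per_second / SECONDS_PER_YEAR


class Server:
    """Hypothetical per-server state holding a 64-bit owner id counter."""

    def __init__(self):
        self._owner_ids = itertools.count(1)

    def next_owner_id(self) -> int:
        # Mask to 64 bits to mirror a fixed-width counter.
        return next(self._owner_ids) & 0xFFFFFFFFFFFFFFFF


years = years_until_wrap(64, NS_PER_SECOND)
print(int(years))  # 584

server = Server()
ids = [server.next_owner_id() for _ in range(1000)]
```

A counter that is never reused within the lifetime of the machine removes the need to track which identifiers are free, which is what the two ida allocators were doing before.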
2024-09-23 | nfs: fix memory leak in error path of nfs4_do_reclaim | Li Lingfeng | 1 | -0/+1
Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") separated out the freeing of the state owners from nfs4_purge_state_owners() and finished it outside the rcu lock. However, the error path was omitted. As a result, the state owners in "freeme" will not be released. Fix it by adding freeing in the error path. Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 | Merge tag 'nfsd-6.12' into linux-next-with-localio | Anna Schumaker | 200 | -495/+4955
NFSD 6.12 Release Notes

Notable features of this release include:
- Pre-requisites for automatically determining the RPC server thread count
- Clean-up and preparation for supporting LOCALIO, which will be merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding and decoding functions, added as an aid for prototyping features in protocols based on the Linux kernel's SunRPC implementation.

As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle.
2024-09-20 | xdrgen: Prevent reordering of encoder and decoder functions | Chuck Lever | 1 | -12/+12
I noticed that "xdrgen source" reorders the procedure encoder and decoder functions every time it is run. I would prefer that the generated code be more deterministic: it enables a reader to better see exactly what has changed between runs of the tool. The problem is that Python sets are not ordered. I use a Python set to ensure that, when multiple procedures use a particular argument or result type, the encoder/decoder for that type is emitted only once. Sets aren't ordered, but I can use Python dictionaries for this purpose to ensure the procedure functions are always emitted in the same order if the .x file does not change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
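The set-versus-dictionary point can be shown with a small, self-contained Python sketch. It is not taken from xdrgen; the unique_in_order() helper and the type names are hypothetical. It relies on the language guarantee (Python 3.7+) that dictionaries preserve insertion order.

```python
def unique_in_order(names):
    """Drop duplicates while keeping first-seen order, using dict keys as an ordered set."""
    return list(dict.fromkeys(names))


# Argument/result types as they might appear while walking procedures in a .x file;
# several procedures share the same type.
seen_types = ["uuid_t", "void", "uuid_t", "fattr", "void"]

ordered = unique_in_order(seen_types)
print(ordered)  # ['uuid_t', 'void', 'fattr']

# A set removes the duplicates too, but its iteration order is not something
# to rely on for stable generated output.
as_set = set(seen_types)
```

Because the result depends only on the order of the input, re-running a generator over an unchanged specification emits the functions in the same sequence every time.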
2024-09-20 | xdrgen: typedefs should use the built-in string and opaque functions | Chuck Lever | 2 | -2/+2
'typedef opaque yada<XYZ>' should use xdrgen's built-in opaque encoder and decoder, to enable better compiler optimization. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 | xdrgen: Fix return code checking in built-in XDR decoders | Chuck Lever | 4 | -5/+5
xdr_stream_encode_u32() returns XDR_UNIT on success. xdr_stream_decode_u32() returns zero or -EMSGSIZE, but never XDR_UNIT. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 | tools: Add xdrgen | Chuck Lever | 153 | -0/+4196
Add a Python-based tool for translating XDR specifications into XDR encoder and decoder functions written in the Linux kernel's C coding style. The generator attempts to match the usual C coding style of the Linux kernel's SunRPC consumers. This approach is similar to the netlink code generator in tools/net/ynl .

The maintainability benefits of machine-generated XDR code include:
- Stronger type checking
- Reduces the number of bugs introduced by human error
- Makes the XDR code easier to audit and analyze
- Enables rapid prototyping of new RPC-based protocols
- Hardens the layering between protocol logic and marshaling
- Makes it easier to add observability on demand
- Unit tests might be built for both the tool and (automatically) for the generated code

In addition, converting the XDR layer to use memory-safe languages such as Rust will be easier if much of the code can be converted automatically.

Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 | nfsd: fix delegation_blocked() to block correctly for at least 30 seconds | NeilBrown | 1 | -2/+3
The pair of bloom filters used by delegation_blocked() was intended to block delegations on given filehandles for between 30 and 60 seconds. A new filehandle would be recorded in the "new" bit set. That would then be switched to the "old" bit set between 0 and 30 seconds later, and it would remain as the "old" bit set for 30 seconds. Unfortunately the code intended to clear the old bit set once it reached 30 seconds old, preparing it to be the next new bit set, instead cleared the *new* bit set before switching it to be the old bit set. This means that the "old" bit set is always empty and delegations are blocked between 0 and 30 seconds. This patch updates bd->new before clearing the set with that index, instead of afterwards. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
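The ordering bug is easier to follow in a small model. The Python sketch below replaces the bloom filter bit sets with ordinary sets and drives the 30-second rotation by hand; the BlockedDelegations class and its methods are hypothetical, and only the "clear then flip" versus "flip then clear" ordering is taken from the description above.

```python
class BlockedDelegations:
    """Two-generation blocklist; Python sets stand in for the bloom filter bit sets."""

    def __init__(self, fixed: bool):
        self.sets = [set(), set()]
        self.new = 0          # index of the "new" generation
        self.fixed = fixed

    def rotate(self) -> None:
        """Called when the current generation has been 'new' for 30 seconds."""
        if self.fixed:
            # Fixed order: flip first, then clear the set that is about to
            # become "new" (the previous "old" generation).
            self.new = 1 - self.new
            self.sets[self.new].clear()
        else:
            # Buggy order: clears the current "new" set, then flips, so the
            # "old" generation is always empty.
            self.sets[self.new].clear()
            self.new = 1 - self.new

    def block(self, fh: str) -> None:
        self.sets[self.new].add(fh)

    def is_blocked(self, fh: str) -> bool:
        return fh in self.sets[0] or fh in self.sets[1]


def blocked_after(rotations: int, fixed: bool) -> bool:
    bd = BlockedDelegations(fixed)
    bd.block("fh-1")
    for _ in range(rotations):
        bd.rotate()
    return bd.is_blocked("fh-1")


print(blocked_after(1, fixed=False))  # False: entry lost at the first rotation
print(blocked_after(1, fixed=True))   # True: entry survives as the "old" generation
print(blocked_after(2, fixed=True))   # False: entry expires at the second rotation
```

With the fixed ordering an entry lives through exactly one rotation, which gives the intended 30 to 60 second window; with the buggy ordering it is discarded at the first rotation, giving 0 to 30 seconds.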
2024-09-20 | nfsd: fix initial getattr on write delegation | Jeff Layton | 1 | -8/+25
At this point in compound processing, currentfh refers to the parent of the file, not the file itself. Get the correct dentry from the delegation stateid instead. Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 | nfsd: untangle code in nfsd4_deleg_getattr_conflict() | NeilBrown | 1 | -69/+62
The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy. With this patch we:
- properly handle non-nfsd leases. We must not assume flc_owner is a delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid() (and other exits which don't need to call that)

[ jlayton: refactored on top of Neil's other patch: nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease ]

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 | nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall() | Scott Mayhew | 1 | -4/+4
This patch is intended to go on top of "nfsd: return -EINVAL when namelen is 0" from Li Lingfeng. Li's patch checks for 0, but we should be enforcing an upper bound as well. Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the downcall anyway. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: return -EINVAL when namelen is 0Li Lingfeng1-0/+8
When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may result in namelen being 0, which will cause memdup_user() to return ZERO_SIZE_PTR. When we access the name.data that has been assigned the value of ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is triggered. [ T1205] ================================================================== [ T1205] BUG: KASAN: null-ptr-deref in nfs4_client_to_reclaim+0xe9/0x260 [ T1205] Read of size 1 at addr 0000000000000010 by task nfsdcld/1205 [ T1205] [ T1205] CPU: 11 PID: 1205 Comm: nfsdcld Not tainted 5.10.0-00003-g2c1423731b8d #406 [ T1205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ T1205] Call Trace: [ T1205] dump_stack+0x9a/0xd0 [ T1205] ? nfs4_client_to_reclaim+0xe9/0x260 [ T1205] __kasan_report.cold+0x34/0x84 [ T1205] ? nfs4_client_to_reclaim+0xe9/0x260 [ T1205] kasan_report+0x3a/0x50 [ T1205] nfs4_client_to_reclaim+0xe9/0x260 [ T1205] ? nfsd4_release_lockowner+0x410/0x410 [ T1205] cld_pipe_downcall+0x5ca/0x760 [ T1205] ? nfsd4_cld_tracking_exit+0x1d0/0x1d0 [ T1205] ? down_write_killable_nested+0x170/0x170 [ T1205] ? avc_policy_seqno+0x28/0x40 [ T1205] ? selinux_file_permission+0x1b4/0x1e0 [ T1205] rpc_pipe_write+0x84/0xb0 [ T1205] vfs_write+0x143/0x520 [ T1205] ksys_write+0xc9/0x170 [ T1205] ? __ia32_sys_read+0x50/0x50 [ T1205] ? ktime_get_coarse_real_ts64+0xfe/0x110 [ T1205] ? 
ktime_get_coarse_real_ts64+0xa2/0x110 [ T1205] do_syscall_64+0x33/0x40 [ T1205] entry_SYSCALL_64_after_hwframe+0x67/0xd1 [ T1205] RIP: 0033:0x7fdbdb761bc7 [ T1205] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 514 [ T1205] RSP: 002b:00007fff8c4b7248 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ T1205] RAX: ffffffffffffffda RBX: 000000000000042b RCX: 00007fdbdb761bc7 [ T1205] RDX: 000000000000042b RSI: 00007fff8c4b75f0 RDI: 0000000000000008 [ T1205] RBP: 00007fdbdb761bb0 R08: 0000000000000000 R09: 0000000000000001 [ T1205] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000042b [ T1205] R13: 0000000000000008 R14: 00007fff8c4b75f0 R15: 0000000000000000 [ T1205] ================================================================== Fix it by checking namelen. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 74725959c33c ("nfsd: un-deprecate nfsdcld") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Wrap async copy operations with trace pointsChuck Lever2-2/+72
Add an nfsd_copy_async_done tracepoint to record the timestamp, the final status code, and the callback stateid of an async copy. Rename the nfsd_copy_do_async tracepoint to match that naming convention, so that both tracepoints can be enabled with a single glob. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Clean up extra whitespace in trace_nfsd_copy_doneChuck Lever1-1/+1
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Record the callback stateid in copy tracepointsChuck Lever1-0/+12
Match COPY operations up with CB_OFFLOAD operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Display copy stateids with conventional print formattingChuck Lever1-6/+6
Make it easier to grep for s2s COPY stateids in trace logs: Use the same display format in nfsd_copy_class as is used to display other stateids. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Limit the number of concurrent async COPY operationsChuck Lever4-2/+12
Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector. Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit. An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy. If there is need to make the mechanism more sophisticated, we can visit that in future patches. Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
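One way to picture such a bound is a counter that is incremented optimistically and rolled back when the limit is hit. The sketch below uses C11 atomics in userspace; struct copy_limiter, copy_slot_acquire and copy_slot_release are hypothetical names, and how the limit value is chosen is left open. It is an illustration of the idea, not the code added by this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace limiter for background COPY operations. */
struct copy_limiter {
	atomic_int pending;   /* async copies currently running */
	int max;              /* upper bound on concurrent async copies */
};

/* Try to reserve a slot. On failure the caller would answer the COPY
 * with NFS4ERR_DELAY so the client can retry later or fall back to a
 * read/write style copy. */
bool copy_slot_acquire(struct copy_limiter *l)
{
	if (atomic_fetch_add(&l->pending, 1) >= l->max) {
		atomic_fetch_sub(&l->pending, 1);   /* undo the reservation */
		return false;
	}
	return true;
}

/* Release a slot when the background copy finishes or is cancelled. */
void copy_slot_release(struct copy_limiter *l)
{
	atomic_fetch_sub(&l->pending, 1);
}
```

Every successful acquire has to be paired with exactly one release on each exit path of the copy, otherwise the counter drifts and the limit eventually rejects all requests.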
2024-09-20NFSD: Async COPY result needs to return a write verifierChuck Lever1-15/+8
Currently, when NFSD handles an asynchronous COPY, it returns a zero write verifier, relying on the subsequent CB_OFFLOAD callback to pass the write verifier and a stable_how4 value to the client. However, if the CB_OFFLOAD never arrives at the client (for example, if a network partition occurs just as the server sends the CB_OFFLOAD operation), the client will never receive this verifier. Thus, if the client sends a follow-up COMMIT, there is no way for the client to assess the COMMIT result. The usual recovery for a missing CB_OFFLOAD is for the client to send an OFFLOAD_STATUS operation, but that operation does not carry a write verifier in its result. Neither does it carry a stable_how4 value, so the client /must/ send a COMMIT in this case -- which will always fail because currently there's still no write verifier in the COPY result. Thus the server needs to return a normal write verifier in its COPY result even if the COPY operation is to be performed asynchronously. If the server recognizes the callback stateid in subsequent OFFLOAD_STATUS operations, then obviously it has not restarted, and the write verifier the client received in the COPY result is still valid and can be used to assess a COMMIT of the copied data, if one is needed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
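From the client's point of view, assessing a COMMIT comes down to comparing the verifier returned by COMMIT with the one received earlier in the COPY result. A minimal sketch of that comparison, assuming an 8-byte verifier; the struct and the function name commit_matches_copy are hypothetical and do not reproduce the Linux client code.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed verifier size (8 bytes); restated for a standalone build. */
#define NFS4_VERIFIER_SIZE 8

/* Hypothetical container for a write verifier. */
struct write_verifier {
	unsigned char data[NFS4_VERIFIER_SIZE];
};

/* True when the verifier returned by COMMIT equals the one from the
 * COPY result. A mismatch suggests the server restarted in between,
 * so the copied data may need to be sent again. */
bool commit_matches_copy(const struct write_verifier *copy_verf,
			 const struct write_verifier *commit_verf)
{
	return memcmp(copy_verf->data, commit_verf->data,
		      NFS4_VERIFIER_SIZE) == 0;
}
```

With a zero verifier in the COPY result this comparison can never succeed against a real COMMIT verifier, which is the failure the message above describes.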
2024-09-20nfsd: avoid races with wake_up_var()NeilBrown1-1/+4
wake_up_var() needs a barrier after the important change is made in the var and before wake_up_var() is called, else it is possible that a wake up won't be sent when it should. In each case here the var is changed in an "atomic" manner, so smp_mb__after_atomic() is sufficient. In one case the important change (removing the lease) is performed *after* the wake_up, which is backwards. The code survives in part because the wait_var_event is given a timeout. This patch adds the required barriers and calls destroy_delegation() *before* waking any threads waiting for the delegation to be destroyed. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: use clear_and_wake_up_bit()NeilBrown2-6/+2
nfsd has two places that open-code clear_and_wake_up_bit(). One has the required memory barriers. The other does not. Change both to use clear_and_wake_up_bit() so we have the barriers without the noise. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>