summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-09-25io_uring: don't unconditionally set plug->nowait = trueJens Axboe1-3/+0
This causes all the bios to be submitted with REQ_NOWAIT, which can be problematic on either btrfs or on file systems that otherwise use a mix of block devices where only some of them support it. For now, just remove the setting of plug->nowait = true. Reported-by: Dan Melnic <dmm@fb.com> Reported-by: Brian Foster <bfoster@redhat.com> Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25io_uring: ensure open/openat2 name is cleaned on cancelationJens Axboe1-0/+5
If we cancel these requests, we'll leak the memory associated with the filename. Add them to the table of ops that need cleaning, if REQ_F_NEED_CLEANUP is set. Cc: stable@vger.kernel.org Fixes: e62753e4e292 ("io_uring: call statx directly") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: fix openat/openat2 unified prep handlingJens Axboe1-2/+4
A previous commit unified how we handle prep for these two functions, but this means that we check the allowed context (SQPOLL, specifically) later than we should. Move the ring type checking into the two parent functions, instead of doing it after we've done some setup work. Fixes: ec65fea5a8d7 ("io_uring: deduplicate io_openat{,2}_prep()") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: mark statx/files_update/epoll_ctl as non-SQPOLLJens Axboe1-2/+4
These will naturally fail when attempted through SQPOLL, but either with -EFAULT or -EBADF. Make it explicit that these are not workable through SQPOLL and return -EINVAL, just like other ops that need to use ->files. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: don't use retry based buffered reads for non-async bdevJens Axboe1-1/+5
Some block devices, like dm, bubble back -EAGAIN through the completion handler. We check for this in io_read(), but don't honor it for when we have copied the iov. Return -EAGAIN for this case before retrying, to force punt to io-wq. Fixes: bcf5a06304d6 ("io_uring: support true async buffered reads, if file provides it") Reported-by: Zorro Lang <zlang@redhat.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: don't re-setup vecs/iter in io_resumit_prep() is already thereJens Axboe1-6/+10
If we already have mapped the necessary data for retry, then don't set it up again. It's a pointless operation, and we leak the iovec if it's a large (non-stack) vec. Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14io_uring: don't run task work on an exiting taskJens Axboe1-0/+11
This isn't safe, and isn't needed either. We are guaranteed that any work we queue is on a live task (and will be run), or it goes to our backup io-wq threads if the task is exiting. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14io_uring: drop 'ctx' ref on task work cancelationJens Axboe1-0/+2
If task_work ends up being marked for cancelation, we go through a cancelation helper instead of the queue path. In converting task_work to always hold a ctx reference, this path was missed. Make sure that io_req_task_cancel() puts the reference that is being held against the ctx. Fixes: 6d816e088c35 ("io_uring: hold 'ctx' reference around task_work queue + execute") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-13io_uring: grab any needed state during defer prepJens Axboe1-0/+2
Always grab work environment for deferred links. The assumption that we will be running it always from the task in question is false, as exiting tasks may mean that we're deferring this one to a thread helper. And at that point it's too late to grab the work environment. Fixes: debb85f496c9 ("io_uring: factor out grab_env() from defer_prep()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05io_uring: fix linked deferred ->files cancellationPavel Begunkov1-2/+23
While looking for ->files in ->defer_list, consider that requests there may actually be links. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05io_uring: fix cancel of deferred reqs with ->filesPavel Begunkov1-0/+27
While trying to cancel requests with ->files, it also should look for requests in ->defer_list, otherwise it might end up hanging a thread. Cancel all requests in ->defer_list up to the last request there with matching ->files, that's needed to follow drain ordering semantics. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05io_uring: fix explicit async read/write mapping for large segmentsJens Axboe1-3/+4
If we exceed UIO_FASTIOV, we don't handle the transition correctly between an allocated vec for requests that are queued with IOSQE_ASYNC. Store the iovec appropriately and re-set it in the iter iov in case it changed. Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Nick Hill <nick@nickhill.org> Tested-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked fileJens Axboe1-0/+10
Actually two things that need fixing up here: - The io_rw_reissue() -EAGAIN retry is explicit to block devices and regular files, so don't ever attempt to do that on other types of files. - If we hit -EAGAIN on a nonblock marked file, don't arm poll handler for it. It should just complete with -EAGAIN. Cc: stable@vger.kernel.org Reported-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02io_uring: set table->files[i] to NULL when io_sqe_file_register failedJiufei Xue1-0/+1
While io_sqe_file_register() failed in __io_sqe_files_update(), table->files[i] still point to the original file which may freed soon, and that will trigger use-after-free problems. Cc: stable@vger.kernel.org Fixes: f3bd9dae3708 ("io_uring: fix memleak in __io_sqe_files_update()") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01io_uring: fix removing the wrong file in __io_sqe_files_update()Jiufei Xue1-1/+1
Index here is already the position of the file in fixed_file_table, we should not use io_file_from_index() again to get it. Otherwise, the wrong file which still in use may be released unexpectedly. Cc: stable@vger.kernel.org # v5.6 Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-27io_uring: don't bounce block based -EAGAIN retry off task_workJens Axboe1-20/+6
These events happen inline from submission, so there's no need to bounce them through the original task. Just set them up for retry and issue retry directly instead of going over task_work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-27io_uring: fix IOPOLL -EAGAIN retriesJens Axboe1-5/+9
This normally isn't hit, as polling is mostly done on NVMe with deep queue depths. But if we do run into request starvation, we need to ensure that retries are properly serialized. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-26io_uring: clear req->result on IOPOLL re-issueJens Axboe1-0/+1
Make sure we clear req->result, which was set to -EAGAIN for retry purposes, when moving it to the reissue list. Otherwise we can end up retrying a request more than once, which leads to weird results in the io-wq handling (and other spots). Cc: stable@vger.kernel.org Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-26io_uring: make offset == -1 consistent with preadv2/pwritev2Jens Axboe1-4/+9
The man page for io_uring generally claims were consistent with what preadv2 and pwritev2 accept, but turns out there's a slight discrepancy in how offset == -1 is handled for pipes/streams. preadv doesn't allow it, but preadv2 does. This currently causes io_uring to return -EINVAL if that is attempted, but we should allow that as documented. This change makes us consistent with preadv2/pwritev2 for just passing in a NULL ppos for streams if the offset is -1. Cc: stable@vger.kernel.org # v5.7+ Reported-by: Benedikt Ames <wisp3rwind@posteo.eu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-25io_uring: ensure read requests go through -ERESTART* transformationJens Axboe1-1/+2
We need to call kiocb_done() for any ret < 0 to ensure that we always get the proper -ERESTARTSYS (and friends) transformation done. At some point this should be tied into general error handling, so we can get rid of the various (mostly network) related commands that check and perform this substitution. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-25io_uring: don't use poll handler if file can't be nonblocking read/writtenJens Axboe1-1/+9
There's no point in using the poll handler if we can't do a nonblocking IO attempt of the operation, since we'll need to go async anyway. In fact this is actively harmful, as reading from eg pipes won't return 0 to indicate EOF. Cc: stable@vger.kernel.org # v5.7+ Reported-by: Benedikt Ames <wisp3rwind@posteo.eu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-25io_uring: fix imbalanced sqo_mm accountingJens Axboe1-7/+3
We do the initial accounting of locked_vm and pinned_vm before we have setup ctx->sqo_mm, which means we can end up having not accounted the memory at setup time, but still decrement it when we exit. This causes an imbalance in the accounting. Setup ctx->sqo_mm earlier in io_uring_create(), before we do the first accounting of mm->{locked,pinned}_vm. This also unifies the state grabbing for the ctx, and eliminates a failure case in io_sq_offload_start(). Fixes: f74441e6311a ("io_uring: account locked memory before potential error case") Reported-by: Robert M. Muncrief <rmuncrief@humanavance.com> Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Robert M. Muncrief <rmuncrief@humanavance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-25io_uring: revert consumed iov_iter bytes on errorJens Axboe1-0/+4
Some consumers of the iov_iter will return an error, but still have bytes consumed in the iterator. This is an issue for -EAGAIN, since we rely on a sane iov_iter state across retries. Fix this by ensuring that we revert consumed bytes, if any, if the file operations have consumed any bytes from iterator. This is similar to what generic_file_read_iter() does, and is always safe as we have the previous bytes count handy already. Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Dmitry Shulyak <yashulyak@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-23io-wq: fix hang after cancelling pending hashed workPavel Begunkov1-2/+19
Don't forget to update wqe->hash_tail after cancelling a pending work item, if it was hashed. Cc: stable@vger.kernel.org # 5.7+ Reported-by: Dmitry Shulyak <yashulyak@gmail.com> Fixes: 86f3cd1b589a1 ("io-wq: handle hashed writes in chains") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-23io_uring: don't recurse on tsk->sighand->siglock with signalfdJens Axboe1-6/+16
If an application is doing reads on signalfd, and we arm the poll handler because there's no data available, then the wakeup can recurse on the tasks sighand->siglock as the signal delivery from task_work_add() will use TWA_SIGNAL and that attempts to lock it again. We can detect the signalfd case pretty easily by comparing the poll->head wait_queue_head_t with the target task signalfd wait queue. Just use normal task wakeup for this case. Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: kill extra iovec=NULL in import_iovec()Pavel Begunkov1-3/+1
If io_import_iovec() returns an error, return iovec is undefined and must not be used, so don't set it to NULL when failing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: comment on kfree(iovec) checksPavel Begunkov1-0/+2
kfree() handles NULL pointers well, but io_{read,write}() checks it because of performance reasons. Leave a comment there for those who are tempted to patch it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: fix racy req->flags modificationPavel Begunkov1-44/+17
Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and io_cqring_overflow_flush() are racy, because they might be called asynchronously. REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can be guaranteed that requests _currently_ marked inflight can't be overflown, the problem will be solved with removing the flag altogether. That's how the patch works, it removes inflight status of a request in io_cqring_fill_event() whenever it should be thrown into CQ-overflow list. That's Ok to do, because no opcode specific handling can be done after io_cqring_fill_event(), the same assumption as with "struct io_completion" patches. And it already have a good place for such cleanups, which is io_clean_op(). A nice side effect of this is removing this inflight check from the hot path. note on synchronisation: now __io_cqring_fill_event() may be taking two spinlocks simultaneously, completion_lock and inflight_lock. It's fine, because we never do that in reverse order, and CQ-overflow of inflight requests shouldn't happen often. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-19io_uring: use system_unbound_wq for ring exit workJens Axboe1-1/+7
We currently use system_wq, which is unbounded in terms of number of workers. This means that if we're exiting tons of rings at the same time, then we'll briefly spawn tons of event kworkers just for a very short blocking time as the rings exit. Use system_unbound_wq instead, which has a sane cap on the concurrency level. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-18io_uring: cleanup io_import_iovec() of pre-mapped requestJens Axboe1-14/+14
io_rw_prep_async() goes through a dance of clearing req->io, calling the iovec import, then re-setting req->io. Provide an internal helper that does the right thing without needing state tweaked to get there. This enables further cleanups in io_read, io_write, and io_resubmit_prep(), but that's left for another time. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16io_uring: get rid of kiocb_wait_page_queue_init()Jens Axboe1-30/+11
The 5.9 merge moved this function io_uring, which means that we don't need to retain the generic nature of it. Clean up this part by removing redundant checks, and just inlining the small remainder in io_rw_should_retry(). No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16io_uring: find and cancel head link async work on files exitJens Axboe1-4/+29
Commit f254ac04c874 ("io_uring: enable lookup of links holding inflight files") only handled 2 out of the three head link cases we have, we also need to lookup and cancel work that is blocked in io-wq if that work has a link that's holding a reference to the files structure. Put the "cancel head links that hold this request pending" logic into io_attempt_cancel(), which will to through the motions of finding and canceling head links that hold the current inflight files stable request pending. Cc: stable@vger.kernel.org Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-blockLinus Torvalds1-153/+386
Pull io_uring fixes from Jens Axboe: "A few differerent things in here. Seems like syzbot got some more io_uring bits wired up, and we got a handful of reports and the associated fixes are in here. General fixes too, and a lot of them marked for stable. Lastly, a bit of fallout from the async buffered reads, where we now more easily trigger short reads. Some applications don't really like that, so the io_read() code now handles short reads internally, and got a cleanup along the way so that it's now easier to read (and documented). We're now passing tests that failed before" * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block: io_uring: short circuit -EAGAIN for blocking read attempt io_uring: sanitize double poll handling io_uring: internally retry short reads io_uring: retain iov_iter state over io_read/io_write calls task_work: only grab task signal lock when needed io_uring: enable lookup of links holding inflight files io_uring: fail poll arm on queue proc failure io_uring: hold 'ctx' reference around task_work queue + execute fs: RWF_NOWAIT should imply IOCB_NOIO io_uring: defer file table grabbing request cleanup for locked requests io_uring: add missing REQ_F_COMP_LOCKED for nested requests io_uring: fix recursive completion locking on oveflow flush io_uring: use TWA_SIGNAL for task_work uncondtionally io_uring: account locked memory before potential error case io_uring: set ctx sq/cq entry count earlier io_uring: Fix NULL pointer dereference in loop_rw_iter() io_uring: add comments on how the async buffered read retry works io_uring: io_async_buf_func() need not test page bit
2020-08-15io_uring: short circuit -EAGAIN for blocking read attemptJens Axboe1-1/+4
One case was missed in the short IO retry handling, and that's hitting -EAGAIN on a blocking attempt read (eg from io-wq context). This is a problem on sockets that are marked as non-blocking when created, they don't carry any REQ_F_NOWAIT information to help us terminate them instead of perpetually retrying. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15io_uring: sanitize double poll handlingJens Axboe1-9/+25
There's a bit of confusion on the matching pairs of poll vs double poll, depending on if the request is a pure poll (IORING_OP_POLL_ADD) or poll driven retry. Add io_poll_get_double() that returns the double poll waitqueue, if any, and io_poll_get_single() that returns the original poll waitqueue. With that, remove the argument to io_poll_remove_double(). Finally ensure that wait->private is cleared once the double poll handler has run, so that remove knows it's already been seen. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linuxLinus Torvalds3-62/+17
Pull 9p updates from Dominique Martinet: - some code cleanup - a couple of static analysis fixes - setattr: try to pick a fid associated with the file rather than the dentry, which might sometimes matter * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux: 9p: Remove unneeded cast from memory allocation 9p: remove unused code in 9p net/9p: Fix sparse endian warning in trans_fd.c 9p: Fix memory leak in v9fs_mount 9p: retrieve fid from file when file instance exist.
2020-08-15Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds3-2/+4
Pull cifs fixes from Steve French: "Three small cifs/smb3 fixes, one for stable fixing mkdir path with the 'idsfromsid' mount option" * tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: SMB3: Fix mkdir when idsfromsid configured on mount cifs: Convert to use the fallthrough macro cifs: Fix an error pointer dereference in cifs_mount()
2020-08-15Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds23-121/+2274
Pull NFS client updates from Trond Myklebust: "Stable fixes: - pNFS: Don't return layout segments that are being used for I/O - pNFS: Don't move layout segments off the active list when being used for I/O Features: - NFS: Add support for user xattrs through the NFSv4.2 protocol - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC - NFSv4.0 allow nconnect for v4.0 Bugfixes and cleanups: - nfs: ensure correct writeback errors are returned on close() - nfs: nfs_file_write() should check for writeback errors - nfs: Fix getxattr kernel panic and memory overflow - NFS: Fix the pNFS/flexfiles mirrored read failover code - SUNRPC: dont update timeout value on connection reset - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS - sunrpc: destroy rpc_inode_cachep after unregister_filesystem" * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits) NFS: Fix flexfiles read failover fs: nfs: delete repeated words in comments rpc_pipefs: convert comma to semicolon nfs: Fix getxattr kernel panic and memory overflow NFS: Don't return layout segments that are in use NFS: Don't move layouts to plh_return_segs list while in use NFS: Add layout segment info to pnfs read/write/commit tracepoints NFS: Add tracepoints for layouterror and layoutstats. NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close() SUNRPC dont update timeout value on connection reset nfs: nfs_file_write() should check for writeback errors nfs: ensure correct writeback errors are returned on close() NFSv4.2: xattr cache: get rid of cache discard work queue NFS: remove redundant initialization of variable result NFSv4.0 allow nconnect for v4.0 freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS sunrpc: destroy rpc_inode_cachep after unregister_filesystem NFSv4.2: add client side xattr caching. NFSv4.2: hook in the user extended attribute handlers NFSv4.2: add the extended attribute proc functions. ...
2020-08-14fs: autofs: delete repeated words in commentsRandy Dunlap1-2/+2
Drop duplicated words {the, at} in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ian Kent <raven@themaw.net> Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14exec: restore EACCES of S_ISDIR execve()Kees Cook1-1/+3
Patch series "Fix S_ISDIR execve() errno". Fix an errno change for execve() of directories, noticed by Marc Zyngier. Along with the fix, include a regression test to avoid seeing this return in the future. This patch (of 2): The return code for attempting to execute a directory has always been EACCES. Adjust the S_ISDIR exec test to reflect the old errno instead of the general EISDIR for other kinds of "open" attempts on directories. Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Greg Kroah-Hartman <gregkh@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@google.com> Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-13SMB3: Fix mkdir when idsfromsid configured on mountSteve French1-0/+1
mkdir uses a compounded create operation which was not setting the security descriptor on create of a directory. Fix so mkdir now sets the mode and owner info properly when idsfromsid and modefromsid are configured on the mount. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.8 Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-08-13io_uring: internally retry short readsJens Axboe1-39/+70
We've had a few application cases of not handling short reads properly, and it is understandable as short reads aren't really expected if the application isn't doing non-blocking IO. Now that we retain the iov_iter over retries, we can implement internal retry pretty trivially. This ensures that we don't return a short read, even for buffered reads on page cache conflicts. Cleanup the deep nesting and hard to read nature of io_read() as well, it's much more straight forward now to read and understand. Added a few comments explaining the logic as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13io_uring: retain iov_iter state over io_read/io_write callsJens Axboe1-66/+70
Instead of maintaining (and setting/remembering) iov_iter size and segment counts, just put the iov_iter in the async part of the IO structure. This is mostly a preparation patch for doing appropriate internal retries for short reads, but it also cleans up the state handling nicely and simplifies it quite a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13Merge tag 'for-5.9-tag' of ↵Linus Torvalds9-25/+65
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs updates from David Sterba: "One minor update, the rest are fixes that have arrived a bit late for the first batch. There are also some recent fixes for bugs that were discovered during the merge window and pop up during testing. User visible change: - show correct subvolume path in /proc/mounts for bind mounts Fixes: - fix compression messages when remounting with different level or compression algorithm - tree-log: fix some memory leaks on error handling paths - restore I_VERSION on remount - fix return values and error code mixups - fix umount crash with quotas enabled when removing sysfs files - fix trim range on a shrunk device" * tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary btrfs: fix return value mixup in btrfs_get_extent btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() btrfs: check correct variable after allocation in btrfs_backref_iter_alloc btrfs: make sure SB_I_VERSION doesn't get unset by remount btrfs: fix memory leaks after failure to lookup checksums during inode logging btrfs: don't show full path of bind mounts in subvol= btrfs: fix messages after changing compression level by remount btrfs: only search for left_info if there is no right_info in try_merge_free_space btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
2020-08-13Merge tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds15-19/+21
Pull xfs fixes from Darrick Wong: "Two small fixes that have come in during the past week: - Fix duplicated words in comments - Fix an ubsan complaint about null pointer arithmetic" * tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init xfs: delete duplicated words + other fixes
2020-08-13Merge tag 'exfat-for-5.9-rc1' of ↵Linus Torvalds10-116/+121
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - don't clear MediaFailure and VolumeDirty bit in volume flags if these were already set before mounting - write multiple dirty buffers at once in sync mode - remove unneeded EXFAT_SB_DIRTY bit set * tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: retain 'VolumeFlags' properly exfat: optimize exfat_zeroed_cluster() exfat: add error check when updating dir-entries exfat: write multiple sectors at once exfat: remove EXFAT_SB_DIRTY flag
2020-08-12io_uring: enable lookup of links holding inflight filesJens Axboe1-10/+87
When a process exits, we cancel whatever requests it has pending that are referencing the file table. However, if a link is holding a reference, then we cannot find it by simply looking at the inflight list. Enable checking of the poll and timeout list to find the link, and cancel it appropriately. Cc: stable@vger.kernel.org Reported-by: Josef <josef.grieb@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-12Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds14-103/+482
Pull ceph updates from Ilya Dryomov: "Xiubo has completed his work on filesystem client metrics, they are sent to all available MDSes once per second now. Other than that, we have a lot of fixes and cleanups all around the filesystem, including a tweak to cut down on MDS request resends in multi-MDS setups from Yanhu and fixups for SELinux symlink labeling and MClientSession message decoding from Jeff" * tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits) ceph: handle zero-length feature mask in session messages ceph: use frag's MDS in either mode ceph: move sb->wb_pagevec_pool to be a global mempool ceph: set sec_context xattr on symlink creation ceph: remove redundant initialization of variable mds ceph: fix use-after-free for fsc->mdsc ceph: remove unused variables in ceph_mdsmap_decode() ceph: delete repeated words in fs/ceph/ ceph: send client provided metric flags in client metadata ceph: periodically send perf metrics to MDSes ceph: check the sesion state and return false in case it is closed libceph: replace HTTP links with HTTPS ones ceph: remove unnecessary cast in kfree() libceph: just have osd_req_op_init() return a pointer ceph: do not access the kiocb after aio requests ceph: clean up and optimize ceph_check_delayed_caps() ceph: fix potential mdsc use-after-free crash ceph: switch to WARN_ON_ONCE in encode_supported_features() ceph: add global total_caps to count the mdsc's total caps number ceph: add check_session_state() helper and make it global ...
2020-08-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds37-315/+386
Merge more updates from Andrew Morton: - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction, mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util, memory-hotplug, cleanups, uaccess, migration, gup, pagemap), - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops, checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump, exec, kdump, rapidio, panic, kcov, kgdb, ipc). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits) mm/gup: remove task_struct pointer for all gup code mm: clean up the last pieces of page fault accountings mm/xtensa: use general page fault accounting mm/x86: use general page fault accounting mm/sparc64: use general page fault accounting mm/sparc32: use general page fault accounting mm/sh: use general page fault accounting mm/s390: use general page fault accounting mm/riscv: use general page fault accounting mm/powerpc: use general page fault accounting mm/parisc: use general page fault accounting mm/openrisc: use general page fault accounting mm/nios2: use general page fault accounting mm/nds32: use general page fault accounting mm/mips: use general page fault accounting mm/microblaze: use general page fault accounting mm/m68k: use general page fault accounting mm/ia64: use general page fault accounting mm/hexagon: use general page fault accounting mm/csky: use general page fault accounting ...
2020-08-12mm/gup: remove task_struct pointer for all gup codePeter Xu1-1/+1
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>