summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2017-05-01Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds29-248/+447
Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
2017-04-28blk-mq: unify hctx delay_work and run_workJens Axboe1-2/+1
The only difference between ->run_work and ->delay_work, is that the latter is used to defer running a queue. This is done by marking the queue stopped, and scheduling ->delay_work to run sometime in the future. While the queue is stopped, direct runs or runs through ->run_work will not run the queue. If we combine the handlers, then we need to handle two things: 1) If a delayed/stopped run is scheduled, then we should not run the queue before that has been completed. 2) If a queue is delayed/stopped, the handler needs to restart the queue. Normally a run of a queue with the stopped bit set would be a no-op. Case 1 is handled by modifying a currently pending queue run to the deadline set by the caller of blk_mq_delay_queue(). Subsequent attempts to queue a queue run will find the work item already pending, and direct runs will see a stopped queue as before. Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN, that tells the work handler that it should clear a stopped queue and run the handler. Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-28block: add kblock_mod_delayed_work_on()Jens Axboe1-0/+1
This modifies (or adds, if not currently pending) an existing delayed work item. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-28blk-mq: unify hctx delayed_run_work and run_workJens Axboe1-2/+1
They serve the exact same purpose. Get rid of the non-delayed work variant, and just run it without delay for the normal case. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-27Merge branch 'for-linus' of ↵Linus Torvalds1-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - fix orangefs handling of faults on write() - I'd missed that one back when orangefs was going through review. - readdir counterpart of "9p: cope with bogus responses from server in p9_client_{read,write}" - server might be lying or broken, and we'd better not overrun the kmalloc'ed buffer we are copying the results into. - NFS O_DIRECT read/write can leave iov_iter advanced by too much; that's what had been causing iov_iter_pipe() warnings davej had been seeing. - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should go in before 4.11. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: change the type of struct statx_timestamp.tv_nsec to unsigned fix nfs O_DIRECT advancing iov_iter too much p9_client_readdir() fix orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
2017-04-26uapi: change the type of struct statx_timestamp.tv_nsec to unsignedDmitry V. Levin1-6/+2
The comment asserting that the value of struct statx_timestamp.tv_nsec must be negative when statx_timestamp.tv_sec is negative, is wrong, as could be seen from the following example: #define _FILE_OFFSET_BITS 64 #include <assert.h> #include <fcntl.h> #include <stdio.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> #include <linux/stat.h> int main(void) { static const struct timespec ts[2] = { { .tv_nsec = UTIME_OMIT }, { .tv_sec = -2, .tv_nsec = 42 } }; assert(utimensat(AT_FDCWD, ".", ts, 0) == 0); struct stat st; assert(stat(".", &st) == 0); printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n", (long long) st.st_mtim.tv_sec, (unsigned long) st.st_mtim.tv_nsec); struct statx stx; assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0); printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n", (long long) stx.stx_mtime.tv_sec, (unsigned long) stx.stx_mtime.tv_nsec); return 0; } It expectedly prints: st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42 stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42 The more generic comment asserting that the value of struct statx_timestamp.tv_nsec might be negative is confusing to say the least. It contradicts both the struct stat.st_[acm]time_nsec tradition and struct timespec.tv_nsec requirements in utimensat syscall. If statx syscall ever returns a stx_[acm]time containing a negative tv_nsec that cannot be passed unmodified to utimensat syscall, it will cause an immense confusion. Fix this source of confusion by changing the type of struct statx_timestamp.tv_nsec from __s32 to __u32. Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org cc: mtk.manpages@gmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26blk-mq: Add blk_mq_ops.show_rq()Bart Van Assche1-0/+8
This new callback function will be used in the next patch to show more information about SCSI requests. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26net: phy: fix auto-negotiation stall due to unavailable interruptAlexander Kochetkov1-0/+1
The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet cable was plugged before the Ethernet interface was brought up. The patch trigger PHY state machine to update link state if PHY was requested to do auto-negotiation and auto-negotiation complete flag already set. During power-up cycle the PHY do auto-negotiation, generate interrupt and set auto-negotiation complete flag. Interrupt is handled by PHY state machine but doesn't update link state because PHY is in PHY_READY state. After some time MAC bring up, start and request PHY to do auto-negotiation. If there are no new settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation. PHY continue to stay in auto-negotiation complete state and doesn't fire interrupt. At the same time PHY state machine expect that PHY started auto-negotiation and is waiting for interrupt from PHY and it won't get it. Fixes: 321beec5047a ("net: phy: Use interrupts when available in NOLINK state") Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Cc: stable <stable@vger.kernel.org> # v4.9+ Tested-by: Roger Quadros <rogerq@ti.com> Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: 1) Don't race in IPSEC dumps, from Yuejie Shi. 2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu. 3) Fix out of bounds access in ipv6 segment routing code, from David Lebrun. 4) Don't write into the header of cloned SKBs in smsc95xx driver, from James Hughes. 5) Several other drivers have this bug too, fix them. From Eric Dumazet. 6) Fix access to uninitialized data in TC action cookie code, from Wolfgang Bumiller. 7) Fix double free in IPV6 segment routing, again from David Lebrun. 8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern. 9) Fix use after free in qrtr code, from Dan Carpenter. 10) Don't double-destroy devices in ip6mr code, from Nikolay Aleksandrov. 11) Don't pass out-of-range TX queue indices into drivers, from Tushar Dave. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) netpoll: Check for skb->queue_mapping ip6mr: fix notification device destruction bpf, doc: update bpf maintainers entry net: qrtr: potential use after free in qrtr_sendmsg() bpf: Fix values type used in test_maps net: ipv6: RTF_PCPU should not be settable from userspace gso: Validate assumption of frag_list segementation kaweth: use skb_cow_head() to deal with cloned skbs ch9200: use skb_cow_head() to deal with cloned skbs lan78xx: use skb_cow_head() to deal with cloned skbs sr9700: use skb_cow_head() to deal with cloned skbs cx82310_eth: use skb_cow_head() to deal with cloned skbs smsc75xx: use skb_cow_head() to deal with cloned skbs ipv6: sr: fix double free of skb after handling invalid SRH MAINTAINERS: Add "B:" field for networking. net sched actions: allocate act cookie early qed: Fix issue in populating the PFC config paramters. qed: Fix possible system hang in the dcbnl-getdcbx() path. qed: Fix sending an invalid PFC error mask to MFW. qed: Fix possible error in populating max_tc field. ...
2017-04-21block: get rid of blk_integrity_revalidate()Ilya Dryomov1-2/+0
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") introduced blk_integrity_revalidate(), which seems to assume ownership of the stable pages flag and unilaterally clears it if no blk_integrity profile is registered: if (bi->profile) disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; else disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; It's called from revalidate_disk() and rescan_partitions(), making it impossible to enable stable pages for drivers that support partitions and don't use blk_integrity: while the call in revalidate_disk() can be trivially worked around (see zram, which doesn't support partitions and hence gets away with zram_revalidate_disk()), rescan_partitions() can be triggered from userspace at any time. This breaks rbd, where the ceph messenger is responsible for generating/verifying CRCs. Since blk_integrity_{un,}register() "must" be used for (un)registering the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES setting there. This way drivers that call blk_integrity_register() and use integrity infrastructure won't interfere with drivers that don't but still want stable pages. Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.4+, needs backporting Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21net: ipv6: RTF_PCPU should not be settable from userspaceDavid Ahern1-1/+1
Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge tag 'mmc-v4.11-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - kmalloc sdio scratch buffer to make it DMA-friendly MMC host: - dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used - sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50 cards" * tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card mmc: dw_mmc: Don't allow Runtime PM for SDIO cards mmc: sdio: fix alignment issue in struct sdio_func
2017-04-21nvmet_fc: Rework target side abort handlingJames Smart1-9/+16
target transport: ---------------------- There are cases when there is a need to abort in-progress target operations (writedata) so that controller termination or errors can clean up. That can't happen currently as the abort is another target op type, so it can't be used till the running one finishes (and it may not). Solve by removing the abort op type and creating a separate downcall from the transport to the lldd to request an io to be aborted. The transport will abort ios on queue teardown or io errors. In general the transport tries to call the lldd abort only when the io state is idle. Meaning: ops that transmit data (readdata or rsp) will always finish their transmit (or the lldd will see a state on the link or initiator port that fails the transmit) and the done call for the operation will occur. The transport will wait for the op done upcall before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; Similarly, ios that are not waiting for data or transmitting data must be in the nvmet layer being processed. The transport will wait for the nvmet layer completion before calling the abort function, and as the io is idle, the io can be cleaned up immediately after the abort call; As for ops that are waiting for data (writedata), they may be outstanding indefinitely if the lldd doesn't see a condition where the initiatior port or link is bad. In those cases, the transport will call the abort function and wait for the lldd's op done upcall for the operation, where it will then clean up the io. Additionally, if a lldd receives an ABTS and matches it to an outstanding request in the transport, A new new transport upcall was created to abort the outstanding request in the transport. The transport expects any outstanding op call (readdata or writedata) will completed by the lldd and the operation upcall made. The transport doesn't act on the reported abort (e.g. clean up the io) until an op done upcall occurs, a new op is attempted, or the nvmet layer completes the io processing. fcloop: ---------------------- Updated to support the new target apis. On fcp io aborts from the initiator, the loopback context is updated to NULL out the half that has completed. The initiator side is immediately called after the abort request with an io completion (abort status). On fcp io aborts from the target, the io is stopped and the initiator side sees it as an aborted io. Target side ops, perhaps in progress while the initiator side is done, continue but noop the data movement as there's no structure on the initiator side to reference. patch also contains: ---------------------- Revised lpfc to support the new abort api commonized rsp buffer syncing and nulling of private data based on calling paths. errors in op done calls don't take action on the fod. They're bad operations which implies the fod may be bad. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21nvmet_fc: add req_release to lldd apiJames Smart1-22/+35
With the advent of the opdone calls changing context, the lldd can no longer assume that once the op->done call returns for RSP operations that the request struct is no longer being accessed. As such, revise the lldd api for a req_release callback that the transport will call when the job is complete. This will also be used with abort cases. Fixed text in api header for change in io complete semantics. Revised lpfc to support the new req_release api. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21nvmet_fc: add target feature flags for upcall isr contextsJames Smart1-0/+16
Two new feature flags were added to control whether upcalls to the transport result in context switches or stay in the calling context. NVMET_FCTGTFEAT_CMD_IN_ISR: By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the cmd handler is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. NVMET_FCTGTFEAT_OPDONE_IN_ISR By default, if the flag is not set, the transport assumes the lldd is in a non-isr context and in the cpu context it should be for the io queue. As such, the fcp operation done callback is called directly in the calling context. If the flag is set, indicating the upcall is an isr context, the transport mandates a transition to a workqueue. The workqueue assigned to the queue is used for the context. Updated lpfc for flags Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-21nvme: improve performance for virtual NVMe devicesHelen Koike1-0/+13
This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefy here. The main idea for the patch is provide the device two memory location locations: 1) to store the doorbell values so they can be lookup without the doorbell MMIO write 2) to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea on the performance boost with the vendor extension: Running fio [1], a stock NVMe driver I get about 200K read IOPs with my vendor patch I get about 1000K read IOPs. This was running with a null device i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: Rob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: Helen Koike <helen.koike@collabora.co.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2017-04-20blk-mq: Fix poll_stat for new size-based bucketing.Stephen Bates1-1/+4
Fixes an issue where the size of the poll_stat array in request_queue does not match the size expected by the new size based bucketing for IO completion polling. Fixes: 720b8ccc4500 ("blk-mq: Add a polling specific stats function") Signed-off-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20block: remove the errors field from struct requestChristoph Hellwig2-12/+7
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20blktrace: remove the unused block_rq_abort tracepointChristoph Hellwig1-34/+10
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20block: add a error_count field to struct requestChristoph Hellwig1-0/+1
This is for the legacy floppy and ataflop drivers that currently abuse ->errors for this purpose. It's stashed away in a union to not grow the struct size, the other fields are either used by modern drivers for different purposes or the I/O scheduler before queing the I/O to drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20blk-mq: remove the error argument to blk_mq_complete_requestChristoph Hellwig1-1/+1
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20scsi: introduce a result field in struct scsi_requestChristoph Hellwig2-1/+2
This passes on the scsi_cmnd result field to users of passthrough requests. Currently we abuse req->errors for this purpose, but that field will go away in its current form. Note that the old IDE code abuses the errors field in very creative ways and stores all kinds of different values in it. I didn't dare to touch this magic, so the abuses are brought forward 1:1. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20block: remove the blk_execute_rq return valueChristoph Hellwig1-1/+1
The function only returns -EIO if rq->errors is non-zero, which is not very useful and lets a large number of callers ignore the return value. Just let the callers figure out their error themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20bdi: Drop 'parent' argument from bdi_register[_va]()Jan Kara1-5/+4
Drop 'parent' argument of bdi_register() and bdi_register_va(). It is always NULL. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20block: Remove unused functionsJan Kara1-5/+0
Now that all backing_dev_info structure are allocated separately, we can drop some unused functions. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20fs: Remove SB_I_DYNBDI flagJan Kara1-3/+0
Now that all bdi structures filesystems use are properly refcounted, we can remove the SB_I_DYNBDI flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20nfs: Convert to separately allocated bdiJan Kara1-1/+0
Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Anna Schumaker <anna.schumaker@netapp.com> CC: linux-nfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20coda: Convert to separately allocated bdiJan Kara1-1/+0
Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Jan Harkes <jaharkes@cs.cmu.edu> CC: coda@cs.cmu.edu CC: codalist@coda.cs.cmu.edu Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20mtd: Convert to dynamically allocated bdi infrastructureJan Kara1-5/+0
MTD already allocates backing_dev_info dynamically. Convert it to use generic infrastructure for this including proper refcounting. We drop mtd->backing_dev_info as its only use was to pass mtd_bdi pointer from one file into another and if we wanted to keep that in a clean way, we'd have to make mtd hold and drop bdi reference as needed which seems pointless for passing one global pointer... CC: David Woodhouse <dwmw2@infradead.org> CC: Brian Norris <computersforpeace@gmail.com> CC: linux-mtd@lists.infradead.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20fs: Provide infrastructure for dynamic BDIs in filesystemsJan Kara2-1/+7
Provide helper functions for setting up dynamically allocated backing_dev_info structures for filesystems and cleaning them up on superblock destruction. CC: linux-mtd@lists.infradead.org CC: linux-nfs@vger.kernel.org CC: Petr Vandrovec <petr@vandrovec.name> CC: linux-nilfs@vger.kernel.org CC: cluster-devel@redhat.com CC: osd-dev@open-osd.org CC: codalist@coda.cs.cmu.edu CC: linux-afs@lists.infradead.org CC: ecryptfs@vger.kernel.org CC: linux-cifs@vger.kernel.org CC: ceph-devel@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: v9fs-developer@lists.sourceforge.net CC: lustre-devel@lists.lustre.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20bdi: Provide bdi_register_va() and bdi_alloc()Jan Kara1-0/+6
Add function that registers bdi and takes va_list instead of variable number of arguments. Add bdi_alloc() as simple wrapper for NUMA-unaware users allocating BDI. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19block: Inline blk_rq_set_prio()Bart Van Assche1-14/+0
Since only a single caller remains, inline blk_rq_set_prio(). Initialize req->ioprio even if no I/O priority has been set in the bio nor in the I/O context. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com> Tested-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19block: Export blk_init_request_from_bio()Bart Van Assche1-0/+1
Export this function such that it becomes available to block drivers. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matias Bjørling <m@bjorling.me> Cc: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19block: remove blk_end_request_curChristoph Hellwig1-1/+0
This function is not used anywhere in the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19block: remove blk_end_request_err and __blk_end_request_errChristoph Hellwig1-2/+0
Both functions are entirely unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19block, bfq: add full hierarchical scheduling and cgroups supportArianna Avanzini1-1/+1
Add complete support for full hierarchical scheduling, with a cgroups interface. Full hierarchical scheduling is implemented through the 'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues associated with processes, and groups are represented in general by entities. Given the bfq_queues associated with the processes belonging to a given group, the entities representing these queues are sons of the entity representing the group. At higher levels, if a group, say G, contains other groups, then the entity representing G is the parent entity of the entities representing the groups in G. Hierarchical scheduling is performed as follows: if the timestamps of a leaf entity (i.e., of a bfq_queue) change, and such a change lets the entity become the next-to-serve entity for its parent entity, then the timestamps of the parent entity are recomputed as a function of the budget of its new next-to-serve leaf entity. If the parent entity belongs, in its turn, to a group, and its new timestamps let it become the next-to-serve for its parent entity, then the timestamps of the latter parent entity are recomputed as well, and so on. When a new bfq_queue must be set in service, the reverse path is followed: the next-to-serve highest-level entity is chosen, then its next-to-serve child entity, and so on, until the next-to-serve leaf entity is reached, and the bfq_queue that this entity represents is set in service. Writeback is accounted for on a per-group basis, i.e., for each group, the async I/O requests of the processes of the group are enqueued in a distinct bfq_queue, and the entity associated with this queue is a child of the entity associated with the group. Weights can be assigned explicitly to groups and processes through the cgroups interface, differently from what happens, for single processes, if the cgroups interface is not used (as explained in the description of the previous patch). In particular, since each node has a full scheduler, each group can be assigned its own weight. Signed-off-by: Fabio Checconi <fchecconi@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-18mmc: sdio: fix alignment issue in struct sdio_funcHeiner Kallweit1-1/+1
Certain 64-bit systems (e.g. Amlogic Meson GX) require buffers to be used for DMA to be 8-byte-aligned. struct sdio_func has an embedded small DMA buffer not meeting this requirement. When testing switching to descriptor chain mode in meson-gx driver SDIO is broken therefore. Fix this by allocating the small DMA buffer separately as kmalloc ensures that the returned memory area is properly aligned for every basic data type. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Helmut Klein <hgkr.klein@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-04-18Merge branch 'linus' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following problems: - regression in new XTS/LRW code when used with async crypto - long-standing bug in ahash API when used with certain algos - bogus memory dereference in async algif_aead with certain algos" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_aead - Fix bogus request dereference in completion function crypto: ahash - Fix EINPROGRESS notification callback crypto: lrw - Fix use-after-free on EINPROGRESS crypto: xts - Fix use-after-free on EINPROGRESS
2017-04-17nbd: add a flag to destroy an nbd device on disconnectJosef Bacik1-1/+5
For ease of management it would be nice for users to specify that the device node for a nbd device is destroyed once it is disconnected and there are no more users. Add a client flag and enable this operation to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a status netlink commandJosef Bacik1-0/+25
Allow users to query the status of existing nbd devices. Right now this only returns whether or not the device is connected, but could be extended in the future to include more information. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: handle dead connectionsJosef Bacik1-0/+1
Sometimes we like to upgrade our server without making all of our clients freak out and reconnect. This patch provides a way to specify a dead connection timeout to allow us to pause all requests and wait for new connections to be opened. With this in place I can take down the nbd server for less than the dead connection timeout time and bring it back up and everything resumes gracefully. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: multicast dead link notificationsJosef Bacik1-2/+4
Provide a mechanism to notify userspace that there's been a link problem on a NBD device. This will allow userspace to re-establish a connection and provide the new socket to the device without disrupting the device. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a reconfigure netlink commandJosef Bacik1-0/+1
We want to be able to reconnect dead connections to existing block devices, so add a reconfigure netlink command. We will also allow users to change their timeout on the fly, but everything else will require a disconnect and reconnect. You won't be able to add more connections either, simply replace dead connections with new more lively connections. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a basic netlink interfaceJosef Bacik1-0/+69
The existing ioctl interface for configuring NBD devices is a bit cumbersome and hard to extend. The other problem is we leave a userspace app sitting in it's syscall until the device disconnects, which is less than ideal. This patch introduces a netlink interface for adding and disconnecting nbd devices. This has the benefits of being easily extendable without breaking older userspace applications, and allows us to configure a nbd device without leaving a userspace app sitting waiting for the device to disconnect. With this interface we also gain the ability to configure more devices than are preallocated at insmod time. We also have gained the ability to not specify a particular device and be provided one for us so that userspace doesn't need to find a free device to configure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16Merge tag 'armsoc-fixes' of ↵Linus Torvalds1-8/+14
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Again, a batch that's been sitting a couple of weeks, mostly because I anticipated a bit more material but it didn't show up -- which is good. These are all your garden variety fixes for ARM platforms. The most visible issue fixed here is probably the SMP reset issue on OMAP, the rest are minor stuff" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: allwinner: a64: add pmu0 regs for USB PHY ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer reset: add exported __reset_control_get, return NULL if optional ARM: orion5x: only call into phylib when available ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend ARM: dts: ti: fix PCI bus dtc warnings ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY ARM: dts: OMAP3: Fix MFG ID EEPROM ARM: sun8i: a33: add operating-points-v2 property to all nodes ARM: sun8i: a33: remove highest OPP to fix CPU crashes
2017-04-16Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+28
Pull block fixes from Jens Axboe: "Four small fixes. Three of them fix the same error in NVMe, in loop, fc, and rdma respectively. The last fix from Ming fixes a regression in this series, where our bvec gap logic was wrong and causes an oops on NVMe for certain conditions" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix bio_will_gap() for first bvec with offset nvme-fc: Fix sqsize wrong assignment based on ctrl MQES capability nvme-rdma: Fix sqsize wrong assignment based on ctrl MQES capability nvme-loop: Fix sqsize wrong assignment based on ctrl MQES capability
2017-04-16lightnvm: allow to init targets on factory modeJavier González2-1/+6
Target initialization has two responsibilities: creating the target partition and instantiating the target. This patch enables to create a factory partition (e.g., do not trigger recovery on the given target). This is useful for target development and for being able to restore the device state at any moment in time without requiring a full-device erase. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16lightnvm: rename scrambler controller hintJavier González1-1/+1
According to the OCSSD 1.2 specification, the 0x200 hint enables the media scrambler for the read/write opcode, providing that the controller has been correctly configured by the firmware. Rename the macro to represent this meaning. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16lightnvm: submit erases using the I/O pathJavier González1-5/+3
Until now erases have been submitted as synchronous commands through a dedicated erase function. In order to enable targets implementing asynchronous erases, refactor the erase path so that it uses the normal async I/O submission functions. If a target requires sync I/O, it can implement it internally. Also, adapt rrpc to use the new erase path. Signed-off-by: Javier González <javier@cnexlabs.com> Fixed spelling error. Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a small update to xpad driver to recognize yet another gamepad, and another change making sure userio.h is exported" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - add support for Razer Wildcat gamepad uapi: add missing install of userio.h