summaryrefslogtreecommitdiff
path: root/drivers/nvme/target
AgeCommit message (Collapse)AuthorFilesLines
2021-01-18nvmet: set right status on error in id-ns handlerChaitanya Kulkarni1-2/+6
The function nvmet_execute_identify_ns() doesn't set the status if call to nvmet_find_namespace() fails. In that case we set the status of the request to the value return by the nvmet_copy_sgl(). Set the status to NVME_SC_INVALID_NS and adjust the code such that request will have the right status on nvmet_find_namespace() failure. Without this patch :- NVME Identify Namespace 3: nsze : 0 ncap : 0 nuse : 0 nsfeat : 0 nlbaf : 0 flbas : 0 mc : 0 dpc : 0 dps : 0 nmic : 0 rescap : 0 fpi : 0 dlfeat : 0 nawun : 0 nawupf : 0 nacwu : 0 nabsn : 0 nabo : 0 nabspf : 0 noiob : 0 nvmcap : 0 mssrl : 0 mcl : 0 msrc : 0 nsattr : 0 nvmsetid: 0 anagrpid: 0 endgid : 0 nguid : 00000000000000000000000000000000 eui64 : 0000000000000000 lbaf 0 : ms:0 lbads:0 rp:0 (in use) With this patch-series :- feb3b88b501e (HEAD -> nvme-5.11) nvmet: remove extra variable in identify ns 6302aa67210a nvmet: remove extra variable in id-desclist ed57951da453 nvmet: remove extra variable in smart log nsid be384b8c24dc nvmet: set right status on error in id-ns handler NVMe status: INVALID_NS: The namespace or the format of that namespace is invalid(0xb) Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-14nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANYIsrael Rukshin1-8/+8
When setting port traddr to INADDR_ANY, the listening cm_id->device is NULL. The associate IB device is known only when a connect request event arrives, so checking T10-PI device capability should be done at this stage. Fixes: b09160c3996c ("nvmet-rdma: add metadata/T10-PI support") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-06nvmet-rdma: Fix list_del corruption on queue establishment failureIsrael Rukshin1-0/+10
When a queue is in NVMET_RDMA_Q_CONNECTING state, it may has some requests at rsp_wait_list. In case a disconnect occurs at this state, no one will empty this list and will return the requests to free_rsps list. Normally nvmet_rdma_queue_established() free those requests after moving the queue to NVMET_RDMA_Q_LIVE state, but in this case __nvmet_rdma_queue_disconnect() is called before. The crash happens at nvmet_rdma_free_rsps() when calling list_del(&rsp->free_list), because the request exists only at the wait list. To fix the issue, simply clear rsp_wait_list when destroying the queue. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-06nvme-fcloop: Fix sscanf type and list_first_entry_or_null warningsJames Smart1-3/+4
Kernel robot had the following warnings: >> fcloop.c:1506:6: warning: %x in format string (no. 1) requires >> 'unsigned int *' but the argument type is 'signed int *'. >> [invalidScanfArgType_int] >> if (sscanf(buf, "%x:%d:%d", &opcode, &starting, &amount) != 3) >> ^ Resolve by changing opcode from and int to an unsigned int and >> fcloop.c:1632:32: warning: Uninitialized variable: lport [uninitvar] >> ret = __wait_localport_unreg(lport); >> ^ >> fcloop.c:1615:28: warning: Uninitialized variable: nport [uninitvar] >> ret = __remoteport_unreg(nport, rport); >> ^ These aren't actual issues as the values are assigned prior to use. It appears the tool doesn't understand list_first_entry_or_null(). Regardless, quiet the tool by initializing the pointers to NULL at declaration. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+2
Pull rdma updates from Jason Gunthorpe: "A smaller set of patches, nothing stands out as being particularly major this cycle. The biggest item would be the new HIP09 HW support from HNS, otherwise it was pretty quiet for new work here: - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw, cxgb4, mlx4 and mlx5 - Bug fixes and polishing for the new rts ULP - Cleanup of uverbs checking for allowed driver operations - Use sysfs_emit all over the place - Lots of bug fixes and clarity improvements for hns - hip09 support for hns - NDR and 50/100Gb signaling rates - Remove dma_virt_ops and go back to using the IB DMA wrappers - mlx5 optimizations for contiguous DMA regions" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits) RDMA/cma: Don't overwrite sgid_attr after device is released RDMA/mlx5: Fix MR cache memory leak RDMA/rxe: Use acquire/release for memory ordering RDMA/hns: Simplify AEQE process for different types of queue RDMA/hns: Fix inaccurate prints RDMA/hns: Fix incorrect symbol types RDMA/hns: Clear redundant variable initialization RDMA/hns: Fix coding style issues RDMA/hns: Remove unnecessary access right set during INIT2INIT RDMA/hns: WARN_ON if get a reserved sl from users RDMA/hns: Avoid filling sl in high 3 bits of vlan_id RDMA/hns: Do shift on traffic class when using RoCEv2 RDMA/hns: Normalization the judgment of some features RDMA/hns: Limit the length of data copied between kernel and userspace RDMA/mlx4: Remove bogus dev_base_lock usage RDMA/uverbs: Fix incorrect variable type RDMA/core: Do not indicate device ready when device enablement fails RDMA/core: Clean up cq pool mechanism RDMA/core: Update kernel documentation for ib_create_named_qp() MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address ...
2020-12-16Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds8-35/+147
Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...
2020-12-07nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock classMing Lei1-0/+10
Set nvme-loop's lock class via blk_mq_hctx_set_fq_lock_class for avoiding lockdep possible recursive locking, then we can remove the dynamically allocated lock class for each flush queue, finally we can avoid horrible SCSI probe delay. This way may not address situation in which one nvme-loop is backed on another nvme-loop. However, in reality, people seldom uses this way for test. Even though someone played in this way, it is just one recursive locking false positive, no real deadlock issue. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: switch partition lookup to use struct block_deviceChristoph Hellwig1-10/+10
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01nvmet: fix a spelling mistake "incuding" -> "including" in KconfigColin Ian King1-1/+1
There is a spelling mistake in the Kconfig help text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: make sure discovery change log event is protectedMax Gurtovoy1-0/+1
Generation counter is protected by nvmet_config_sem. Make sure the callers that call functions that might change it, are calling it properly. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: remove unused ctrl->cqsAmit2-14/+2
remove unused cqs from nvmet_ctrl struct this will reduce the allocated memory. Signed-off-by: Amit <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: use inline bio for passthru fast pathChaitanya Kulkarni2-3/+10
In nvmet_passthru_execute_cmd() which is a high frequency function it uses bio_alloc() which leads to memory allocation from the fs pool for each I/O. For NVMeoF nvmet_req we already have inline_bvec allocated as a part of request allocation that can be used with preallocated bio when we already know the size of request before bio allocation with bio_alloc(), which we already do. Introduce a bio member for the nvmet_req passthru anon union. In the fast path, check if we can get away with inline bvec and bio from nvmet_req with bio_init() call before actually allocating from the bio_alloc(). This will be useful to avoid any new memory allocation under high memory pressure situation and get rid of any extra work of allocation (bio_alloc()) vs initialization (bio_init()) when transfer len is < NVMET_MAX_INLINE_DATA_LEN that user can configure at compile time. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: use blk_rq_bio_prep instead of blk_rq_append_bioChaitanya Kulkarni1-6/+2
The function blk_rq_append_bio() is a genereric API written for all types driver (having bounce buffers) and different context (where request is already having a bio i.e. rq->bio != NULL). It does mainly three things: calculating the segments, bounce queue and if req->bio == NULL call blk_rq_bio_prep() or handle low level merge() case. The NVMe PCIe and fabrics transports currently does not use queue bounce mechanism. In order to find this for each request processing in the passthru blk_rq_append_bio() does extra work in the fast path for each request. When I ran I/Os with different block sizes on the passthru controller I found that we can reuse the req->sg_cnt instead of iterating over the bvecs to find out nr_segs in blk_rq_append_bio(). This calculation in blk_rq_append_bio() is a duplication of work given that we have the value in req->sg_cnt. (correct me here if I'm wrong). With NVMe passthru request based driver we allocate fresh request each time, so every call to blk_rq_append_bio() rq->bio will be NULL i.e. we don't really need the second condition in the blk_rq_append_bio() and the resulting error condition in the caller of blk_rq_append_bio(). So for NVMeOF passthru driver recalculating the segments, bounce check and ll_back_merge code is not needed such that we can get away with the minimal version of the blk_rq_append_bio() which removes the error check in the fast path along with extra variable in nvmet_passthru_map_sg(). This patch updates the nvmet_passthru_map_sg() such that it does only appending the bio to the request in the context of the NVMeOF Passthru driver. Following are perf numbers :- With current implementation (blk_rq_append_bio()) :- ---------------------------------------------------- + 5.80% 0.02% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 5.44% 0.01% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 4.88% 0.00% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 5.44% 0.01% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 4.86% 0.01% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 5.17% 0.00% kworker/0:2-eve [nvmet] [k] nvmet_passthru_execute_cmd With this patch using blk_rq_bio_prep() :- ---------------------------------------------------- + 3.14% 0.02% kworker/0:2-eve [nvmet] [k] nvmet_passthru_execute_cmd + 3.26% 0.01% kworker/0:2-eve [nvmet] [k] nvmet_passthru_execute_cmd + 5.37% 0.01% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 5.18% 0.02% kworker/0:2-eve [nvmet] [k] nvmet_passthru_execute_cmd + 4.84% 0.02% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd + 4.87% 0.01% kworker/0:2-mm_ [nvmet] [k] nvmet_passthru_execute_cmd Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: remove op_flags for passthru commandsChaitanya Kulkarni1-7/+1
For passthru commands setting op_flags has no meaning. Remove the code that sets the op flags in nvmet_passthru_map_sg(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvme: split nvme_alloc_request()Chaitanya Kulkarni1-1/+1
Right now nvme_alloc_request() allocates a request from block layer based on the value of the qid. When qid set to NVME_QID_ANY it used blk_mq_alloc_request() else blk_mq_alloc_request_hctx(). The function nvme_alloc_request() is called from different context, The only place where it uses non NVME_QID_ANY value is for fabrics connect commands :- nvme_submit_sync_cmd() NVME_QID_ANY nvme_features() NVME_QID_ANY nvme_sec_submit() NVME_QID_ANY nvmf_reg_read32() NVME_QID_ANY nvmf_reg_read64() NVME_QID_ANY nvmf_reg_write32() NVME_QID_ANY nvmf_connect_admin_queue() NVME_QID_ANY nvme_submit_user_cmd() NVME_QID_ANY nvme_alloc_request() nvme_keep_alive() NVME_QID_ANY nvme_alloc_request() nvme_timeout() NVME_QID_ANY nvme_alloc_request() nvme_delete_queue() NVME_QID_ANY nvme_alloc_request() nvmet_passthru_execute_cmd() NVME_QID_ANY nvme_alloc_request() nvmf_connect_io_queue() QID __nvme_submit_sync_cmd() nvme_alloc_request() With passthru nvme_alloc_request() now falls into the I/O fast path such that blk_mq_alloc_request_hctx() is never gets called and that adds additional branch check in fast path. Split the nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid(). Replace each call of the nvme_alloc_request() with NVME_QID_ANY param with a call to newly added nvme_alloc_request() without NVME_QID_ANY. Replace a call to nvme_alloc_request() with QID param with a call to newly added nvme_alloc_request() and nvme_alloc_request_qid() based on the qid value set in the __nvme_submit_sync_cmd(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: add passthru io timeout value attrChaitanya Kulkarni3-1/+23
NVMeOF controller in the passsthru mode is capable of handling wide set of I/O commands including vender specific passhtru io comands. The vendor specific I/O commands are used to read the large drive logs and can take longer than default NVMe commands, i.e. for passthru requests the timeout value may differ from the passthru controller's default timeout values (nvme-core:io_timeout). Add a configfs attribute so that user can set the io timeout values. In case if this configfs value is not set nvme_alloc_request() will set the NVME_IO_TIMEOUT value when request queuedata is NULL. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvmet: add passthru admin timeout value attrChaitanya Kulkarni3-0/+27
NVMeOF controller in the passsthru mode is capable of handling wide set of admin commands including vender specific passhtru admin comands. The vendor specific admin commands are used to read the large drive logs and can take longer than default NVMe commands, i.e. for passthru requests the timeout value may differ from the passthru controller's default timeout values (nvme-core:admin_timeout). Add a configfs attribute so that user can set the admin timeout values. In case if this configfs value is not set nvme_alloc_request() will set the ADMIN_TIMEOUT value when request queuedata is NULL. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvme: use consistent macro name for timeoutChaitanya Kulkarni1-1/+1
This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to make consistent with NVME_IO_TIMEOUT. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01nvme-fcloop: add sysfs attribute to inject command dropJames Smart1-2/+79
Add sysfs attribute to specify parameters for dropping a command. The attribute takes a string of: <opcode>:<starting a what instance>:<number of times> Opcode is formatted as lower 8 bits are opcode. If a fabrics opcode, a bit above bits 7:0 will be set. Once set, each sqe is looked at. If the opcode matches the running instance count is updated. If the instance count is in the range of where to drop (based on starting and # of times), then drop the command by not passing it to the target layer. Signed-off-by: James Smart <james.smart@broadcom.com>
2020-11-17RDMA/core: remove use of dma_virt_opsChristoph Hellwig1-1/+2
Use the ib_dma_* helpers to skip the DMA translation instead. This removes the last user if dma_virt_ops and keeps the weird layering violation inside the RDMA core instead of burderning the DMA mapping subsystems with it. This also means the software RDMA drivers now don't have to mess with DMA parameters that are not relevant to them at all, and that in the future we can use PCI P2P transfers even for software RDMA, as there is no first fake layer of DMA mapping that the P2P DMA support. Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-27nvmet: fix a NULL pointer dereference when tracing the flush commandChaitanya Kulkarni2-16/+9
When target side trace in turned on and flush command is issued from the host it results in the following Oops. [ 856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068 [ 856.790686] #PF: supervisor read access in kernel mode [ 856.791262] #PF: error_code(0x0000) - not-present page [ 856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0 [ 856.792527] Oops: 0000 [#1] SMP NOPTI [ 856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G OE 5.9.0nvme-5.9+ #71 [ 856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214 [ 856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet] [ 856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0 [ 856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246 [ 856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000 [ 856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034 [ 856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe [ 856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034 [ 856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440 [ 856.802667] FS: 00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000 [ 856.803635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0 [ 856.805283] Call Trace: [ 856.805613] nvmet_req_init+0x27c/0x480 [nvmet] [ 856.806200] nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop] [ 856.806862] blk_mq_dispatch_rq_list+0x123/0x7b0 [ 856.807459] ? kvm_sched_clock_read+0x14/0x30 [ 856.808025] __blk_mq_sched_dispatch_requests+0xc7/0x170 [ 856.808708] blk_mq_sched_dispatch_requests+0x30/0x60 [ 856.809372] __blk_mq_run_hw_queue+0x70/0x100 [ 856.809935] __blk_mq_delay_run_hw_queue+0x156/0x170 [ 856.810574] blk_mq_run_hw_queue+0x86/0xe0 [ 856.811104] blk_mq_sched_insert_request+0xef/0x160 [ 856.811733] blk_execute_rq+0x69/0xc0 [ 856.812212] ? blk_mq_rq_ctx_init+0xd0/0x230 [ 856.812784] nvme_execute_passthru_rq+0x57/0x130 [nvme_core] [ 856.813461] nvme_submit_user_cmd+0xeb/0x300 [nvme_core] [ 856.814099] nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core] [ 856.814752] blkdev_ioctl+0x1dc/0x2c0 [ 856.815197] block_ioctl+0x3f/0x50 [ 856.815606] __x64_sys_ioctl+0x84/0xc0 [ 856.816074] do_syscall_64+0x33/0x40 [ 856.816533] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 856.817168] RIP: 0033:0x7f6a222ed107 [ 856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8 [ 856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107 [ 856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003 [ 856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005 [ 856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3 [ 856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900 [ 856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel Move the nvmet_req_init() tracepoint after we parse the command in nvmet_req_init() so that we can get rid of the duplicate nvmet_find_namespace() call. Rename __assign_disk_name() -> __assign_req_name(). Now that we call tracepoint after parsing the command simplify the newly added __assign_req_name() which fixes this bug. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: don't use BLK_MQ_REQ_NOWAIT for passthruChaitanya Kulkarni1-1/+1
By default, we set the passthru request allocation flag such that it returns the error in the following code path and we fail the I/O when BLK_MQ_REQ_NOWAIT is used for request allocation :- nvme_alloc_request()  blk_mq_alloc_request()   blk_mq_queue_enter()    if (flag & BLK_MQ_REQ_NOWAIT)         return -EBUSY; <-- return if busy. On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where the controller is perfectly healthy and not in a degraded state. Block layer request allocation does allow us to wait instead of immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not used. This has shown to fix the I/O error problem reported under heavy random write workload. Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation which resolves this issue. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: cleanup nvmet_passthru_map_sg()Logan Gunthorpe1-3/+4
Clean up some confusing elements of nvmet_passthru_map_sg() by returning early if the request is greater than the maximum bio size. This allows us to drop the sg_cnt variable. This should not result in any functional change but makes the code clearer and more understandable. The original code allocated a truncated bio then would return EINVAL when bio_add_pc_page() filled that bio. The new code just returns EINVAL early if this would happen. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Suggested-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: limit passthru MTDS by BIO_MAX_PAGESLogan Gunthorpe1-1/+8
nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain so the effective maximum transfer should also be limitted by BIO_MAX_PAGES (presently this works out to 1MB). For PCI passthru devices the max_sectors would typically be more limitting than BIO_MAX_PAGES, but this may not be true for all passthru devices. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: fix uninitialized work for zero katozhenwei pi1-1/+2
When connecting a controller with a zero kato value using the following command line nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0 the warning below can be reproduced: WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90 with trace: mod_delayed_work_on+0x59/0x90 nvmet_update_cc+0xee/0x100 [nvmet] nvmet_execute_prop_set+0x72/0x80 [nvmet] nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp] nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp] ... This is caused by queuing up an uninitialized work. Althrough the keep-alive timer is disabled during allocating the controller (fixed in 0d3b6a8d213a), ka_work still has a chance to run (called by nvmet_start_ctrl). Fixes: 0d3b6a8d213a ("nvmet: Disable keep-alive timer when kato is cleared to 0h") Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07nvme-loop: don't put ctrl on nvme_init_ctrl errorChaitanya Kulkarni1-2/+2
The function nvme_init_ctrl() gets the ctrl reference & when it fails it does put the ctrl reference in the error unwind code. When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl() returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label which calls nvme_put_ctrl(), that will lead to douple ctrl put in error unwind path. Update nvme_loop_create_ctrl() such that this patch removes the "out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in error unwind path and jump to newly added label when nvme_init_ctrl() call retuns an error. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet-fc: fix missing check for no hostport structJames Smart1-1/+1
A hostport port pointer is allowed to be NULL as it is not allocated if the lldd does not support the new interfaces for NVME LS request support. The hostport free routine validates the handle but forgot to validate the hostport pointer. Validate the hostport pointer before using it to validate the handle. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: add passthru ZNS supportChaitanya Kulkarni1-0/+16
In the default passthru implementation NVMeOF target passthru ctrl is not capable of handling Zoned Namespaces (ZNS). Update the nvmet_parse_pasthru_admin_cmd() to allow NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS. With this addition NVMeOF Passthru allows Zoned Namespaces. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: handle keep-alive timer when kato is modified by a set features cmdAmit Engel3-2/+6
A user may modify the kato by a set features cmd. To properly deal with races or a kato value of 0 (no keep alive enabled) change nvmet_set_feat_kato to first disable the timer, then set the value and then re-enable the timer. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet-tcp: have queue io_work context run on sock incoming cpuMark Wunderlich1-11/+10
No real good need to spread queues artificially. Usually the target will serve multiple hosts, and it's better to run on the socket incoming cpu for better affinitization rather than spread queues on all online cpus. We rely on RSS to spread the work around sufficiently. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: lift the file open code from nvme_ctrl_get_by_pathChaitanya Kulkarni1-11/+16
Lift opening the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-09-22Merge tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+2
Pull block fixes from Jens Axboe: "A few NVMe fixes, and a dasd write zero fix" * tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block: nvmet: get transport reference for passthru ctrl nvme-core: get/put ctrl and transport module in nvme_dev_open/release() nvme-tcp: fix kconfig dependency warning when !CRYPTO nvme-pci: disable the write zeros command for Intel 600P/P3100 s390/dasd: Fix zero write for FBA devices
2020-09-17nvmet: get transport reference for passthru ctrlChristoph Hellwig1-0/+2
Grab a reference to the transport driver to ensure it can't be unloaded while a passthrough controller is active. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Reported-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-09-04Merge tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-blockLinus Torvalds2-3/+11
Pull block fixes from Jens Axboe: "A bit larger than usual this week, mostly due to the NVMe fixes arriving late for -rc3 and hence didn't make last weeks pull request. - NVMe: - instance leak and io boundary fixes from Keith - fc locking fix from Christophe - various tcp/rdma reset during traffic fixes from Sagi - pci use-after-free fix from Tong - tcp target null deref fix from Ziye - Locking fix for partition removal (Christoph) - Ensure bdi->io_pages is always set (me) - Fixup for hd struct reference (Ming) - Fix for zero length bvecs (Ming) - Two small blk-iocost fixes (Tejun)" * tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block: block: allow for_each_bvec to support zero len bvec blk-stat: make q->stats->lock irqsafe blk-iocost: ioc_pd_free() shouldn't assume irq disabled block: fix locking in bdev_del_partition block: release disk reference in hd_struct_free_work block: ensure bdi->io_pages is always initialized nvme-pci: cancel nvme device request before disabling nvme: only use power of two io boundaries nvme: fix controller instance leak nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' nvme: Fix NULL dereference for pci nvme controllers nvme-rdma: fix reset hang if controller died in the middle of a reset nvme-rdma: fix timeout handler nvme-rdma: serialize controller teardown sequences nvme-tcp: fix reset hang if controller died in the middle of a reset nvme-tcp: fix timeout handler nvme-tcp: serialize controller teardown sequences nvme: have nvme_wait_freeze_timeout return if it timed out nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
2020-08-28nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'Christophe JAILLET1-2/+2
The way 'spin_lock()' and 'spin_lock_irqsave()' are used is not consistent in this function. Use 'spin_lock_irqsave()' also here, as there is no guarantee that interruptions are disabled at that point, according to surrounding code. Fixes: a97ec51b37ef ("nvmet_fc: Rework target side abort handling") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2020-08-28nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pduZiye Yang1-1/+9
When handling commands without in-capsule data, we assign the ttag assuming we already have the queue commands array allocated (based on the queue size information in the connect data payload). However if the connect itself did not send the connect data in-capsule we have yet to allocate the queue commands,and we will assign a bogus ttag and suffer a NULL dereference when we receive the corresponding h2cdata pdu. Fix this by checking if we already allocated commands before dereferencing it when handling h2cdata, if we didn't, its for sure a connect and we should use the preallocated connect command. Signed-off-by: Ziye Yang <ziye.yang@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2020-08-24Merge tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-blockLinus Torvalds4-9/+25
Pull block fixes from Jens Axboe: - NVMe pull request from Sagi: - nvme completion rework from Christoph and Chao that mostly came from a bit of divergence of how we classify errors related to pathing/retry etc. - nvmet passthru fixes from Chaitanya - minor nvmet fixes from Amit and I - mpath round-robin path selection fix from Martin - ignore noiob for zoned devices from Keith - minor nvme-fc fix from Tianjia" - BFQ cgroup leak fix (Dmitry) - block layer MAINTAINERS addition (Geert) - fix null_blk FUA checking (Hou) - get_max_io_size() size fix (Keith) - fix block page_is_mergeable() for compound pages (Matthew) - discard granularity fixes (Ming) - IO scheduler ordering fix (Ming) - misc fixes * tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits) null_blk: fix passing of REQ_FUA flag in null_handle_rq nvmet: Disable keep-alive timer when kato is cleared to 0h nvme: redirect commands on dying queue nvme: just check the status code type in nvme_is_path_error nvme: refactor command completion nvme: rename and document nvme_end_request nvme: skip noiob for zoned devices nvme-pci: fix PRP pool size nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth nvme: Use spin_lock_irq() when taking the ctrl->lock nvmet: call blk_mq_free_request() directly nvmet: fix oops in pt cmd execution nvmet: add ns tear down label for pt-cmd handling nvme: multipath: round-robin: eliminate "fallback" variable nvme: multipath: round-robin: fix single non-optimized path case nvme-fc: Fix wrong return value in __nvme_fc_init_request() nvmet-passthru: Reject commands with non-sgl flags set nvmet: fix a memory leak blkcg: fix memleak for iolatency MAINTAINERS: Add missing header files to BLOCK LAYER section ...
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva4-5/+4
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-21nvmet: Disable keep-alive timer when kato is cleared to 0hAmit Engel1-0/+6
Based on nvme spec, when keep alive timeout is set to zero the keep-alive timer should be disabled. Signed-off-by: Amit Engel <amit.engel@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvme: rename and document nvme_end_requestChristoph Hellwig1-1/+1
nvme_end_request is a bit misnamed, as it wraps around the blk_mq_complete_* API. It's semantics also are non-trivial, so give it a more descriptive name and add a comment explaining the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvmet: call blk_mq_free_request() directlyChaitanya Kulkarni1-3/+3
Instead of calling blk_put_request() which calls blk_mq_free_request(), call blk_mq_free_request() directly for NVMeOF passthru. This is to mainly avoid an extra function call in the completion path nvmet_passthru_req_done(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvmet: fix oops in pt cmd executionChaitanya Kulkarni1-3/+3
In the existing NVMeOF Passthru core command handling on failure of nvme_alloc_request() it errors out with rq value set to NULL. In the error handling path it calls blk_put_request() without checking if rq is set to NULL or not which produces following Oops:- [ 1457.346861] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1457.347838] #PF: supervisor read access in kernel mode [ 1457.348464] #PF: error_code(0x0000) - not-present page [ 1457.349085] PGD 0 P4D 0 [ 1457.349402] Oops: 0000 [#1] SMP NOPTI [ 1457.349851] CPU: 18 PID: 10782 Comm: kworker/18:2 Tainted: G OE 5.8.0-rc4nvme-5.9+ #35 [ 1457.350951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214 [ 1457.352347] Workqueue: events nvme_loop_execute_work [nvme_loop] [ 1457.353062] RIP: 0010:blk_mq_free_request+0xe/0x110 [ 1457.353651] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8 [ 1457.355975] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282 [ 1457.356636] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 [ 1457.357526] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000 [ 1457.358416] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000 [ 1457.359317] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000 [ 1457.360424] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8 [ 1457.361322] FS: 0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000 [ 1457.362337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1457.363058] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0 [ 1457.363973] Call Trace: [ 1457.364296] nvmet_passthru_execute_cmd+0x150/0x2c0 [nvmet] [ 1457.364990] process_one_work+0x24e/0x5a0 [ 1457.365493] ? __schedule+0x353/0x840 [ 1457.365957] worker_thread+0x3c/0x380 [ 1457.366426] ? process_one_work+0x5a0/0x5a0 [ 1457.366948] kthread+0x135/0x150 [ 1457.367362] ? kthread_create_on_node+0x60/0x60 [ 1457.367934] ret_from_fork+0x22/0x30 [ 1457.368388] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corer [ 1457.368414] ata_piix crc32c_intel virtio_pci libata virtio_ring serio_raw t10_pi virtio floppy dm_] [ 1457.380849] CR2: 0000000000000000 [ 1457.381288] ---[ end trace c6cab61bfd1f68fd ]--- [ 1457.381861] RIP: 0010:blk_mq_free_request+0xe/0x110 [ 1457.382469] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8 [ 1457.384749] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282 [ 1457.385393] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 [ 1457.386264] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000 [ 1457.387142] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000 [ 1457.388029] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000 [ 1457.388914] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8 [ 1457.389798] FS: 0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000 [ 1457.390796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1457.391508] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0 [ 1457.392525] Kernel panic - not syncing: Fatal exception [ 1457.394138] Kernel Offset: disabled [ 1457.394677] ---[ end Kernel panic - not syncing: Fatal exception ]--- We fix this Oops by adding a new goto label out_put_req and reordering the blk_put_request call to avoid calling blk_put_request() with rq value is set to NULL. Here we also update the rest of the code accordingly. Fixes: 06b7164dfdc0 ("nvmet: add passthru code to process commands") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvmet: add ns tear down label for pt-cmd handlingChaitanya Kulkarni1-4/+5
In the current implementation before submitting the passthru cmd we may come across error e.g. getting ns from passthru controller, allocating a request from passthru controller, etc. For all the failure cases it only uses single goto label fail_out. In the target code, we follow the pattern to have a separate label for each error out the case when setting up multiple things before the actual action. This patch follows the same pattern and renames generic fail_out label to out_put_ns and updates the error out cases in the nvmet_passthru_execute_cmd() where it is needed. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvmet-passthru: Reject commands with non-sgl flags setLogan Gunthorpe1-0/+8
Any command with a non-SGL flag set (like fuse flags) should be rejected. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-21nvmet: fix a memory leakSagi Grimberg1-0/+1
We forgot to free new_model_number Fixes: 013b7ebe5a0d ("nvmet: make ctrl model configurable") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-29nvme-loop: remove extra variable in create ctrlChaitanya Kulkarni1-8/+6
We can call the nvme_change_ctrl_state() directly and have WARN_ON_ONCE(1) call instead of having to use an extra variable which matches the name of the function. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-loop: set ctrl state connecting after initChaitanya Kulkarni1-0/+3
When creating a loop controller (ctrl) in nvme_loop_create_ctrl() -> nvme_init_ctrl() we set the ctrl state to NVME_CTRL_NEW. Prior to [1] NVME_CTRL_NEW state was allowed in nvmf_check_ready() for fabrics command type connect. Now, this fails in the following code path for fabrics connect command when creating admin queue :- nvme_loop_create_ctrl() nvme_loo_configure_admin_queue() nvmf_connect_admin_queue() __nvme_submit_sync_cmd() blk_execute_rq() nvme_loop_queue_rq() nvmf_check_ready() # echo "transport=loop,nqn=fs" > /dev/nvme-fabrics [ 6047.741327] nvmet: adding nsid 1 to subsystem fs [ 6048.756430] nvme nvme1: Connect command failed, error wo/DNR bit: 880 We need to set the ctrl state to NVME_CTRL_CONNECTING after :- nvme_loop_create_ctrl() nvme_init_ctrl() so that the above mentioned check for nvmf_check_ready() will return true. This patch sets the ctrl state to connecting after we init the ctrl in nvme_loop_create_ctrl() nvme_init_ctrl() . [1] commit aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Fixes: aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: introduce the passthru Kconfig optionChaitanya Kulkarni1-0/+12
This patch updates KConfig file for the NVMeOF target where we add new option so that user can selectively enable/disable passthru code. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [logang@deltatee.com: fixed some of the wording in the help message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: introduce the passthru configfs interfaceLogan Gunthorpe2-0/+100
When CONFIG_NVME_TARGET_PASSTHRU as 'passthru' directory will be added to each subsystem. The directory is similar to a namespace and has two attributes: device_path and enable. The user must set the path to the nvme controller's char device and write '1' to enable the subsystem to use passthru. Any given subsystem is prevented from enabling both a regular namespace and the passthru device. If one is enabled, enabling the other will produce an error. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: Add passthru enable/disable helpersLogan Gunthorpe4-1/+111
This patch adds helper functions which are used in the NVMeOF configfs when the user is configuring the passthru subsystem. Here we ensure that only one subsys is assigned to each nvme_ctrl by using an xarray on the cntlid. The subsystem's version number is overridden by the passed through controller's version. However, if that version is less than 1.2.1, then we bump the advertised version to that and print a warning in dmesg. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>