block: init flush rq ref count to 1

We discovered a problem in newer kernels where disconnecting an NBD device while a flush request was pending would result in a hang. This is because the blk-mq timeout handler does

	if (!refcount_inc_not_zero(&rq->ref))
		return true;

to determine if it's OK to run the timeout handler for the request. Flush_rq's don't have a ref count set, so we'd skip running the timeout handler for this request and it would just sit there in limbo forever.

Fix this by always setting the refcount of any request going through blk_rq_init() to 1. I tested this with an nbd-server that dropped flush requests to verify that it hung, and then tested with this patch to verify I got the timeout as expected and the error handling kicked in. Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
author: Josef Bacik <josef@toxicpanda.com> 2019-03-07 21:37:18 +0000
committer: Jens Axboe <axboe@kernel.dk> 2019-07-10 09:00:57 -0600
commit: b554db147feea39617b533ab6bca247c91c6198a (patch)
tree: 2041b2db3c965062f63d0315af213f385d3ccbd0 /block/blk-core.c
parent: cdc5ffc4100549654e19e6f068cf1fc0871a85c2 (diff)
1 file changed, 1 insertion, 0 deletions
diff --git a/block/blk-core.c b/block/blk-core.c
index 5d1fc8e17dd1..edd009213f5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -117,6 +117,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 	rq->internal_tag = -1;
 	rq->start_time_ns = ktime_get_ns();
 	rq->part = NULL;
+	refcount_set(&rq->ref, 1);
 }
 EXPORT_SYMBOL(blk_rq_init);
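
The effect of the one-line change can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct fake_rq`, `fake_refcount_inc_not_zero()` and `fake_timeout_would_run()` are hypothetical stand-ins built on C11 atomics, and the sketch assumes that a refcount that was never set reads as zero. It only models the check quoted in the commit message, not the rest of the timeout path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; only the refcount is modeled. */
struct fake_rq {
	atomic_uint ref;
};

/*
 * Userspace approximation of refcount_inc_not_zero(): take a reference
 * only if the count is currently non-zero.
 */
static bool fake_refcount_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Returns true if the modeled timeout check would go on to handle this
 * request, false if the request would be skipped.
 */
static bool fake_timeout_would_run(struct fake_rq *rq)
{
	if (!fake_refcount_inc_not_zero(&rq->ref))
		return false;		/* skipped: refcount was zero */

	atomic_fetch_sub(&rq->ref, 1);	/* drop the reference taken above */
	return true;
}

/* Models a request whose refcount was never set (assumed to be zero). */
static bool fake_check_unset_refcount(void)
{
	struct fake_rq rq;

	atomic_init(&rq.ref, 0);
	return fake_timeout_would_run(&rq);
}

/* Models a request after refcount_set(&rq->ref, 1) in blk_rq_init(). */
static bool fake_check_initialized_refcount(void)
{
	struct fake_rq rq;

	atomic_init(&rq.ref, 1);
	return fake_timeout_would_run(&rq);
}
```

Under these assumptions, `fake_check_unset_refcount()` returns false (the request is skipped, matching the hang described above), while `fake_check_initialized_refcount()` returns true and leaves the count back at 1.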