block: loop: support DIO & AIO

There are at least 3 advantages to using direct I/O and AIO to
read/write the loop device's backing file:

1) the double cache can be avoided, so memory usage decreases a lot

2) unlike user space direct I/O, there is no cost of pinning pages

3) context switches can be avoided while still obtaining good throughput
	- in buffered file read, top random I/O throughput is often
	obtained only if the requests are submitted concurrently from
	lots of tasks; but for sequential I/O, most of the time the
	requests hit the page cache, so concurrent submissions often
	introduce unnecessary context switches and can't improve
	throughput much. There was a discussion[1] about using
	non-blocking I/O to improve this for applications.
	- with direct I/O and AIO, concurrent submissions can be
	avoided, and random read throughput isn't affected in the
	meantime

xfstests (-g auto, ext4) basically passes when running with direct
I/O (aio); the one exception is generic/232, but it fails with loop
buffered I/O (4.2-rc6-next-20150814) too.

The fio test results for performance follow:

4 jobs fio test inside ext4 file system over loop block

1) How to run
	- KVM: 4 VCPUs, 2G RAM
	- linux kernel: 4.2-rc6-next-20150814 (base) with the patchset
	- the loop block is over one image on SSD.
	- linux psync, 4 jobs, size 1500M, ext4 over loop block
	- test result: IOPS from fio output

2) Throughput (IOPS) becomes a bit better with direct I/O (aio)
	-------------------------------------------------------------
	test cases      |randread   |read   |randwrite  |write  |
	-------------------------------------------------------------
	base            |8015       |113811 |67442      |106978
	-------------------------------------------------------------
	base+loop aio   |8136       |125040 |67811      |111376
	-------------------------------------------------------------

	- this is probably because more page cache is available to
	applications, or because one extra page copy is avoided in
	the direct I/O case

3) context switch
	- context switches decreased by ~50% with loop direct I/O (aio)
	compared with loop buffered I/O (4.2-rc6-next-20150814)

4) memory usage from /proc/meminfo
	-------------------------------------------------------------
	                           | Buffers       | Cached
	-------------------------------------------------------------
	base                       | > 760MB       | ~950MB
	-------------------------------------------------------------
	base+loop direct I/O(aio)  | < 5MB         | ~1.6GB
	-------------------------------------------------------------

	- so there is much more page cache available to applications
	with direct I/O

[1] https://lwn.net/Articles/612483/

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
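
To put a number on "a bit better", the relative IOPS change can be computed from the figures in the throughput table. The helper below is only an illustrative sketch; the name pct_change is hypothetical and is not part of the patch.

```c
#include <assert.h>

/*
 * Relative change of `patched` over `base`, in percent.
 * Hypothetical helper for reading the fio table; not kernel code.
 */
static double pct_change(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}
```

Applied to the table rows (base vs. base+loop aio), this gives roughly +1.5% for randread (8015 vs. 8136), +9.9% for read (113811 vs. 125040), +0.5% for randwrite (67442 vs. 67811) and +4.1% for write (106978 vs. 111376).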
author: Ming Lei <ming.lei@canonical.com> 2015-08-17 10:31:51 +0800
committer: Jens Axboe <axboe@fb.com> 2015-09-23 11:01:16 -0600
commit: bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6 (patch)
tree: 1ebe0510f1b1f707635861e1e773b9176fbe0490 /drivers/block/loop.h
parent: ab1cb278bc7027663adbfb0b81404f8398437e11 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index d1de2217c09a..fb2237c73e61 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -69,6 +69,8 @@ struct loop_cmd {
 	struct kthread_work work;
 	struct request *rq;
 	struct list_head list;
+	bool use_aio;           /* use AIO interface to handle I/O */
+	struct kiocb iocb;
 };
 
 /* Support for loadable transfer modules */
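
The hunk above only shows the two new fields, not how they are used. Embedding the kiocb in the command (rather than pointing to it) allows a completion callback that receives only the kiocb pointer to recover the owning command with the usual container_of pattern. The user-space sketch below illustrates that pattern under stated assumptions: every mock_* name is hypothetical, the structures are reduced stand-ins for struct kiocb and struct loop_cmd, and the "I/O" completes immediately instead of asynchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kiocb: only a completion hook. */
struct mock_kiocb {
	void (*ki_complete)(struct mock_kiocb *iocb, long ret);
};

/* Reduced mock of struct loop_cmd; kernel-only members are omitted. */
struct mock_loop_cmd {
	bool use_aio;           /* use AIO interface to handle I/O */
	long result;            /* hypothetical: stores the completion status */
	struct mock_kiocb iocb; /* embedded, so the cmd can be found from it */
};

/* User-space equivalent of the kernel's container_of(). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Completion callback: it is handed the kiocb, not the command. */
static void mock_rw_aio_complete(struct mock_kiocb *iocb, long ret)
{
	struct mock_loop_cmd *cmd =
		mock_container_of(iocb, struct mock_loop_cmd, iocb);

	cmd->result = ret;
}

/* Mark the command as AIO, wire up the callback, complete at once. */
static long mock_submit(struct mock_loop_cmd *cmd, long nbytes)
{
	assert(cmd != NULL);
	cmd->use_aio = true;
	cmd->iocb.ki_complete = mock_rw_aio_complete;
	/* A real backing file would invoke this later, from I/O completion. */
	cmd->iocb.ki_complete(&cmd->iocb, nbytes);
	return cmd->result;
}
```

Calling mock_submit() on a zero-initialized struct mock_loop_cmd with nbytes set to 4096 leaves use_aio set and returns 4096, which shows that the callback reached the enclosing command through the embedded member alone.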