summaryrefslogtreecommitdiff
path: root/drivers/block/loop.c
AgeCommit message (Collapse)AuthorFilesLines
2013-02-22loopdev: fix a deadlockGuo Chao1-2/+0
bd_mutex and lo_ctl_mutex can be held in different order. Path #1: blkdev_open blkdev_get __blkdev_get (hold bd_mutex) lo_open (hold lo_ctl_mutex) Path #2: blkdev_ioctl lo_ioctl (hold lo_ctl_mutex) lo_set_capacity (hold bd_mutex) Lockdep does not report it, because path #2 actually holds a subclass of lo_ctl_mutex. This subclass seems creep into the code by mistake. The patch author actually just mentioned it in the changelog, see commit f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see: http://marc.info/?l=linux-kernel&m=123806169129727&w=2 Path #2 hold bd_mutex to call bd_set_size(), I've protected it with i_mutex in a previous patch, so drop bd_mutex at this site. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-12-17Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+10
Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...
2012-11-30loop: Limit the number of requests in the bio listLukas Czerner1-0/+10
Currently there is not limitation of number of requests in the loop bio list. This can lead into some nasty situations when the caller spawns tons of bio requests taking huge amount of memory. This is even more obvious with discard where blkdev_issue_discard() will submit all bios for the range and wait for them to finish afterwards. On really big loop devices and slow backing file system this can lead to OOM situation as reported by Dave Chinner. With this patch we will wait in loop_make_request() if the number of bios in the loop bio list would exceed 'nr_congestion_on'. We'll wake up the process as we process the bios form the list. Some threshold hysteresis is in place to avoid high frequency oscillation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30loop: Make explicit loop device destruction lazyDave Chinner1-2/+15
xfstests has always had random failures of tests due to loop devices failing to be torn down and hence leaving filesytems that cannot be unmounted. This causes test runs to immediately stop. Over the past 6 or 7 years we've added hacks like explicit unmount -d commands for loop mounts, losetup -d after unmount -d fails, etc, but still the problems persist. Recently, the frequency of loop related failures increased again to the point that xfstests 259 will reliably fail with a stray loop device that was not torn down. That is despite the fact the test is above as simple as it gets - loop 5 or 6 times running mkfs.xfs with different paramters: lofile=$(losetup -f) losetup $lofile "$testfile" "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!" sync losetup -d $lofile And losteup -d $lofile is failing with EBUSY on 1-3 of these loops every time the test is run. Turns out that blkid is running simultaneously with losetup -d, and so it sees an elevated reference count and returns EBUSY. But why is blkid running? It's obvious, isn't it? udev has decided to try and find out what is on the block device as a result of a creation notification. And it is racing with mkfs, so might still be scanning the device when mkfs finishes and we try to tear it down. So, make losetup -d force autoremove behaviour. That is, when the last reference goes away, tear down the device. xfstests wants it *gone*, not causing random teardown failures when we know that all the operations the tests have specifically run on the device have completed and are no longer referencing the loop device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-21userns: Convert loop to use kuid_t instead of uid_tEric W. Biederman1-2/+2
Cc: Jens Axboe <jaxboe@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-07-14blk: fix wrong idr_pre_get() error check in loop.cSilva Paulo1-5/+3
The idr_pre_get() function never returns a value < 0. It returns 0 (no memory) or 1 (OK). Reported-by: Silva Paulo <psdasilva@yahoo.com> [ Rewrote Silva's patch, but attributing it to Silva anyway - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-20block: remove the second argument of k[un]map_atomic()Cong Wang1-8/+8
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-02-08loop: zero fill bio instead of return -EIO for partial readDave Young1-11/+13
commit 8268f5a741 ("deny partial write for loop dev fd") tried to fix the loop device partial read information leak problem. But it changed the semantics of read behavior. When we read beyond the end of the device we should get 0 bytes, which is normal behavior, we should not just return -EIO Instead of returning -EIO, zero out the bio to avoid information leak in case of partail read. Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: Dmitry Monakhov <dmonakhov@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-01-03fs: move code out of buffer.cAl Viro1-1/+0
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-02loop: Fix discard_alignment default settingLukas Czerner1-1/+1
discard_alignment is not relevant to the loop driver since it is supposed to be set as a workaround for the old sector 63 alignments. So set it to zero rather than block size. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-25loop: fix loop block driver discard and encryption commentDave Young1-1/+1
The loop driver does not support discard if encryption is enabled, fix the comment. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16loop: cleanup set_status interfaceDmitry Monakhov1-15/+29
1) Anyone who has read access to loopdev has permission to call set_status and may change important parameters such as lo_offset, lo_sizelimit and so on, which contradicts to read access pattern and definitely equals to write access pattern. 2) Add lo_offset over i_size check to prevent blkdev_size overflow. ##Testcase_bagin #dd if=/dev/zero of=./file bs=1k count=1 #losetup /dev/loop0 ./file /* userspace_application */ struct loop_info64 loinf; fd = open("/dev/loop0", O_RDONLY); ioctl(fd, LOOP_GET_STATUS64, &loinf); /* Set offset to any value which is bigger than i_size, and sizelimit * to nonzero value*/ loinf.lo_offset = 4096*1024; loinf.lo_sizelimit = 1024; ioctl(fd, LOOP_SET_STATUS64, &loinf); /* After this loop device will have size similar to 0x7fffffffffxxxx */ #blockdev --getsz /dev/loop0 ##OUTPUT: 36028797018955968 ##Testcase_end [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16loop: prevent information leak after failed readDmitry Monakhov1-1/+2
If read was not fully successful we have to fail whole bio to prevent information leak of old pages ##Testcase_begin dd if=/dev/zero of=./file bs=1M count=1 losetup /dev/loop0 ./file -o 4096 truncate -s 0 ./file # OOps loop offset is now beyond i_size, so read will silently fail. # So bio's pages would not be cleared, may which result in information leak. hexdump -C /dev/loop0 ##testcase_end Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-04Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-7/+104
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits) virtio-blk: use ida to allocate disk index hpsa: add small delay when using PCI Power Management to reset for kump cciss: add small delay when using PCI Power Management to reset for kump xen/blkback: Fix two races in the handling of barrier requests. xen/blkback: Check for proper operation. xen/blkback: Fix the inhibition to map pages when discarding sector ranges. xen/blkback: Report VBD_WSECT (wr_sect) properly. xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. xen-blkfront: plug device number leak in xlblk_init() error path xen-blkfront: If no barrier or flush is supported, use invalid operation. xen-blkback: use kzalloc() in favor of kmalloc()+memset() xen-blkback: fixed indentation and comments xen-blkfront: fix a deadlock while handling discard response xen-blkfront: Handle discard requests. xen-blkback: Implement discard requests ('feature-discard') xen-blkfront: add BLKIF_OP_DISCARD and discard request struct drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() drivers/block/loop.c: emit uevent on auto release drivers/block/cpqarray.c: use pci_dev->revision loop: always allow userspace partitions and optionally support automatic scanning ... Fic up trivial header file includsion conflict in drivers/block/loop.c
2011-10-24Merge branch 'for-linus' into for-3.2/coreJens Axboe1-112/+23
2011-10-19Merge branch 'v3.1-rc10' into for-3.2/coreJens Axboe1-91/+206
Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-17loop: remove the incorrect write_begin/write_end shortcutChristoph Hellwig1-112/+23
Currently the loop device tries to call directly into write_begin/write_end instead of going through ->write if it can. This is a fairly nasty shortcut as write_begin and write_end are only callbacks for the generic write code and expect to be called with filesystem specific locks held. This code currently causes various issues for clustered filesystems as it doesn't take the required cluster locks, and it also causes issues for XFS as it doesn't properly lock against the swapext ioctl as called by the defragmentation tools. This in case causes data corruption if defragmentation hits a busy loop device in the wrong time window, as reported by RH QA. The reason why we have this shortcut is that it saves a data copy when doing a transformation on the loop device, which is the technical term for using cryptoloop (or an XOR transformation). Given that cryptoloop has been deprecated in favour of dm-crypt my opinion is that we should simply drop this shortcut instead of finding complicated ways to to introduce a formal interface for this shortcut. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()Ayan George1-3/+4
If the loop device is associated (lo->lo_state == Lo_bound), it will have a valid bdev pointed to by lo->lo_device. There is no reason to ever pass an additional block_device pointer. Signed-off-by: Ayan George <ayan.george@canonical.com> Cc: Phillip Susi <psusi@cfl.rr.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21drivers/block/loop.c: emit uevent on auto releasePhillip Susi1-1/+1
The loopback driver failed to emit the change uevent when auto releasing the device. Fixed lo_release() to pass the bdev to loop_clr_fd() so it can emit the event. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ayan George <ayan@ayan.net> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-12block: remove support for bio remapping from ->make_requestChristoph Hellwig1-3/+2
There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23loop: always allow userspace partitions and optionally support automatic ↵Kay Sievers1-4/+45
scanning Automatic partition scanning can be requested individually per loop device during its setup by setting LO_FLAGS_PARTSCAN. By default, no partition tables are scanned. Userspace can now always add and remove partitions from all loop devices, regardless if the in-kernel partition scanner is enabled or not. The needed partition minor numbers are allocated from the extended minors space, the main loop device numbers will continue to match the loop minors, regardless of the number of partitions used. # grep . /sys/class/block/loop1/loop/* /sys/block/loop1/loop/autoclear:0 /sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img /sys/block/loop1/loop/offset:0 /sys/block/loop1/loop/partscan:1 /sys/block/loop1/loop/sizelimit:0 # ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 Aug 14 20:22 /dev/loop0 brw-rw---- 1 root disk 7, 1 Aug 14 20:23 /dev/loop1 brw-rw---- 1 root disk 259, 0 Aug 14 20:23 /dev/loop1p1 brw-rw---- 1 root disk 259, 1 Aug 14 20:23 /dev/loop1p2 brw-rw---- 1 root disk 7, 99 Aug 14 20:23 /dev/loop99 brw-rw---- 1 root disk 259, 2 Aug 14 20:23 /dev/loop99p1 brw-rw---- 1 root disk 259, 3 Aug 14 20:23 /dev/loop99p2 crw------T 1 root root 10, 237 Aug 14 20:22 /dev/loop-control Cc: Karel Zak <kzak@redhat.com> Cc: Davidlohr Bueso <dave@gnu.org> Acked-By: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19loop: add discard support for loop devicesLukas Czerner1-0/+54
This commit adds discard support for loop devices. Discard is usually supported by SSD and thinly provisioned devices as a method for reclaiming unused space. This is no different than trying to reclaim back space which is not used by the file system on the image, but it still occupies space on the host file system. We can do the reclamation on file system which does support hole punching. So when discard request gets to the loop driver we can translate that to punch a hole to the underlying file, hence reclaim the free space. This is very useful for trimming down the size of the image to only what is really used by the file system on that image. Fstrim may be used for that purpose. It has been tested on ext4, xfs and btrfs with the image file systems ext4, ext3, xfs and btrfs. ext4, or ext6 image on ext4 file system has some problems but it seems that ext4 punch hole implementation is somewhat flawed and it is unrelated to this commit. Also this is a very good method of validating file systems punch hole implementation. Note that when encryption is used, discard support is disabled, because using it might leak some information useful for possible attacker. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31loop: fix deadlock when sysfs and LOOP_CLR_FD race against each otherKay Sievers1-2/+4
LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD waits for show() to finish to remove the sysfs file. cat /sys/class/block/loop0/loop/backing_file mutex_lock_nested+0x176/0x350 ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] loop_attr_do_show_backing_file+0x2f/0xd0 [loop] dev_attr_show+0x1b/0x60 ? sysfs_read_file+0x86/0x1a0 ? __get_free_pages+0x12/0x50 sysfs_read_file+0xaf/0x1a0 ioctl(LOOP_CLR_FD): wait_for_common+0x12c/0x180 ? try_to_wake_up+0x2a0/0x2a0 wait_for_completion+0x18/0x20 sysfs_deactivate+0x178/0x180 ? sysfs_addrm_finish+0x43/0x70 ? sysfs_addrm_start+0x1d/0x20 sysfs_addrm_finish+0x43/0x70 sysfs_hash_and_remove+0x85/0xa0 sysfs_remove_group+0x59/0x100 loop_clr_fd+0x1dc/0x3f0 [loop] lo_ioctl+0x223/0x7a0 [loop] Instead of taking the lo_ctl_mutex from sysfs code, take the inner lo->lo_lock, to protect the access to the backing_file data. Thanks to Tejun for help debugging and finding a solution. Cc: Milan Broz <mbroz@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop ↵Kay Sievers1-17/+10
devices Instead of unconditionally creating a fixed number of dead loop devices which need to be investigated by storage handling services, even when they are never used, we allow distros start with 0 loop devices and have losetup(8) and similar switch to the dynamic /dev/loop-control interface instead of searching /dev/loop%i for free devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31loop: add management interface for on-demand device allocationKay Sievers1-4/+116
Loop devices today have a fixed pre-allocated number of usually 8. The number can only be changed at module init time. To find a free device to use, /dev/loop%i needs to be scanned, and all devices need to be opened until a free one is possibly found. This adds a new /dev/loop-control device node, that allows to dynamically find or allocate a free device, and to add and remove loop devices from the running system: LOOP_CTL_ADD adds a specific device. Arg is the number of the device. It returns the device i or a negative error code. LOOP_CTL_REMOVE removes a specific device, Arg is the number the device. It returns the device i or a negative error code. LOOP_CTL_GET_FREE finds the next unbound device or allocates a new one. No arg is given. It returns the device i or a negative error code. The loop kernel module gets automatically loaded when /dev/loop-control is accessed the first time. The alias specified in the module, instructs udev to create this 'dead' device node, even when the module is not loaded. Example: cfd = open("/dev/loop-control", O_RDWR); # add a new specific loop device err = ioctl(cfd, LOOP_CTL_ADD, devnr); # remove a specific loop device err = ioctl(cfd, LOOP_CTL_REMOVE, devnr); # find or allocate a free loop device to use devnr = ioctl(cfd, LOOP_CTL_GET_FREE); sprintf(loopname, "/dev/loop%i", devnr); ffd = open("backing-file", O_RDWR); lfd = open(loopname, O_RDWR); err = ioctl(lfd, LOOP_SET_FD, ffd); Cc: Tejun Heo <tj@kernel.org> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31loop: replace linked list of allocated devices with an idr indexKay Sievers1-72/+80
Replace the linked list, that keeps track of allocated devices, with an idr index to allow a more efficient lookup of devices. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-27loop: export module parametersNamhyung Kim1-3/+14
Export 'max_loop' and 'max_part' parameters to sysfs so user can know that how many devices are allowed and how many partitions are supported. If 'max_loop' is 0, there is no restriction on the number of loop devices. User can create/use the devices as many as minor numbers available. If 'max_part' is 0, it means simply the device doesn't support partitioning. Also note that 'max_part' can be adjusted to power of 2 minus 1 form if needed. User should check this value after the module loading if he/she want to use that number correctly (i.e. fdisk, mknod, etc.). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-24loop: handle on-demand devices correctlyNamhyung Kim1-4/+4
When finding or allocating a loop device, loop_probe() did not take partition numbers into account so that it can result to a different device. Consider following example: $ sudo modprobe loop max_part=15 $ ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0 brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1 brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2 brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3 brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4 brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5 brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6 brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7 $ sudo mknod /dev/loop8 b 7 128 $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img $ sudo losetup -a /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img) $ ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0 brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1 brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128 brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1 brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2 brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3 brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2 brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3 brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4 brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5 brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6 brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7 brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8 After this patch, /dev/loop8 - instead of /dev/loop128 - was accessed correctly. In addition, 'range' passed to blk_register_region() should include all range of dev_t that LOOP_MAJOR can address. It does not need to be limited by partition numbers unless 'max_loop' param was specified. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-24loop: limit 'max_part' module param to DISK_MAX_PARTSNamhyung Kim1-0/+3
The 'max_part' parameter controls the number of maximum partition a loop block device can have. However if a user specifies very large value it would exceed the limitation of device minor number and can cause a kernel panic (or, at least, produce invalid device nodes in some cases). On my desktop system, following command kills the kernel. On qemu, it triggers similar oops but the kernel was alive: $ sudo modprobe loop max_part0000 ------------[ cut here ]------------ kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65! invalid opcode: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: loop(+) Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group= +0x2a/0x170 RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246 RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4 RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878 RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000 R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50 R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800 FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000= 00 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c= 0) Stack: ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b 0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868 0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8 Call Trace: [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12 [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16 [<ffffffff8116a090>] blk_register_queue+0x47/0xf7 [<ffffffff8116f527>] add_disk+0xdf/0x290 [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop] [<ffffffffa0006000>] ? 0xffffffffa0005fff [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e [<ffffffff81096804>] sys_init_module+0x9c/0x1e0 [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb= 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe = 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49 RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170 RSP <ffff880007b3fde8> ---[ end trace a123eb592043acad ]--- Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe1-13/+0
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: remove per-queue pluggingJens Axboe1-13/+0
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-04Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/coreTejun Heo1-5/+0
This merge creates two set of conflicts. One is simple context conflicts caused by removal of throtl_scheduled_delayed_work() in for-linus and removal of throtl_shutdown_timer_wq() in for-2.6.39/core. The other is caused by commit 255bb490c8 (block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()) in for-linus crashing with FLUSH reimplementation in for-2.6.39/core. The conflict isn't trivial but the resolution is straight-forward. * __blk_run_queue() calls in flush_end_io() and flush_data_end_io() should be called with @force_kblockd set to %true. * elv_insert() in blk_kick_flush() should use %ELEVATOR_INSERT_REQUEUE. Both changes are to avoid invoking ->request_fn() directly from request completion path and closely match the changes in the commit 255bb490c8. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-03block: kill loop_mutexPetr Uzel1-5/+0
Following steps lead to deadlock in kernel: dd if=/dev/zero of=img bs=512 count=1000 losetup -f img mkfs.ext2 /dev/loop0 mount -t ext2 -o loop /dev/loop0 mnt umount mnt/ Stacktrace: [<c102ec04>] irq_exit+0x36/0x59 [<c101502c>] smp_apic_timer_interrupt+0x6b/0x75 [<c127f639>] apic_timer_interrupt+0x31/0x38 [<c101df88>] mutex_spin_on_owner+0x54/0x5b [<fe2250e9>] lo_release+0x12/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a4da5>] fput+0xd5/0x1aa [<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop] [<fe225110>] lo_release+0x39/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a59d9>] deactivate_locked_super+0x17/0x36 [<c10b6f37>] sys_umount+0x27e/0x2a5 [<c10b6f69>] sys_oldumount+0xb/0xe [<c1002897>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff Regression since 2a48fc0ab24241755dc9, which introduced the private loop_mutex as part of the BKL removal process. As per [1], the mutex can be safely removed. [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930 Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172 Signed-off-by: Petr Uzel <petr.uzel@suse.cz> Cc: stable@kernel.org Reviewed-by: Nikanth Karthikesan <knikanth@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02loop: No need to initialize ->queue_lock explicitly before calling ↵Vivek Goyal1-3/+0
blk_cleanup_queue() Now we initialize ->queue_lock at queue allocation time so driver does not have to worry about initializing it before calling blk_cleanup_queue(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-01-19loop: queue_lock NULL pointer derefence in blk_throtl_exitSergey Senozhatsky1-0/+3
Performing $ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/ mount: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ sudo modprobe -r loop results in oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [<ffffffff812479d4>] do_raw_spin_lock+0x14/0x122 Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000) Call Trace: [<ffffffff81486788>] _raw_spin_lock_irq+0x4a/0x51 [<ffffffff8123404b>] ? blk_throtl_exit+0x3b/0xa0 [<ffffffff8105b120>] ? cancel_delayed_work_sync+0xd/0xf [<ffffffff8123404b>] blk_throtl_exit+0x3b/0xa0 [<ffffffff81229bc8>] blk_release_queue+0x21/0x65 [<ffffffff8123bb06>] kobject_release+0x51/0x66 [<ffffffff8123bab5>] ? kobject_release+0x0/0x66 [<ffffffff8123ce1e>] kref_put+0x43/0x4d [<ffffffff8123ba27>] kobject_put+0x47/0x4b [<ffffffff8122717c>] blk_cleanup_queue+0x56/0x5b [<ffffffffa01c3824>] loop_exit+0x68/0x844 [loop] [<ffffffff8107cccc>] sys_delete_module+0x1e8/0x25b [<ffffffff814864c9>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81002112>] system_call_fastpath+0x16/0x1b because of an attempt to acquire NULL queue_lock. I added the same lines as in blk_queue_make_request - index 44e18c0..49e6a54 100644`fall back to embedded per-queue lock'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-20Fix compile warnings due to missing removal of a 'ret' variableJens Axboe1-1/+1
Commit a8adbe3 forgot to remove the return variable, kill it. drivers/block/loop.c: In function 'lo_splice_actor': drivers/block/loop.c:398: warning: unused variable 'ret' [...] fs/nfsd/vfs.c: In function 'nfsd_splice_actor': fs/nfsd/vfs.c:848: warning: unused variable 'ret' Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-17fs/splice: Pull buf->ops->confirm() from splice_from_pipe actorsMichał Mirosław1-4/+0
This patch pulls calls to buf->ops->confirm() from all actors passed (also indirectly) to splice_from_pipe_feed(). Is avoiding the call to buf->ops->confirm() while splice()ing to /dev/null is an intentional optimization? No other user does that and this will remove this special case. Against current linux.git 6313e3c21743cc88bb5bd8aa72948ee1e83937b6. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-11-10block: remove REQ_HARDBARRIERChristoph Hellwig1-6/+0
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left at this point is: - various checks inside the block layer. - sanity checks in bio based drivers. - now unused bio_empty_barrier helper. - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while, but Xen really needs to sort out it's barrier situaton. - setting of ordered tags in uas - dead code copied from old scsi drivers. - scsi different retry for barriers - it's dead and should have been removed when flushes were converted to FS requests. - blktrace handling of barriers - removed. Someone who knows blktrace better should add support for REQ_FLUSH and REQ_FUA, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-27loop: Properly clear sysfs in autoclear modeMilan Broz1-1/+1
In autoclear mode bdev is NULL but the sysfs entry should be destroyed otherwise this warning appears: WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x82/0x95() sysfs: cannot create duplicate filename '/devices/virtual/block/loop0/loop' Fixes commit ee86273062cbb310665fe49e1f1937d2cf85b0b9 Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-26mm: strictly nested kmap_atomic()Peter Zijlstra1-2/+2
Ensure kmap_atomic() usage is strictly nested Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-22Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds1-10/+10
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...
2010-10-22Merge branch 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds1-0/+101
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits) cciss: fix PCI IDs for new Smart Array controllers drbd: add race-breaker to drbd_go_diskless drbd: use dynamic_dev_dbg to optionally log uuid changes dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set drbd: cleanup: change "<= 0" to "== 0" drbd: relax the grace period of the md_sync timer again drbd: add some more explicit drbd_md_sync drbd: drop wrong debug asserts, fix recently introduced race drbd: cleanup useless leftover warn/error printk's drbd: add explicit drbd_md_sync to drbd_resync_finished drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED drbd: fix for possible deadlock on IO error during resync drbd: fix unlikely access after free and list corruption drbd: fix for spurious fullsync (uuids rotated too fast) drbd: allow for explicit resync-finished notifications drbd: preparation commit, using full state in receive_state() drbd: drbd_send_ack_dp must not rely on header information drbd: Fix regression in recv_bm_rle_bits (compressed bitmap) drbd: Fixed a stupid copy and paste error drbd: Allow larger values for c-fill-target. ... Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
2010-10-05block: autoconvert trivial BKL users to private mutexArnd Bergmann1-5/+6
The block device drivers have all gained new lock_kernel calls from a recent pushdown, and some of the drivers were already using the BKL before. This turns the BKL into a set of per-driver mutexes. Still need to check whether this is safe to do. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.*linux.mutex.h' ${file} ; then sed -i '/include.*<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.*linux.mutex.h/,$ { 1,/^\(static\|int\|long\)/ { /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]*cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-09-10block/loop: implement REQ_FLUSH/FUA supportTejun Heo1-9/+9
Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead. Also, instead of checking file->f_op->fsync() directly, look at the value of vfs_fsync() and ignore -EINVAL return. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush()Tejun Heo1-1/+1
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler blk_queue_flush(). blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a device has write cache and can flush it, it should set REQ_FLUSH. If the device can handle FUA writes, it should also set REQ_FUA. All blk_queue_ordered() users are converted. * ORDERED_DRAIN is mapped to 0 which is the default value. * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH. * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10block/loop: queue ordered mode should be DRAIN_FLUSHTejun Heo1-1/+1
loop implements FLUSH using fsync but was incorrectly setting its ordered mode to DRAIN. Change it to DRAIN_FLUSH. In practice, this doesn't change anything as loop doesn't make use of the block layer ordered implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23loop: add some basic read-only sysfs attributesMilan Broz1-0/+101
Create /sys/block/loopX/loop directory and provide these attributes: - backing_file - autoclear - offset - sizelimit This loop directory is present only if loop device is configured. To be used in util-linux-ng (and possibly elsewhere like udev rules) where code need to get loop attributes from kernel (and not store duplicate info in userspace). Moreover loop ioctls are not even able to provide full backing file info because of buffer limits. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23BLOCK: fix bio.bi_rw handlingJiri Slaby1-1/+1
Return of the bi_rw tests is no longer bool after commit 74450be1. But results of such tests are stored in bools. This doesn't fit in there for some compilers (gcc 4.5 here), so either use !! magic to get real bools or use ulong where the result is assigned somewhere. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: push down BKL into .open and .releaseArnd Bergmann1-0/+5
The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: remove q->prepare_flush_fn completelyFUJITA Tomonori1-1/+1
This removes q->prepare_flush_fn completely (changes the blk_queue_ordered API). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>