Age | Commit message (Collapse) | Author | Files | Lines |
|
The test btrfs/011 triggers a rcu warning
Reviewed-by: Anand Jain <anand.jain@oracle.com>
===============================
[ INFO: suspicious RCU usage. ]
4.4.0-rc1-default+ #286 Tainted: G W
-------------------------------
fs/btrfs/volumes.c:1977 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
4 locks held by btrfs/28786:
0: (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+...}, at: [<ffffffffa00bc785>] btrfs_dev_replace_finishing+0x45/0xa00 [btrfs]
1: (uuid_mutex){+.+.+.}, at: [<ffffffffa00bc84f>] btrfs_dev_replace_finishing+0x10f/0xa00 [btrfs]
2: (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa00bc868>] btrfs_dev_replace_finishing+0x128/0xa00 [btrfs]
3: (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa00bc87d>] btrfs_dev_replace_finishing+0x13d/0xa00 [btrfs]
stack backtrace:
CPU: 0 PID: 28786 Comm: btrfs Tainted: G W 4.4.0-rc1-default+ #286
Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS ASNBCPT1.86C.0031.B00.1006301607 06/30/2010
0000000000000001 ffff8800a07dfb48 ffffffff8141d47b 0000000000000001
0000000000000001 0000000000000000 ffff8801464a4f00 ffff8800a07dfb78
ffffffff810cd883 ffff880146eb9400 ffff8800a3698600 ffff8800a33fe220
Call Trace:
[<ffffffff8141d47b>] dump_stack+0x4f/0x74
[<ffffffff810cd883>] lockdep_rcu_suspicious+0x103/0x140
[<ffffffffa0071261>] btrfs_rm_dev_replace_remove_srcdev+0x111/0x130 [btrfs]
[<ffffffff810d354d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81449536>] ? __percpu_counter_sum+0x66/0x80
[<ffffffffa00bcc15>] btrfs_dev_replace_finishing+0x4d5/0xa00 [btrfs]
[<ffffffffa00bc96e>] ? btrfs_dev_replace_finishing+0x22e/0xa00 [btrfs]
[<ffffffffa00a8795>] ? btrfs_scrub_dev+0x415/0x6d0 [btrfs]
[<ffffffffa003ea69>] ? btrfs_start_transaction+0x9/0x20 [btrfs]
[<ffffffffa00bda79>] btrfs_dev_replace_start+0x339/0x590 [btrfs]
[<ffffffff81196aa5>] ? __might_fault+0x95/0xa0
[<ffffffffa0078638>] btrfs_ioctl_dev_replace+0x118/0x160 [btrfs]
[<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
[<ffffffffa007c914>] ? btrfs_ioctl+0x24/0x1770 [btrfs]
[<ffffffffa007ce43>] btrfs_ioctl+0x553/0x1770 [btrfs]
[<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
[<ffffffff811d6eb1>] ? do_vfs_ioctl+0x21/0x5a0
[<ffffffff811d6f1c>] do_vfs_ioctl+0x8c/0x5a0
[<ffffffff811e3336>] ? __fget_light+0x86/0xb0
[<ffffffff811e3369>] ? __fdget+0x9/0x20
[<ffffffff811d7451>] ? SyS_ioctl+0x21/0x80
[<ffffffff811d7483>] SyS_ioctl+0x53/0x80
[<ffffffff81b1efd7>] entry_SYSCALL_64_fastpath+0x12/0x6f
This is because of unprotected use of rcu_dereference in
btrfs_scratch_superblocks. We can't add rcu locks around the whole
function because we read the superblock.
The fix will use the rcu string buffer directly without the rcu locking.
Thi is safe as the device will not go away in the meantime. We're
holding the device list mutexes.
Restructuring the code to narrow down the rcu section turned out to be
impossible, we need to call filp_open (through update_dev_time) on the
buffer and this could call kmalloc/__might_sleep. We could call kstrdup
with GFP_ATOMIC but it's not absolutely necessary.
Fixes: 12b1c2637b6e (Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks)
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
xfstests/011 failed in node with small_size filesystem.
Can be reproduced by following script:
DEV_LIST="/dev/vdd /dev/vde"
DEV_REPLACE="/dev/vdf"
do_test()
{
local mkfs_opt="$1"
local size="$2"
dmesg -c >/dev/null
umount $SCRATCH_MNT &>/dev/null
echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
mount "${DEV_LIST[0]}" $SCRATCH_MNT
echo -n "Writing big files"
dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1
for ((i = 1; i <= size; i++)); do
echo -n .
/bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1
done
echo
echo Start replace
btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || {
dmesg
return 1
}
return 0
}
# Set size to value near fs size
# for example, 1897 can trigger this bug in 2.6G device.
#
./do_test "-d raid1 -m raid1" 1897
System will report replace fail with following warning in dmesg:
[ 134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started
[ 135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28
[ 135.543505] ------------[ cut here ]------------
[ 135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440()
[ 135.545276] Modules linked in:
[ 135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256
[ 135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 135.547798] ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000
[ 135.548774] ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4
[ 135.549774] ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70
[ 135.550758] Call Trace:
[ 135.551086] [<ffffffff817fe7b5>] dump_stack+0x44/0x55
[ 135.551737] [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0
[ 135.552487] [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20
[ 135.553211] [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440
[ 135.554051] [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0
[ 135.554722] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
[ 135.555506] [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0
[ 135.556304] [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580
[ 135.557009] [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
[ 135.557855] [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70
[ 135.558669] [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90
[ 135.559374] [<ffffffff81202124>] SyS_ioctl+0x74/0x80
[ 135.559987] [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 135.560842] ---[ end trace 2a5c1fc3205abbdd ]---
Reason:
When big data writen to fs, the whole free space will be allocated
for data chunk.
And operation as scrub need to set_block_ro(), and when there is
only one metadata chunk in system(or other metadata chunks
are all full), the function will try to allocate a new chunk,
and failed because no space in device.
Fix:
When set_block_ro failed for metadata chunk, it is not a problem
because scrub_lock paused commit_trancaction in same time, and
metadata are always cowed, so the on-the-fly writepages will not
write data into same place with scrub/replace.
Let replace continue in this case is no problem.
Tested by above script, and xfstests/011, plus 100 times xfstests/070.
Changelog v1->v2:
1: Add detail comments in source and commit-message.
2: Add dmesg detail into commit-message.
3: Limit return value of -ENOSPC to be passed.
All suggested by: Filipe Manana <fdmanana@gmail.com>
Suggested-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
I've accidentally picked an already used number for the enhanced usage
filter represented by BTRFS_BALANCE_ARGS_USAGE_RANGE, clashing with
BTRFS_BALANCE_ARGS_CONVERT. Introduced during the development phase,
no backward compatibility issues.
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: bc3094673f22 ("btrfs: extend balance filter usage to take minimum and maximum")
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We were using only 1 transaction unit when attempting to delete an unused
block group but in reality we need 3 + N units, where N corresponds to the
number of stripes. We were accounting only for the addition of the orphan
item (for the block group's free space cache inode) but we were not
accounting that we need to delete one block group item from the extent
tree, one free space item from the tree of tree roots and N device extent
items from the device tree.
While one unit is not enough, it worked most of the time because for each
single unit we are too pessimistic and assume an entire tree path, with
the highest possible heigth (8), needs to be COWed with eventual node
splits at every possible level in the tree, so there was usually enough
reserved space for removing all the items and adding the orphan item.
However after adding the orphan item, writepages() can by called by the VM
subsystem against the btree inode when we are under memory pressure, which
causes writeback to start for the nodes we COWed before, this forces the
operation to remove the free space item to COW again some (or all of) the
same nodes (in the tree of tree roots). Even without writepages() being
called, we could fail with ENOSPC because these items are located in
multiple trees and one of them might have a higher heigth and require
node/leaf splits at many levels, exhausting all the reserved space before
removing all the items and adding the orphan.
In the kernel 4.0 release, commit 3d84be799194 ("Btrfs: fix BUG_ON in
btrfs_orphan_add() when delete unused block group"), we attempted to fix
a BUG_ON due to ENOSPC when trying to add the orphan item by making the
cleaner kthread reserve one transaction unit before attempting to remove
the block group, but this was not enough. We had a couple user reports
still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on
a 4.2-rc6 kernel for example:
http://www.spinics.net/lists/linux-btrfs/msg46070.html
So fix this by reserving all the necessary units of metadata.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
It's possible to reach a state where the cleaner kthread isn't able to
start a transaction to delete an unused block group due to lack of enough
free metadata space and due to lack of unallocated device space to allocate
a new metadata block group as well. If this happens try to use space from
the global block group reserve just like we do for unlink operations, so
that we don't reach a permanent state where starting a transaction for
filesystem operations (file creation, renames, etc) keeps failing with
-ENOSPC. Such an unfortunate state was observed on a machine where over
a dozen unused data block groups existed and the cleaner kthread was
failing to delete them due to ENOSPC error when attempting to start a
transaction, and even running balance with a -dusage=0 filter failed with
ENOSPC as well. Also unmounting and mounting again the filesystem didn't
help. Allowing the cleaner kthread to use the global block reserve to
delete the unused data block groups fixed the problem.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
btrfs_alloc_dummy_root() return an error pointer on failure, it never
returns NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The calculation of range length in btrfs_sync_file leads to signed
overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin.
https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
The fsync call passes 0 and LLONG_MAX, the range length does not fit to
loff_t and overflows, but the value is converted to u64 so it silently
works as expected.
The minimal fix is a typecast to u64, switching functions to take
(start, end) instead of (start, len) would be more intrusive.
Coccinelle script found that there's one more opencoded calculation of
the length.
<smpl>
@@
loff_t start, end;
@@
* end - start
</smpl>
CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Ted and Namjae have reported that truncated pages don't get timely
reclaimed after being truncated in data=journal mode. The following test
triggers the issue easily:
for (i = 0; i < 1000; i++) {
pwrite(fd, buf, 1024*1024, 0);
fsync(fd);
fsync(fd);
ftruncate(fd, 0);
}
The reason is that journal_unmap_buffer() finds that truncated buffers
are not journalled (jh->b_transaction == NULL), they are part of
checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
been already written out (!buffer_dirty(bh)). We clean such buffers but
we leave them in the checkpoint list. Since checkpoint transaction holds
a reference to the journal head, these buffers cannot be released until
the checkpoint transaction is cleaned up. And at that point we don't
call release_buffer_page() anymore so pages detached from mapping are
lingering in the system waiting for reclaim to find them and free them.
Fix the problem by removing buffers from transaction checkpoint lists
when journal_unmap_buffer() finds out they don't have to be there
anymore.
Reported-and-tested-by: Namjae Jeon <namjae.jeon@samsung.com>
Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.
When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended. This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).
Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits. This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.
Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.
Signed-off-by: David Turner <novalis@novalis.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Mark Harris <mh8928@yahoo.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Cc: stable@vger.kernel.org
|
|
A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field. That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().
Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.
Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed. A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: d6870312659d ("nfs4client: convert to idr_alloc()")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before
retrying the call. That is a _very_ long time, so add a timeout value to
struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it.
This allows the RPC engine to use a sliding delay window, instead of a
15s delay.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit 1ca843a2d2 "nfs: Fix GETATTR bitmap verification" has check
the bitmap after decoding success, but decode_attr_fs_locations forgets
cleanup the FATTR4_WORD0_FS_LOCATIONS bits.
decode_getfattr_attrs always return -EIO when meeting FS_LOCATIONS now.
ls: cannot access /mnt/referal: Input/output error
ls: cannot access /mnt/replicas: Input/output error
total 32
drwxr-xr-x. 7 root root 8192 Nov 16 20:36 pnfs
??????????? ? ? ? ? ? referal
??????????? ? ? ? ? ? replicas
v2: clear the bit earlier
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
NFS v4.2 operations can work outside of pNFS, so dprintk() output
shouldn't be placed under NFSDBG_PNFS.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The NFS CLONE_RANGE defintion was wrong and thus never worked. Fix this
by simply using the btrfs ioctl defintion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Originally CLONE didn't allow for intra-file clones, but we recently
updated the spec to support this feature which is also supported by
local Linux file systems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Without this for example 64-bit binaries on typical amd64 distributions
would not be able to use ioctls on NFS. For now this only affects clones.
Additionally ->compat_ioctl is defined even for non-compat builds, so
get rid of the pointless ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently we pass uninitialized stack garbage in the count parameter.
The value is usually large enought to clone whole files and thus let
simple tests pass, but it makes the tests for range clones very unhappy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
Add cond_resched() into __splice_from_pipe() to fix the problem.
CC: Dmitry Vyukov <dvyukov@google.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.
CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file. sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.
Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/
Cc: stable@vger.kernel.org # all of them, not that anyone cared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Merge misc fixes from Andrew Morton:
"A bunch of fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
slub: avoid irqoff/on in bulk allocation
slub: create new ___slab_alloc function that can be called with irqs disabled
mm: fix up sparse warning in gfpflags_allow_blocking
ocfs2: fix umask ignored issue
PM/OPP: add entry in MAINTAINERS
kernel/panic.c: turn off locks debug before releasing console lock
kernel/signal.c: unexport sigsuspend()
kasan: fix kmemleak false-positive in kasan_module_alloc()
fat: fix fake_offset handling on error path
mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes
mm/page-writeback.c: initialize m_dirty to avoid compile warning
various: fix pci_set_dma_mask return value checking
mm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
mm: vmalloc: don't remove inexistent guard hole in remove_vm_area()
tools/vm/page-types.c: support KPF_IDLE
ncpfs: don't allow negative timeouts
configfs: allow dynamic group creation
MAINTAINERS: add Moritz as reviewer for FPGA Manager Framework
slab.h: sprinkle __assume_aligned attributes
|
|
New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.
Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For the root directory, . and .. are faked (using dir_emit_dots()) and
ctx->pos is reset from 2 to 0.
A corrupted root directory could cause fat_get_entry() to fail, but
->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
rewound to 0), so any following calls to ->iterate() continue to return
the same entries again and again.
The result is that userspace will never see the end of the directory,
causing e.g. 'ls' to hang in a getdents() loop.
[hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset]
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole
punch code. These problems are in the routine remove_inode_hugepages and
mostly occur in the case where there are holes in the range of pages to be
removed. These holes could be the result of a previous hole punch or
simply sparse allocation. The current code could access pages outside the
specified range.
remove_inode_hugepages handles both hole punch and truncate operations.
Page index handling was fixed/cleaned up so that the loop index always
matches the page being processed. The code now only makes a single pass
through the range of pages as it was determined page faults could not race
with truncate. A cond_resched() was added after removing up to
PAGEVEC_SIZE pages.
Some totally unnecessary code in hugetlbfs_fallocate() that remained from
early development was also removed.
Tested with fallocate tests submitted here:
http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/
And, some ftruncate tests under development
Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This code causes a static checker warning because it's a user controlled
variable where we cap the upper bound but not the lower bound. Let's
return an -EINVAL for negative timeouts.
[akpm@linux-foundation.org: remove unneeded `else']
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: David Howells <dhowells@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patchset introduces IIO software triggers, offers a way of configuring
them via configfs and adds the IIO hrtimer based interrupt source to be used
with software triggers.
The architecture is now split in 3 parts, to remove all IIO trigger specific
parts from IIO configfs core:
(1) IIO configfs - creates the root of the IIO configfs subsys.
(2) IIO software triggers - software trigger implementation, dynamically
creating /config/iio/triggers group.
(3) IIO hrtimer trigger - is the first interrupt source for software triggers
(with syfs to follow). Each trigger type can implement its own set of
attributes.
Lockdep seems to be happy with the locking in configfs patch.
This patch (of 5):
We don't want to hardcode default groups at subsystem
creation time. We export:
* configfs_register_group
* configfs_unregister_group
to allow drivers to programatically create/destroy groups
later, after module init time.
This is needed for IIO configfs support.
(akpm: the other 4 patches to be merged via the IIO tree)
Signed-off-by: Daniel Baluta <daniel.baluta@intel.com>
Suggested-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Adriana Reus <adriana.reus@intel.com>
Cc: Cristina Opriceana <cristina.opriceana@gmail.com>
Cc: Peter Meerwald <pmeerw@pmeerw.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- A collection of crash and deadlock fixes for DAX that are also tagged
for -stable. We will look to re-enable DAX pmd mappings in 4.5, but
for now 4.4 and -stable should disable it by default.
- A fixup to ext2 and ext4 to mirror the same warning emitted by XFS
when mounting with "-o dax"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
block: protect rw_page against device teardown
mm, dax: fix DAX deadlocks (COW fault)
dax: disable pmd mappings
ext2, ext4: warn when mounting with dax enabled
|
|
Fix use after free crashes like the following:
general protection fault: 0000 [#1] SMP
Call Trace:
[<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
[<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem]
[<ffffffff8128fd90>] bdev_read_page+0x50/0x60
[<ffffffff812972f0>] do_mpage_readpage+0x510/0x770
[<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
[<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50
[<ffffffff81297657>] mpage_readpages+0x107/0x170
[<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
[<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
[<ffffffff8129058d>] blkdev_readpages+0x1d/0x20
[<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310
[<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310
[<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0
[<ffffffff811c76f6>] filemap_fault+0x396/0x530
[<ffffffff811f816e>] __do_fault+0x4e/0xf0
[<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
[willy: symmetry fixups]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
While dax pmd mappings are functional in the nominal path they trigger
kernel crashes in the following paths:
BUG: unable to handle kernel paging request at ffffea0004098000
IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0
[..]
Call Trace:
[<ffffffff811f6573>] follow_page_mask+0x2d3/0x380
[<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0
[<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0
[<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0
kernel BUG at arch/x86/mm/gup.c:131!
[..]
Call Trace:
[<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220
[<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0
BUG: unable to handle kernel paging request at ffffea0004088000
IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350
[..]
Call Trace:
[<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0
[<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10
[<ffffffff810a11c1>] _do_fork+0x91/0x590
All of these paths are interpreting a dax pmd mapping as a transparent
huge page and making the assumption that the pfn is covered by the
memmap, i.e. that the pfn has an associated struct page. PTE mappings
do not suffer the same fate since they have the _PAGE_SPECIAL flag to
cause the gup path to fault. We can do something similar for the PMD
path, or otherwise defer pmd support for cases where a struct page is
available. For now, 4.4-rc and -stable need to disable dax pmd support
by default.
For development the "depends on BROKEN" line can be removed from
CONFIG_FS_DAX_PMD.
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
this function
If the jump to label "error" is taken, "ret" will indeed be
uninitialized, and random stack data may be printed by the debug code.
Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Similar to XFS warn when mounting DAX while it is still considered under
development. Also, aspects of the DAX implementation, for example
synchronization against multiple faults and faults causing block
allocation, depend on the correct implementation in the filesystem. The
maturity of a given DAX implementation is filesystem specific.
Cc: <stable@vger.kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-ext4@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson:
"Here's the branch of chrome platform changes for v4.4. Some have been
queued up for the full 4.3 release cycle since I forgot to send them
in for that round (rebased early on to deal with fixes conflicts).
Most of these enable EC communication stuff -- Pixel 2015 support,
enabling building for ARM64 platforms, and a few fixes for memory
leaks.
There's also a patch in here to allow reading/writing the verified
boot context, which depends on a sysfs patch acked by Greg"
* tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
platform/chrome: Fix i2c-designware adapter name
platform/chrome: Support reading/writing the vboot context
sysfs: Support is_visible() on binary attributes
platform/chrome: cros_ec: Fix possible leak in led_rgb_store()
platform/chrome: cros_ec: Fix leak in sequence_store()
platform/chrome: Enable Chrome platforms on 64-bit ARM
platform/chrome: cros_ec_dev - Add a platform device ID table
platform/chrome: cros_ec_lpc - Add support for Google Pixel 2
platform/chrome: cros_ec_lpc - Use existing function to check EC result
platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO
Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"This series contains HCH's changes to absorb configfs attribute
->show() + ->store() function pointer usage from it's original
tree-wide consumers, into common configfs code.
It includes usb-gadget, target w/ drivers, netconsole and ocfs2
changes to realize the improved simplicity, that now renders the
original include/target/configfs_macros.h CPP magic for fabric drivers
and others, unnecessary and obsolete.
And with common code in place, new configfs attributes can be added
easier than ever before.
Note, there are further improvements in-flight from other folks for
v4.5 code in configfs land, plus number of target fixes for post -rc1
code"
In the meantime, a new user of the now-removed old configfs API came in
through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce
an abstraction for System Trace Module devices").
This merge resolution comes from Alexander Shishkin, who updated his stm
class tracing abstraction to account for the removal of the old
show_attribute and store_attribute methods in commit 517982229f78
("configfs: remove old API") from this pull. As Alexander says about
that patch:
"There's no need to keep an extra wrapper structure per item and the
awkward show_attribute/store_attribute item ops are no longer needed.
This patch converts policy code to the new api, all the while making
the code quite a bit smaller and easier on the eyes.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>"
That patch was folded into the merge so that the tree should be fully
bisectable.
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
configfs: remove old API
ocfs2/cluster: use per-attribute show and store methods
ocfs2/cluster: move locking into attribute store methods
netconsole: use per-attribute show and store methods
target: use per-attribute show and store methods
spear13xx_pcie_gadget: use per-attribute show and store methods
dlm: use per-attribute show and store methods
usb-gadget/f_serial: use per-attribute show and store methods
usb-gadget/f_phonet: use per-attribute show and store methods
usb-gadget/f_obex: use per-attribute show and store methods
usb-gadget/f_uac2: use per-attribute show and store methods
usb-gadget/f_uac1: use per-attribute show and store methods
usb-gadget/f_mass_storage: use per-attribute show and store methods
usb-gadget/f_sourcesink: use per-attribute show and store methods
usb-gadget/f_printer: use per-attribute show and store methods
usb-gadget/f_midi: use per-attribute show and store methods
usb-gadget/f_loopback: use per-attribute show and store methods
usb-gadget/ether: use per-attribute show and store methods
usb-gadget/f_acm: use per-attribute show and store methods
usb-gadget/f_hid: use per-attribute show and store methods
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr cleanups from Al Viro.
* 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
f2fs: xattr simplifications
squashfs: xattr simplifications
9p: xattr simplifications
xattr handlers: Pass handler to operations instead of flags
jffs2: Add missing capability check for listing trusted xattrs
hfsplus: Remove unused xattr handler list operations
ubifs: Remove unused security xattr handler
vfs: Fix the posix_acl_xattr_list return value
vfs: Check attribute names in posix acl xattr handers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- three fixes tagged for -stable including a crash fix, simple
performance tweak, and an invalid i/o error.
- build regression fix for the nvdimm unit tests
- nvdimm documentation update
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix __dax_pmd_fault crash
libnvdimm: documentation clarifications
libnvdimm, pmem: fix size trim in pmem_direct_access()
libnvdimm, e820: fix numa node for e820-type-12 pmem ranges
tools/testing/nvdimm, acpica: fix flag rename build breakage
|
|
Now that the xattr handler is passed to the xattr handler operations, we
have access to the attribute name prefix, so simplify
f2fs_xattr_generic_list.
Also, f2fs_xattr_advise_list is only ever called for
f2fs_xattr_advise_handler; there is no need to double check for that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <chao2.yu@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Now that the xattr handler is passed to the xattr handler operations, we
have access to the attribute name prefix, so simplify the squashfs xattr
handlers a bit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Now that the xattr handler is passed to the xattr handler operations, we
can use the same get and set operations for the user, trusted, and security
xattr namespaces. In those namespaces, we can access the full attribute
name by "reattaching" the name prefix the vfs has skipped for us. Add a
xattr_full_name helper to make this obvious in the code.
For the "system.posix_acl_access" and "system.posix_acl_default"
attributes, handler->prefix is the full attribute name; the suffix is the
empty string.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example. In some oprations, it would be useful to also have
access to the handler prefix. To allow that, pass a pointer to the handler
to operations instead of the flags value alone.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The vfs checks if a task has the appropriate access for get and set
operations, but it cannot do that for the list operation; the file system
must check for that itself.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The list operations can never be called; they are even documented to be
unused.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Ubifs installs a security xattr handler in sb->s_xattr but doesn't use the
generic_{get,set,list,remove}xattr inode operations needed for processing
this list of attribute handlers; the handler is never called. Instead,
ubifs uses its own xattr handlers which also process security xattrs.
Remove the dead code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: linux-mtd@lists.infradead.org
Cc: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When a filesystem that contains POSIX ACLs is mounted without ACL support
(-o noacl), the appropriate behavior is not to list any existing POSIX ACL
xattrs. The return value for list xattr handlers in this case is 0, not an
error code: several filesystems that use the POSIX ACL xattr handlers do
not expect the list operation to fail.
Symlinks cannot have ACLs, so posix_acl_xattr_list will never be called for
symlinks in the first place.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The get and set operations of the POSIX ACL xattr handlers failed to check
the attribute names, so all names with "system.posix_acl_access" or
"system.posix_acl_default" as a prefix were accepted. Reject invalid names
from now on.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull SMB3 updates from Steve French:
"A collection of SMB3 patches adding some reliability features
(persistent and resilient handles) and improving SMB3 copy offload.
I will have some additional patches for SMB3 encryption and SMB3.1.1
signing (important security features), and also for improving SMB3
persistent handle reconnection (setting ChannelSequence number e.g.)
that I am still working on but wanted to get this set in since they
can stand alone"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Allow copy offload (CopyChunk) across shares
Add resilienthandles mount parm
[SMB3] Send durable handle v2 contexts when use of persistent handles required
[SMB3] Display persistenthandles in /proc/mounts for SMB3 shares if enabled
[SMB3] Enable checking for continuous availability and persistent handle support
[SMB3] Add parsing for new mount option controlling persistent handles
Allow duplicate extents in SMB3 not just SMB3.1.1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes and cleanups from Chris Mason:
"Some of this got cherry-picked from a github repo this week, but I
verified the patches.
We have three small scrub cleanups and a collection of fixes"
* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Use fs_info directly in btrfs_delete_unused_bgs
btrfs: Fix lost-data-profile caused by balance bg
btrfs: Fix lost-data-profile caused by auto removing bg
btrfs: Remove len argument from scrub_find_csum
btrfs: Reduce unnecessary arguments in scrub_recheck_block
btrfs: Use scrub_checksum_data and scrub_checksum_tree_block for scrub_recheck_block_checksum
btrfs: Reset sblock->xxx_error stats before calling scrub_recheck_block_checksum
btrfs: scrub: setup all fields for sblock_to_check
btrfs: scrub: set error stats when tree block spanning stripes
Btrfs: fix race when listing an inode's xattrs
Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
Btrfs: fix race leading to incorrect item deletion when dropping extents
Btrfs: fix sleeping inside atomic context in qgroup rescan worker
Btrfs: fix race waiting for qgroup rescan worker
btrfs: qgroup: exit the rescan worker during umount
Btrfs: fix extent accounting for partial direct IO writes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"There are several patches from Ilya fixing RBD allocation lifecycle
issues, a series adding a nocephx_sign_messages option (and associated
bug fixes/cleanups), several patches from Zheng improving the
(directory) fsync behavior, a big improvement in IO for direct-io
requests when striping is enabled from Caifeng, and several other
small fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: clear msg->con in ceph_msg_release() only
libceph: add nocephx_sign_messages option
libceph: stop duplicating client fields in messenger
libceph: drop authorizer check from cephx msg signing routines
libceph: msg signing callouts don't need con argument
libceph: evaluate osd_req_op_data() arguments only once
ceph: make fsync() wait unsafe requests that created/modified inode
ceph: add request to i_unsafe_dirops when getting unsafe reply
libceph: introduce ceph_x_authorizer_cleanup()
ceph: don't invalidate page cache when inode is no longer used
rbd: remove duplicate calls to rbd_dev_mapping_clear()
rbd: set device_type::release instead of device::release
rbd: don't free rbd_dev outside of the release callback
rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
libceph: use local variable cursor instead of &msg->cursor
libceph: remove con argument in handle_reply()
ceph: combine as many iovec as possile into one OSD request
ceph: fix message length computation
ceph: fix a comment typo
rbd: drop null test before destroy functions
|
|
Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may
optionally have a struct page backing. When a mapped pfn reaches
vmf_insert_pfn_pmd() it fails with a crash signature like the following:
kernel BUG at mm/huge_memory.c:905!
[..]
Call Trace:
[<ffffffff812a73ba>] __dax_pmd_fault+0x2ea/0x5b0
[<ffffffffa01a4182>] xfs_filemap_pmd_fault+0x92/0x150 [xfs]
[<ffffffff811fbe02>] handle_mm_fault+0x312/0x1b50
Fix this by falling back to 4K mappings in the pfn_valid() case. Longer
term, vmf_insert_pfn_pmd() needs to grow support for architectures that
can provide a 'pmd_special' capability.
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|