Age | Commit message (Collapse) | Author | Files | Lines |
|
NFSv4.1 has built-in trunking support that allows a client to determine
whether two connections to two different IP addresses are actually to
the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
clients a means to do this, basically by performing a SETCLIENTID to one
address and confirming it with a SETCLIENTID_CONFIRM to the other.
Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking
when mounting" implement a variation on this suggestion. It is possible
that other clients do too.
This depends on the clientid and verifier not being accepted by an
unrelated server. Since both are 64-bit values, that would be very
unlikely if they were random numbers. But they aren't:
knfsd generates the 64-bit clientid by concatenating the 32-bit boot
time (in seconds) and a counter. This makes collisions between
clientids generated by the same server extremely unlikely. But
collisions are very likely between clientids generated by servers that
boot at the same time, and it's quite common for multiple servers to
boot at the same time. The verifier is a concatenation of the
SETCLIENTID time (in seconds) and a counter, so again collisions between
different servers are likely if multiple SETCLIENTIDs are done at the
same time, which is a common case.
Therefore recent NFSv4.0 clients may decide two different servers are
really the same, and mount a filesystem from the wrong server.
Fortunately the Linux client, since 55b9df93ddd6 "nfsv4/v4.1: Verify the
client owner id during trunking detection", only does this when given
the non-default "migration" mount option.
The fault is really with RFC 7931, and needs a client fix, but in the
meantime we can mitigate the chance of these collisions by randomizing
the starting value of the counters used to generate clientids and
verifiers.
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If we are using v4.1+, then we can send notification when contended
locks become free. Inform the client of that fact.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
As defined in RFC 5661, section 18.16.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.
Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.
This also gives us a mechanism to clean up these objects at server
shutdown time as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.
When the lock comes free (a'la an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Add the encoding/decoding for CB_NOTIFY_LOCK operations.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
By design notifier can be registered once only, however nfsd registers
the same inetaddr notifiers per net-namespace. When this happen it
corrupts list of notifiers, as result some notifiers can be not called
on proper event, traverse on list can be cycled forever, and second
unregister can access already freed memory.
Cc: stable@vger.kernel.org
fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Support Remote Invalidation. A private message is exchanged with
the client upon RDMA transport connect that indicates whether
Send With Invalidation may be used by the server to send RPC
replies. The invalidate_rkey is arbitrarily chosen from among
rkeys present in the RPC-over-RDMA header's chunk lists.
Send With Invalidate improves performance only when clients can
recognize, while processing an RPC reply, that an rkey has already
been invalidated. That has been submitted as a separate change.
In the future, the RPC-over-RDMA protocol might support Remote
Invalidation properly. The protocol needs to enable signaling
between peers to indicate when Remote Invalidation can be used
for each individual RPC.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Prepare to receive an RDMA-CM private message when handling a new
connection attempt, and send a similar message as part of connection
acceptance.
Both sides can communicate their various implementation limits.
Implementations that don't support this sideband protocol ignore it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Introduce data structure used by both client and server to exchange
implementation details during RDMA/CM connection establishment.
This is an experimental out-of-band exchange between Linux
RPC-over-RDMA Version One implementations, replacing the deprecated
CCP (see RFC 5666bis). The purpose of this extension is to enable
prototyping of features that might be introduced in a subsequent
version of RPC-over-RDMA.
Suggested by Christoph Hellwig and Devesh Sharma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Message from syslogd@klimt at Aug 18 17:00:37 ...
kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping: (null) index:0x0
Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
Aug 18 17:00:37 klimt kernel: RIP: 0010:[<ffffffffa05c21c1>] svc_rdma_sendto+0x641/0x820 [rpcrdma]
send_reply() assigns its page argument as the first page of ctxt. On
error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
which does a put_page() on that very page. No need to do that again
as svc_rdma_sendto exits.
Fixes: 3e1eeb980822 ("svcrdma: Close connection when a send error occurs")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.
However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each are DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.
This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.
krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.
Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
nfserr is big-endian, so we should convert it to host-endian before
printing it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We already have that info in the client pointer. No need to pass around
a copy.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We currently can hit a deadlock (of sorts) when trying to use flexfiles
layouts with XFS. XFS will call break_layout when something wants to
write to the file. In the case of the (super-simple) flexfiles layout
driver in knfsd, the MDS and DS are the same machine.
The client can get a layout and then issue a v3 write to do its I/O. XFS
will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
be issued to the client. The client however can't return the layout
until the v3 WRITE completes, but XFS won't allow the write to proceed
until the layout is returned.
Christoph says:
XFS only cares about block-like layouts where the client has direct
access to the file blocks. I'd need to look how to propagate the
flag into break_layout, but in principle we don't need to do any
recalls on truncate ever for file and flexfile layouts.
If we're never going to recall the layout, then we don't even need to
set the lease at all. Just skip doing so on flexfiles layouts by
adding a new flag to struct nfsd4_layout_ops and skipping the lease
setting and removal when that flag is true.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
rsc_lookup steals the passed-in memory to avoid doing an allocation of
its own, so we can't just pass in a pointer to memory that someone else
is using.
If we really want to avoid allocation there then maybe we should
preallocate somwhere, or reference count these handles.
For now we should revert.
On occasion I see this on my server:
kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: events do_cache_clean [sunrpc]
kernel: task: ffff8808541d8000 task.stack: ffff880854344000
kernel: RIP: 0010:[<ffffffff811e7075>] [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
kernel: Stack:
kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
kernel: Call Trace:
kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
kernel: RIP [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP <ffff880854347d70>
kernel: ---[ end trace 3fdec044969def26 ]---
It seems to be most common after a server reboot where a client has been
using a Kerberos mount, and reconnects to continue its workload.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
driver") removed the dependency on BLK_DEV_NVME, but the cdoe does
depend on the block layer (which used to be an implicit dependency
through BLK_DEV_NVME).
Otherwise you get various errors from the kbuild test robot random
config testing when that happens to hit a configuration with BLOCK
device support disabled.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jay Freyensee <james_p_freyensee@linux.intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO fixes from Greg KH:
"Here are a few small IIO fixes for 4.8-rc6.
Nothing major, full details are in the shortlog, all of these have
been in linux-next with no reported issues"
* tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio:core: fix IIO_VAL_FRACTIONAL sign handling
iio: ensure ret is initialized to zero before entering do loop
iio: accel: kxsd9: Fix scaling bug
iio: accel: bmc150: reset chip at init time
iio: fix pressure data output unit in hid-sensor-attributes
tools:iio:iio_generic_buffer: fix trigger-less mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.
All of these resolve minor issues that have been reported, and all
have been in linux-next with no reported issues"
* tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
xhci: fix null pointer dereference in stop command timeout function
usb: dwc3: pci: fix build warning on !PM_SLEEP
usb: gadget: prevent potenial null pointer dereference on skb->len
usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
usb: phy: phy-generic: Check clk_prepare_enable() error
usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
Revert "usb: dwc3: gadget: always decrement by 1"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"nvdimm fixes for v4.8, two of them are tagged for -stable:
- Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise,
DAX pmd mappings end up with an uncached pgprot, and unusable
performance for the device-dax interface. The device-dax interface
appeared in 4.7 so this is tagged for -stable.
- Fix a couple VM_BUG_ON() checks in the show_smaps() path to
understand DAX pmd entries. This fix is tagged for -stable.
- Fix a mis-merge of the nfit machine-check handler to flip the
polarity of an if() to match the final version of the patch that
Vishal sent for 4.8-rc1. Without this the nfit machine check
handler never detects / inserts new 'badblocks' entries which
applications use to identify lost portions of files.
- For test purposes, fix the nvdimm_clear_poison() path to operate on
legacy / simulated nvdimm memory ranges. Without this fix a test
can set badblocks, but never clear them on these ranges.
- Fix the range checking done by dax_dev_pmd_fault(). This is not
tagged for -stable since this problem is mitigated by specifying
aligned resources at device-dax setup time.
These patches have appeared in a next release over the past week. The
recent rebase you can see in the timestamps was to drop an invalid fix
as identified by the updated device-dax unit tests [1]. The -mm
touches have an ack from Andrew"
[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm: allow legacy (e820) pmem region to clear bad blocks
nfit, mce: Fix SPA matching logic in MCE handler
mm: fix cache mode of dax pmd mappings
mm: fix show_smap() for zone_device-pmd ranges
dax: fix mapping size check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Mostly driver bugfixes, but also a few cleanups which are nice to have
out of the way"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rk3x: Restore clock settings at resume time
i2c: Spelling s/acknowedge/acknowledge/
i2c: designware: save the preset value of DW_IC_SDA_HOLD
Documentation: i2c: slave-interface: add note for driver development
i2c: mux: demux-pinctrl: run properly with multiple instances
i2c: bcm-kona: fix inconsistent indenting
i2c: rcar: use proper device with dma_mapping_error
i2c: sh_mobile: use proper device with dma_mapping_error
i2c: mux: demux-pinctrl: invalidate properly when switching fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull fscrypto fixes fromTed Ts'o:
"Fix some brown-paper-bag bugs for fscrypto, including one one which
allows a malicious user to set an encryption policy on an empty
directory which they do not own"
* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fscrypto: require write access to mount to set encryption policy
fscrypto: only allow setting encryption policy on directories
fscrypto: add authorization check for setting encryption policy
|
|
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem. This was handled correctly by f2fs but not by ext4. Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files. This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.
In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.
This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later. It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.
This patch restores the !S_ISDIR() check that was present in older
kernels.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access. This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).
Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy. This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.
(*) Or also on any regular file, for f2fs v4.6 and later and ext4
v4.8-rc1 and later; a separate bug fix is coming for that.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
where legacy pmem is being used or a pmem region created by using memmap
kernel parameter, the injected bad blocks are not cleared due to
nvdimm_clear_poison() failing from lack of ndctl function pointer. In
this case we need to just return as handled and allow the bad blocks to
be cleared rather than fail.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The check for a 'pmem' type SPA in the MCE handler was inverted due to a
merge/rebase error.
Fixes: 6839a6d nfit: do an ARS scrub on hitting a latent media error
Cc: linux-acpi@vger.kernel.org
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage. DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).
track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range. While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not. So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.
Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:
kernel BUG at mm/huge_memory.c:1105!
task: ffff88045f16b140 task.stack: ffff88045be14000
RIP: 0010:[<ffffffff81268f9b>] [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
[..]
Call Trace:
[<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
[<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
[<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
[<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
[<ffffffff81307656>] show_smap+0xa6/0x2b0
kernel BUG at fs/proc/task_mmu.c:585!
RIP: 0010:[<ffffffff81306469>] [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
Call Trace:
[<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
[<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
[<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
[<ffffffff81307696>] show_smap+0xa6/0x2b0
These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pull virtio fixes from Michael Tsirkin:
"This includes a couple of bugfixs for virtio.
The virtio console patch is actually also in x86/tip targeting 4.9
because it helps vmap stacks, but it also fixes IOMMU_PLATFORM which
was added in 4.8, and it seems important not to ship that in a broken
configuration"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_console: Stop doing DMA on the stack
virtio: mark vring_dma_dev() static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"This includes a PM QoS framework fix from Tejun to prevent interrupts
from being enabled unexpectedly during early boot and a cpufreq
documentation fix.
Specifics:
- If the PM QoS framework invokes cancel_delayed_work_sync() during
early boot, it will enable interrupts which is not expected at that
point, so prevent it from happening (Tejun Heo)
- Fix cpufreq statistic documentation to follow a recent change in
behavior that forgot to update it as appropriate (Jean Delvare)"
* tag 'pm-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq-stats: Minor documentation fix
PM / QoS: avoid calling cancel_delayed_work_sync() during early boot
|
|
* pm-core-fixes:
PM / QoS: avoid calling cancel_delayed_work_sync() during early boot
* pm-cpufreq-fixes:
cpufreq-stats: Minor documentation fix
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Some GPIO fixes that have been boiling the last two weeks or so.
Nothing special, I'm trying to sort out some Kconfig business and
Russell needs a fix in for -his SA1100 rework.
Summary:
- Revert a pointless attempt to add an include to solve the UM allyes
compilation problem.
- Make the mcp23s08 depend on OF_GPIO as it uses it and doesn't
compile properly without it.
- Fix a probing problem for ucb1x00"
* tag 'gpio-v4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: sa1100: fix irq probing for ucb1x00
gpio: mcp23s08: make driver depend on OF_GPIO
Revert "gpio: include <linux/io-mapping.h> in gpiolib-of"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
"This fixes a deadlock when fuse, direct I/O and loop device are
combined"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: direct-io: don't dirty ITER_BVEC pages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi:
"This fixes a regression caused by the last pull request"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix workdir creation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I'm not proud of how long it took me to track down that one liner in
btrfs_sync_log(), but the good news is the patches I was trying to
blame for these problems were actually fine (sorry Filipe)"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress
btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
btrfs: do not decrease bytes_may_use when replaying extents
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We've got quite a few fixes at this time, and all are stable patches.
syzkaller strikes back again (episode 19 or so), and we had to plug
some holes in ALSA core part (mostly timer).
In addition, a couple of FireWire audio fixes for the invalid copy
user calls in locks, and a few quirks for HD-audio and USB-audio as
usual are included"
* tag 'sound-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: rawmidi: Fix possible deadlock with virmidi registration
ALSA: timer: Fix zero-division by continue of uninitialized instance
ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
ALSA: fireworks: accessing to user space outside spinlock
ALSA: firewire-tascam: accessing to user space outside spinlock
ALSA: hda - Enable subwoofer on Dell Inspiron 7559
ALSA: hda - Add headset mic quirk for Dell Inspiron 5468
ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114
ALSA: timer: fix NULL pointer dereference on memory allocation failure
ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
|
|
virtio_console uses a small DMA buffer for control requests. Move
that buffer into heap memory.
Doing virtio DMA on the stack is normally okay on non-DMA-API virtio
systems (which is currently most of them), but it breaks completely
if the stack is virtually mapped.
Tested by typing both directions using picocom aimed at /dev/hvc0.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
|
|
We get 1 warning when building kernel with W=1:
drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes]
In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
so this patch marks this function with 'static'.
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- smp_mb__before_spinlock() changed to smp_mb() on arm64 since the
generic definition to smp_wmb() is not sufficient
- avoid a recursive loop with the graph tracer by using using
preempt_(enable|disable)_notrace in _percpu_(read|write)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: use preempt_disable_notrace in _percpu_read/write
arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Don't alias user region to other regions below PAGE_OFFSET from
Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe
Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
Bauermann"
* tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/32: Fix again csum_partial_copy_generic()
powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
powerpc/powernv: Fix crash on releasing compound PE
powerpc/xics/opal: Fix processor numbers in OPAL ICP
powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
|
|
Pull ARM fixes from Russell King:
"A few ARM fixes:
- Robin Murphy noticed that the non-secure privileged entry was
relying on undefined behaviour, which needed to be fixed.
- Vladimir Murzin noticed that prov-v7 fails to build for MMUless
configurations because a required header file wasn't included.
- A bunch of fixes for StrongARM regressions found while testing
4.8-rc on such platforms"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: sa1100: clear reset status prior to reboot
ARM: 8600/1: Enforce some NS-SVC initialisation
ARM: 8599/1: mm: pull asm/memory.h explicitly
ARM: sa1100: register clocks early
ARM: sa1100: fix 3.6864MHz clock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.8-rc6
Unfortunately we have a bogus dwc3 patch leaked through the cracks and
got merged into Linus' HEAD. That patch ended up causing off-by-1 error
in our TRB accounting logic. Thankfully John Youn found out the problem
and we provided a revert to the bogus dwc3 patch in no time.
Apart from this off-by-1 error, we have two fixes to the Renesas drivers,
a small fix to our generic phy driver, a NULL pointer dereference fix for
f_eem and a build warning fix in dwc3.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-linus
Peter writes:
Fix the possible kernel panic when the hardware signal is bad for chipidea udc.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 4.8 cycle.
We have a big rework of the kxsd9 driver queued up behind the fix below and
a fix for a recent fix that was marked for stable.
Hence this fix series is perhaps a little more urgent than average for IIO.
* core
- a fix for a fix in the last set. The recent fix for blocking ops when
! task running left a path (unlikely one) in which the function return
value was not set - so initialise it to 0.
- The IIO_TYPE_FRACTIONAL code previously didn't cope with negative
fractions. Turned out a fix for this was in Analog's tree but hadn't made
it upstream.
* bmc150
- reset chip at init time. At least one board out there ends up coming up
in an unstable state due to noise during power up. The reset does no
harm on other boards.
* kxsd9
- Fix a bug in the reported scaling due to failing to set the integer
part to 0.
* hid-sensors-pressure
- Output was in the wrong units to comply with the IIO ABI.
* tools
- iio_generic_buffer: Fix the trigger-less mode by ensuring we don't fault
out for having no trigger when we explicitly said we didn't want to have
one.
|
|
When debug preempt or preempt tracer is enabled, preempt_count_add/sub()
can be traced by function and function graph tracing, and
preempt_disable/enable() would call preempt_count_add/sub(), so in Ftrace
subsystem we should use preempt_disable/enable_notrace instead.
In the commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap
like events do") the function this_cpu_read() was added to
trace_graph_entry(), and if this_cpu_read() calls preempt_disable(), graph
tracer will go into a recursive loop, even if the tracing_on is
disabled.
So this patch change to use preempt_enable/disable_notrace instead in
this_cpu_read().
Since Yonghui Yang helped a lot to find the root cause of this problem,
so also add his SOB.
Signed-off-by: Yonghui Yang <mark.yang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.
Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.
This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Problems with the signal integrity of the high speed USB data lines or
noise on reference ground lines can cause the i.MX6 USB controller to
violate USB specs and exhibit unexpected behavior.
It was observed that USBi_UI interrupts were triggered first and when
isr_setup_status_phase was called, ci->status was NULL, which lead to a
NULL pointer dereference kernel panic.
This patch fixes the kernel panic, emits a warning once and returns
-EPIPE to halt the device and let the host get stalled.
It also adds a comment to point people, who are experiencing this issue,
to their USB hardware design.
Cc: <stable@vger.kernel.org> #4.1+
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
|
|
The cpufreq-stats code can no longer be built as a module, so it now
appears with square brackets in menuconfig.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 1aefc75b2449 (cpufreq: stats: Make the stats code non-modular)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|