Age | Commit message (Collapse) | Author | Files | Lines |
|
... using the biggest hammer we have. This is essentially a weaponized
version of the timeout-based wedging Chris added in
commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 22 11:56:25 2017 +0100
drm/i915: Break modeset deadlocks on reset
Because defense-in-depth is good it's good to still have both. Also
note that with the locking change we can now restrict this a lot (old
gpus and special testing only), so this doesn't kill the TDR benefits
on at least anything remotely modern.
And futuremore with a few tricks it should be possible to make a much
more educated guess about whether an atomic commit is stuck waiting on
the gpu (atomic_t counting the pending i915_sw_fence used by the
atomic modeset code should do it), so we can improve this.
But for now just start with something that is guaranteed to recover
faster, for much better CI througput.
This defacto reverts TDR on these platforms, but there's not really a
single commit to specify as the sole offender.
v2: Add a debug message to explain what's going on. We can't DRM_ERROR
because that spams CI. And the timeout based fallback still prints a
DRM_ERROR, in case something goes wrong.
v3: Fix comment layout (Michel)
Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-1-daniel.vetter@ffwll.ch
(cherry picked from commit 97154ec242c14f646a3ab3b4da8f838d197f300d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
When switching between contexts using the aliasing_ppgtt, the VM is
shared. We don't need to reload the PD registers unless they are dirty.
Martin Peres reported an issue that looks like corruption between
Haswell context switches, bisecting to commit f9326be5f1d3 ("drm/i915:
Rearrange switch_context to load the aliasing ppgtt on first use").
Switching between the same mm (the aliasing_ppgtt is used for all
contexts in this case) should be a nop, but appears to trigger some
side-effects in the context switch. However, as we know the switch
is redundant in this case, we can skip it and continue to ignore the
issue until somebody feels strong enough to investigate full-ppgtt on
gen7 again!
Except.. Martin was using full-ppgtt which is not supported as it
doesn't work correctly yet. So whilst the bisect did yield valuable
information about the failures, the fix should not have any user impact
under default settings, with the exception of a slightly lower
throughput on xcs as the VM would always be reloaded.
v2: Also remember to set the legacy_active_context following the switch
on xcs (commit e8a9c58fcd9a ("drm/i915: Unify active context tracking
between legacy/execlists/guc"))
Fixes: f9326be5f1d3 ("drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use")
Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
Reported-by: Martin Peres <martin.peres@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170812152724.6883-1-chris@chris-wilson.co.uk
(cherry picked from commit 12124bea5b82dc1e917304aed703c27292270051)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
For 0.85V cnl_get_buf_trans_edp() returns the DP table, instead of EDP.
Use the correct table.
The error was pointed out by this clang warning:
drivers/gpu/drm/i915/intel_ddi.c:392:39: warning: variable
'cnl_ddi_translations_edp_0_85V' is not needed and will not be emitted
[-Wunneeded-internal-declaration]
static const struct cnl_ddi_buf_trans cnl_ddi_translations_edp_0_85V[] = {
Fixes: cf54ca8bc567 ("drm/i915/cnl: Implement voltage swing sequence.")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170717195854.192139-1-mka@chromium.org
(cherry picked from commit 50946c89850db13bd672c664aec6cf4551f71fe9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
A missing part to EU slice power gating is the
debugfs interface. This patch actually should have been
squashed to the initial EU slice power gating one.
v2: Initial patch was merged without this part.
Fixes: c7ae7e9ab207 ("drm/i915/cnl: Configure EU slice power gating.")
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170809200702.11236-1-rodrigo.vivi@intel.com
(cherry picked from commit 7ea1adf30f82a4c0910524ac06f8f1f26281bb23)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
As we may have just bound the renderstate into the GGTT for execution, we
need to ensure that the GTT TLB are also flushed.
On snb-gt2, this would cause a random GPU hang at the start of a new
context (e.g. boot) and on snb-gt1, it was causing the renderstate batch
to take ~10s. It was the GPU hang that revealed the truth, as the CS
gleefully executed beyond the end of the golden renderstate batch, a good
indicator for a GTT TLB miss.
Fixes: 20fe17aa52dc ("drm/i915: Remove redundant TLB invalidate on switching contexts")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20170808131904.1385-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.12-rc1+
(cherry picked from commit 802673d66f8a6ded5d2689d597853c7bb3a70163)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
This function is not part of the driver anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 90f4fcd56bda ("drm/i915: Remove forced stop ring on suspend/unload")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170804140348.24971-1-lionel.g.landwerlin@intel.com
(cherry picked from commit fe29133df37ac31de9e657ad91bcf74cdfe8c4cd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are three firmware core fixes for 4.13-rc5.
All three of these fix reported issues and have been floating around
for a few weeks. They have been in linux-next with no reported
problems"
* tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware: avoid invalid fallback aborts by using killable wait
firmware: fix batched requests - send wake up on failure on direct lookups
firmware: fix batched requests - wake all waiters
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are two patches for 4.13-rc5.
One is a fix for a reported thunderbolt issue, and the other a fix for
an MEI driver issue. Both have been in linux-next with no reported
issues"
* tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
thunderbolt: Do not enumerate more ports from DROM than the controller has
mei: exclude device from suspend direct complete optimization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two tty serial driver fixes for 4.13-rc5. One is a revert of
a -rc1 patch that turned out to not be a good idea, and the other is a
fix for the pl011 serial driver.
Both have been in linux-next with no reported issues"
* tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: Delete dead code for CIR serial ports"
tty: pl011: fix initialization order of QDF2400 E44
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio fixes from Greg KH:
"Here are some Staging and IIO driver fixes for 4.13-rc5.
Nothing major, just a number of small fixes for reported issues. All
of these have been in linux-next for a while now with no reported
issues. Full details are in the shortlog"
* tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
iio: aspeed-adc: wait for initial sequence.
iio: accel: bmc150: Always restore device to normal mode after suspend-resume
staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0
iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
iio: accel: st_accel: add SPI-3wire support
iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable
iio: pressure: st_pressure_core: disable multiread by default for LPS22HB
iio: light: tsl2563: use correct event code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes and new device ids for
4.13-rc5. There is the usual gadget driver fixes, some new quirks for
"messy" hardware, and some new device ids.
All have been in linux-next with no reported issues"
* tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: pl2303: add new ATEN device id
usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
USB: Check for dropped connection before switching to full speed
usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3
usb: phy: phy-msm-usb: Fix usage of devm_regulator_bulk_get()
usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling
usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
USB: serial: option: add D-Link DWM-222 device ID
usb: musb: fix tx fifo flush handling again
usb: core: unlink urbs from the tail of the endpoint's urb_list
usb-storage: fix deadlock involving host lock and scsi_done
uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
USB: hcd: Mark secondary HCD as dead if the primary one died
USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
|
|
Pull another MTD fix from Brian Norris:
"An mtdblock regression occurred in -rc1 (all writes were broken!), in
the process of some block subsystem refactoring. Noticed and fixed
last week, but I'm a little slow on the uptake"
* tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd:
mtd: blkdevs: Fix mtd block write failure
|
|
All the MTD block write requests are failing with
following error messages
mkfs.ext4 /dev/mtdblock0
print_req_error: I/O error, dev mtdblock0, sector 0
Buffer I/O error on dev mtdblock0, logical block 0,
lost async page write
The control is going to default case after block write request
because of missing return.
Fixes: commit 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
|
|
Pull SCSI target fixes from Nicholas Bellinger:
"The highlights include:
- Fix iscsi-target payload memory leak during
ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash)
- Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT
(Pascal de Bruijn + Himanshu Madhani + nab)
- Fix iscsi-target long-standing issue with parallel delete of a
single network portal across multiple target instances (Gary Guo +
nab)
- Fix target dynamic se_node GPF during uncached shutdown regression
(Justin Maggard + nab)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix node_acl demo-mode + uncached dynamic shutdown regression
iscsi-target: Fix iscsi_np reset hung task during parallel delete
qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
cxgbit: fix sg_nents calculation
iscsi-target: fix invalid flags in text response
iscsi-target: fix memory leak in iscsit_setup_text_cmd()
cxgbit: add missing __kfree_skb()
tcmu: free old string on reconfig
tcmu: Fix possible to/from address overflow when doing the memcpy
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Some fixes for Xen:
- a fix for a regression introduced in 4.13 for a Xen HVM-guest
configured with KASLR
- a fix for a possible deadlock in the xenbus driver when booting the
system
- a fix for lost interrupts in Xen guests"
* tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: Fix interrupt lost during irq_disable and irq_enable
xen: avoid deadlock in xenbus
xen: fix hvm guest with kaslr enabled
xen: split up xen_hvm_init_shared_info()
x86: provide an init_mem_mapping hypervisor hook
|
|
Pull block fixes from Jens Axboe:
"A set of fixes that should go into this series. This contains:
- Fix from Bart for blk-mq requeue queue running, preventing a
continued loop of run/restart.
- Fix for a bio/blk-integrity issue, in two parts. One from
Christoph, fixing where verification happens, and one from Milan,
for a NULL profile.
- NVMe pull request, most of the changes being for nvme-fc, but also
a few trivial core/pci fixes"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: fix directive command numd calculation
nvme: fix nvme reset command timeout handling
nvme-pci: fix CMB sysfs file removal in reset path
lpfc: support nvmet_fc defer_rcv callback
nvmet_fc: add defer_req callback for deferment of cmd buffer return
nvme: strip trailing 0-bytes in wwid_show
block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
bio-integrity: only verify integrity on the lowest stacked driver
bio-integrity: Fix regression if profile verify_fn is NULL
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- fix lockdep splat when removing mmc_block module
- fix the logic for setting eMMC HS400ES signal voltage
MMC host:
- omap_hsmmc: add CMD23 capability to fix -EIO errors"
* tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: block: fix lockdep splat when removing mmc_block module
mmc: mmc: correct the logic for setting HS400ES signal voltage
mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver
|
|
Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
- allow user to disable write combined mapping in efifb driver (Dave
Airlie)
- fix use after free bugs on driver removal in imxfb driver (Dan
Carpenter)
- fix unused variable warning in omapfb driver (Arnd Bergmann)
* tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux:
efifb: allow user to disable write combined mapping.
fbdev: omapfb: remove unused variable
video: fbdev: imxfb: use after free in imxfb_remove()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Fix a NULL-pointer dereference in arm_smmu_add_device"
* tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"All fixes for code that went in this cycle.
- a revert of an optimisation to the syscall exit path, which could
lead to an oops on either older machines or machines with > 1TB of
memory
- disable some deep idle states if the firmware configuration for
them fails
- re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig
change
- six fairly small patches fixing bugs in our new watchdog code
Thanks to: Gautham R Shenoy, Nicholas Piggin"
* tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/watchdog: add locking around init/exit functions
powerpc/watchdog: Fix marking of stuck CPUs
powerpc/watchdog: Fix final-check recovered case
powerpc/watchdog: Moderate touch_nmi_watchdog overhead
powerpc/watchdog: Improve watchdog lock primitive
powerpc: NMI IPI improve lock primitive
powerpc/configs: Re-enable HARD/SOFT lockup detectors
powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails
Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
|
|
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed fwspec assignment in legacy_binding path as redundant which is
wrong. It needs to be updated after fwspec initialisation in
arm_smmu_register_legacy_master() as it is dereferenced later. Without
this there is a NULL-pointer dereference panic during boot on some hosts.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Here is a device has xen-pirq-MSI interrupt. Dom0 might lost interrupt
during driver irq_disable/irq_enable. Here is the scenario,
1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel)
2. dev interrupt raised by HW and Xen mark its evtchn as pending
3. irq_enable -> startup_pirq -> eoi_pirq ->
clear_evtchn(channel of irq) -> clear pending status
4. consume_one_event process the irq event without pending bit assert
which result in interrupt lost once
5. No HW interrupt raising anymore.
Now use enable_dynirq for enable_pirq of xen_pirq_chip to remove
eoi_pirq when irq_enable.
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
When starting the xenwatch thread a theoretical deadlock situation is
possible:
xs_init() contains:
task = kthread_run(xenwatch_thread, NULL, "xenwatch");
if (IS_ERR(task))
return PTR_ERR(task);
xenwatch_pid = task->pid;
And xenwatch_thread() does:
mutex_lock(&xenwatch_mutex);
...
event->handle->callback();
...
mutex_unlock(&xenwatch_mutex);
The callback could call unregister_xenbus_watch() which does:
...
if (current->pid != xenwatch_pid)
mutex_lock(&xenwatch_mutex);
...
In case a watch is firing before xenwatch_pid could be set and the
callback of that watch unregisters a watch, then a self-deadlock would
occur.
Avoid this by setting xenwatch_pid in xenwatch_thread().
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Pull NVMe fixes from Christoph:
"A few more small fixes - the fc/lpfc update is the biggest by far."
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nothing too earth shattering here, it just seems like lots of little
things all over the place.
msm has probably the larger amount of changes, but they all seem fine,
otherwise, some rockchip, i915, etnaviv and exynos fixes, along with
one nouveau regression fix for some older GPUs"
* tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits)
drm/nouveau/disp/nv04: avoid creation of output paths
drm: make DRM_STM default n
drm/exynos: forbid creating framebuffers from too small GEM buffers
drm/etnaviv: Fix off-by-one error in reloc checking
drm/i915: fix backlight invert for non-zero minimum brightness
drm/i915/shrinker: Wrap need_resched() inside preempt-disable
drm/i915/perf: fix flex eu registers programming
drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8
drm/i915/gvt: Initialize MMIO Block with HW state
drm/rockchip: vop: report error when check resource error
drm/rockchip: vop: round_up pitches to word align
drm/rockchip: vop: fix NV12 video display error
drm/rockchip: vop: fix iommu page fault when resume
drm/i915/gvt: clean workload queue if error happened
drm/i915/gvt: change resetting to resetting_eng
drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations
drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM
drm/msm/adreno: Prevent unclocked access when retrieving timestamps
drm/msm: Remove __user from __u64 data types
...
|
|
Merge misc fixes from Andrew Morton:
"21 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
zram: rework copy of compressor name in comp_algorithm_store()
rmap: do not call mmu_notifier_invalidate_page() under ptl
mm: fix list corruptions on shmem shrinklist
mm/balloon_compaction.c: don't zero ballooned pages
MAINTAINERS: copy virtio on balloon_compaction.c
mm: fix KSM data corruption
mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
mm: make tlb_flush_pending global
mm: refactor TLB gathering API
Revert "mm: numa: defer TLB flush for THP migration as long as possible"
mm: migrate: fix barriers around tlb_flush_pending
mm: migrate: prevent racy access to tlb_flush_pending
fault-inject: fix wrong should_fail() decision in task context
test_kmod: fix small memory leak on filesystem tests
test_kmod: fix the lock in register_test_dev_kmod()
test_kmod: fix bug which allows negative values on two config options
test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
mm: ratelimit PFNs busy info message
...
|
|
comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size. Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy(). The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.
Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Work around Renesas uPD72020x 32-bit DMA issue"
* tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
PCI: Add pci_reset_function_locked()
|
|
Some Alpine Ridge LP DROMs (there might be others) erroneusly list more
ports than the controller actually has. Most probably because DROM of
the full Dual/Single port Thunderbolt controller was reused for LP
version. The current DROM parser does not check the upper bound thus it
leads to crash when sw->ports[] is accessed over bounds:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002ec
IP: tb_drom_read+0x383/0x890 [thunderbolt]
PGD 0
P4D 0
Oops: 0000 [#1] SMP
CPU: 3 PID: 12248 Comm: systemd-udevd Not tainted 4.13.0-rc1-next-20170719 #1
Hardware name: LENOVO 20HF000YGE/20HF000YGE, BIOS N1WET32W (1.11 ) 05/23/2017
task: ffff8a293e4bcd80 task.stack: ffffa698027a8000
RIP: 0010:tb_drom_read+0x383/0x890 [thunderbolt]
RSP: 0018:ffffa698027ab990 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8a2940af7800 RCX: 0000000000000000
RDX: ffff8a2940ebb400 RSI: 0000000000000000 RDI: ffffa698027ab9a0
RBP: ffffa698027ab9d0 R08: 0000000000000001 R09: 0000000000000002
R10: ffff8a2940ebb5b0 R11: 0000000000000000 R12: ffff8a293bfa968c
R13: 000000000000002c R14: 0000000000000056 R15: 0000000000000056
FS: 00007f0a945a38c0(0000) GS:ffff8a2961580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002ec CR3: 000000043e785000 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tb_switch_add+0x9d/0x730 [thunderbolt]
? tb_switch_alloc+0x3cd/0x4d0 [thunderbolt]
icm_start+0x5a/0xa0 [thunderbolt]
tb_domain_add+0xc3/0xf0 [thunderbolt]
nhi_probe+0x19e/0x310 [thunderbolt]
local_pci_probe+0x42/0xa0
pci_device_probe+0x18d/0x1a0
driver_probe_device+0x2ff/0x450
__driver_attach+0xa4/0xe0
? driver_probe_device+0x450/0x450
bus_for_each_dev+0x6e/0xb0
driver_attach+0x1e/0x20
bus_add_driver+0x1d0/0x270
? 0xffffffffc0bbb000
driver_register+0x60/0xe0
? 0xffffffffc0bbb000
__pci_register_driver+0x4c/0x50
nhi_init+0x28/0x1000 [thunderbolt]
do_one_initcall+0x50/0x190
? __vunmap+0x81/0xb0
? _cond_resched+0x1a/0x50
? kmem_cache_alloc_trace+0x15f/0x1c0
? do_init_module+0x27/0x1e9
do_init_module+0x5f/0x1e9
load_module+0x24e7/0x2a60
? vfs_read+0x115/0x130
SYSC_finit_module+0xfc/0x120
? SYSC_finit_module+0xfc/0x120
SyS_finit_module+0xe/0x10
do_syscall_64+0x67/0x170
entry_SYSCALL64_slow_path+0x25/0x25
Fix this by making sure we only enumerate DROM port entries the hardware
actually has.
Reported-by: Christian Kellner <ckellner@redhat.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Christian Kellner <ckellner@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
MEI device performs link reset during system suspend sequence.
The link reset cannot be performed while device is in
runtime suspend state. The resume sequence is bypassed with
suspend direct complete optimization,so the optimization should be
disabled for mei devices.
Fixes:
[ 192.940537] Restarting tasks ...
[ 192.940610] PGI is not set
[ 192.940619] ------------[ cut here ]------------ [ 192.940623]
WARNING: CPU: 0
me.c:653 mei_me_pg_exit_sync+0x351/0x360 [ 192.940624] Modules
linked
in:
[ 192.940627] CPU: 0 PID: 1661 Comm: kworker/0:3 Not tainted
4.13.0-rc2+
#2 [ 192.940628] Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS
A11
12/08/2016 [ 192.940630] Workqueue: pm pm_runtime_work <snip> [
192.940642] Call Trace:
[ 192.940646] ? pci_pme_active+0x1de/0x1f0 [ 192.940649] ?
pci_restore_standard_config+0x50/0x50
[ 192.940651] ? kfree+0x172/0x190
[ 192.940653] ? kfree+0x172/0x190
[ 192.940655] ? pci_restore_standard_config+0x50/0x50
[ 192.940663] mei_me_pm_runtime_resume+0x3f/0xc0
[ 192.940665] pci_pm_runtime_resume+0x7a/0xa0 [ 192.940667]
__rpm_callback+0xb9/0x1e0 [ 192.940668] ?
preempt_count_add+0x6d/0xc0 [ 192.940670] rpm_callback+0x24/0x90 [
192.940672] ? pci_restore_standard_config+0x50/0x50
[ 192.940674] rpm_resume+0x4e8/0x800 [ 192.940676]
pm_runtime_work+0x55/0xb0 [ 192.940678]
process_one_work+0x184/0x3e0 [ 192.940680]
worker_thread+0x4d/0x3a0 [ 192.940681] ?
preempt_count_sub+0x9b/0x100 [ 192.940683]
kthread+0x122/0x140 [ 192.940684] ? process_one_work+0x3e0/0x3e0 [
192.940685] ? __kthread_create_on_node+0x1a0/0x1a0
[ 192.940688] ret_from_fork+0x27/0x40 [ 192.940690] Code: 96 3a
9e ff 48 8b 7d 98 e8 cd 21 58 00 83 bb bc 01 00 00
04 0f 85 40 fe ff ff e9 41 fe ff ff 48 c7 c7 5f 04 99 96 e8 93 6b 9f
ff <0f> ff e9 5d fd ff ff e8 33 fe 99 ff 0f 1f 00 0f 1f 44 00 00 55
[ 192.940719] ---[ end trace
a86955597774ead8 ]--- [ 192.942540] done.
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
is interrupted") added via 4.0 added support to abort the fallback mechanism
when a signal was detected and wait_for_completion_interruptible() returned
-ERESTARTSYS -- for instance when a user hits CTRL-C. The abort was overly
*too* effective.
When a child process terminates (successful or not) the signal SIGCHLD can
be sent to the parent process which ran the child in the background and
later triggered a sync request for firmware through a sysfs interface which
relies on the fallback mechanism. This signal in turn can be recieved by the
interruptible wait we constructed on firmware_class and detects it as an
abort *before* userspace could get a chance to write the firmware. Upon
failure -EAGAIN is returned, so userspace is also kept in the dark about
exactly what happened.
We can reproduce the issue with the fw_fallback.sh selftest:
Before this patch:
$ sudo tools/testing/selftests/firmware/fw_fallback.sh
...
tools/testing/selftests/firmware/fw_fallback.sh: error - sync firmware request cancelled due to SIGCHLD
After this patch:
$ sudo tools/testing/selftests/firmware/fw_fallback.sh
...
tools/testing/selftests/firmware/fw_fallback.sh: SIGCHLD on sync ignored as expected
Fix this by making the wait killable -- only killable by SIGKILL (kill -9).
We loose the ability to allow userspace to cancel a write with CTRL-C
(SIGINT), however its been decided the compromise to require SIGKILL is
worth the gains.
Chances of this issue occuring are low due to the number of drivers upstream
exclusively relying on the fallback mechanism for firmware (2 drivers),
however this is observed in the field with custom drivers with sysfs
triggers to load firmware. Only distributions relying on the fallback
mechanism are impacted as well. An example reported issue was on Android,
as follows:
1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
unrelated to firmware loading, it could be sleep 2; for all we care ]
2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
ends up calling request_firmware() kernel side
3) The firmware loading fallback mechanism is used, the request is sent to
userspace and pid 1 waits in the kernel on wait_*
4) before firmware loading completes pid 42 dies (for any reason, even
normal termination)
5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
causes -ERESTARTSYS to be returned from wait_*
6) The kernel's wait aborts and return -EAGAIN for the
request_firmware() caller.
Cc: stable <stable@vger.kernel.org> # 4.0
Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted")
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Martin Fuzzey <mfuzzey@parkeon.com>
Reported-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix batched requests from waiting forever on failure.
The firmware API batched requests feature has been broken since the API call
request_firmware_direct() was introduced on commit bba3a87e982ad ("firmware:
Introduce request_firmware_direct()"), added on v3.14 *iff* the firmware
being requested was not present in *certain kernel builds* [0].
When no firmware is found the worker which goes on to finish never informs
waiters queued up of this, so any batched request will stall in what seems
to be forever (MAX_SCHEDULE_TIMEOUT). Sadly, a reboot will also stall, as
the reboot notifier was only designed to kill custom fallback workers. The
issue seems to the user as a type of soft lockup, what *actually* happens
underneath the hood is a wait call which never completes as we failed to
issue a completion on error.
For device drivers with optional firmware schemes (ie, Intel iwlwifi, or
Netronome -- even though it uses request_firmware() and not
request_firmware_direct()), this could mean that when you boot a system with
multiple cards the firmware will seem to never load on the system, or that
the card is just not responsive even the driver initialization. Due to
differences in scheduling possible this should not always trigger --
one would need to to ensure that multiple requests are in place at the
right time for this to work, also release_firmware() must not be called
prior to any other incoming request. The complexity may not be worth
supporting batched requests in the future given the wait mechanism is
only used also for the fallback mechanism. We'll keep it for now and
just fix it.
Its reported that at least with the Intel WiFi cards on one system this
issue was creeping up 50% of the boots [0].
Before this commit batched requests testing revealed:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y
Most common Linux distribution setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) FAIL OK
request_firmware_nowait(uevent=false) FAIL OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n
Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) FAIL OK
request_firmware_nowait(uevent=false) FAIL OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y
Google Android setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() OK OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) OK OK
request_firmware_nowait(uevent=false) OK OK
============================================================================
Ater this commit batched testing results:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y
Most common Linux distribution setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() OK OK
request_firmware_direct() OK OK
request_firmware_nowait(uevent=true) OK OK
request_firmware_nowait(uevent=false) OK OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n
Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() OK OK
request_firmware_direct() OK OK
request_firmware_nowait(uevent=true) OK OK
request_firmware_nowait(uevent=false) OK OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y
Google Android setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() OK OK
request_firmware_direct() OK OK
request_firmware_nowait(uevent=true) OK OK
request_firmware_nowait(uevent=false) OK OK
============================================================================
[0] https://bugzilla.kernel.org/show_bug.cgi?id=195477
Cc: stable <stable@vger.kernel.org> # v3.14
Fixes: bba3a87e982ad ("firmware: Introduce request_firmware_direct()"
Reported-by: Nicolas <nbroeking@me.com>
Reported-by: John Ewalt <jewalt@lgsinnovations.com>
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The firmware cache mechanism serves two purposes, the secondary purpose is
not well documented nor understood. This fixes a regression with the
secondary purpose of the firmware cache mechanism: batched requests on
successful lookups. Without this fix *any* time a batched request is
triggered, secondary requests for which the batched request mechanism
was designed for will seem to last forver and seem to never return.
This issue is present for all kernel builds possible, and a hard reset
is required.
The firmware cache is used for:
1) Addressing races with file lookups during the suspend/resume cycle
by keeping firmware in memory during the suspend/resume cycle
2) Batched requests for the same file rely only on work from the first file
lookup, which keeps the firmware in memory until the last
release_firmware() is called
Batched requests *only* take effect if secondary requests come in prior to
the first user calling release_firmware(). The devres name used for the
internal firmware cache is used as a hint other pending requests are
ongoing, the firmware buffer data is kept in memory until the last user of
the buffer calls release_firmware(), therefore serializing requests and
delaying the release until all requests are done.
Batched requests wait for a wakup or signal so we can rely on the first file
fetch to write to the pending secondary requests. Commit 5b029624948d
("firmware: do not use fw_lock for fw_state protection") ported the firmware
API to use swait, and in doing so failed to convert complete_all() to
swake_up_all() -- it used swake_up(), loosing the ability for *some* batched
requests to take effect.
We *could* fix this by just using swake_up_all() *but* swait is now known
to be very special use case, so its best to just move away from it. So we
just go back to using completions as before commit 5b029624948d ("firmware:
do not use fw_lock for fw_state protection") given this was using
complete_all().
Without this fix it has been reported plugging in two Intel 6260 Wifi cards
on a system will end up enumerating the two devices only 50% of the time
[0]. The ported swake_up() should have actually handled the case with two
devices, however, *if more than two cards are used* the swake_up() would
not have sufficed. This change is only part of the required fixes for
batched requests. Another fix is provided in the next patch.
This particular change should fix the cases where more than three requests
with the same firmware name is used, otherwise batched requests will wait
for MAX_SCHEDULE_TIMEOUT and just timeout eventually.
Below is a summary of tests triggering batched requests on different
kernel builds.
Before this patch:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y
Most common Linux distribution setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL FAIL
request_firmware_direct() FAIL FAIL
request_firmware_nowait(uevent=true) FAIL FAIL
request_firmware_nowait(uevent=false) FAIL FAIL
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n
Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL FAIL
request_firmware_direct() FAIL FAIL
request_firmware_nowait(uevent=true) FAIL FAIL
request_firmware_nowait(uevent=false) FAIL FAIL
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y
Google Android setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL FAIL
request_firmware_direct() FAIL FAIL
request_firmware_nowait(uevent=true) FAIL FAIL
request_firmware_nowait(uevent=false) FAIL FAIL
============================================================================
After this patch:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y
Most common Linux distribution setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) FAIL OK
request_firmware_nowait(uevent=false) FAIL OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n
Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() FAIL OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) FAIL OK
request_firmware_nowait(uevent=false) FAIL OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y
Google Android setup.
API-type no-firmware-found firmware-found
----------------------------------------------------------------------
request_firmware() OK OK
request_firmware_direct() FAIL OK
request_firmware_nowait(uevent=true) OK OK
request_firmware_nowait(uevent=false) OK OK
============================================================================
[0] https://bugzilla.kernel.org/show_bug.cgi?id=195477
CC: <stable@vger.kernel.org> [4.10+]
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 5b029624948d ("firmware: do not use fw_lock for fw_state protection")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This adds a new ATEN device id for a new pl2303-based device.
Reported-by: Peter Kuo <PeterKuo@aten.com.tw>
Cc: stable <stable@vger.kernel.org>
Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Moshi USB to Ethernet Adapter internally uses a Genesys Logic hub to
connect to Realtek r8153.
The Realtek r8153 ethernet does not work on the internal hub, no-lpm quirk
can make it work.
Since another r8153 dongle at my hand does not have the issue, so add
the quirk to the Genesys Logic hub instead.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some buggy USB disk adapters disconnect and reconnect multiple times
during the enumeration procedure. This may lead to a device
connecting at full speed instead of high speed, because when the USB
stack sees that a device isn't able to enumerate at high speed, it
tries to hand the connection over to a full-speed companion
controller.
The logic for doing this is careful to check that the device is still
connected. But this check is inadequate if the device disconnects and
reconnects before the check is done. The symptom is that a device
works, but much more slowly than it is capable of operating.
The situation was made worse recently by commit 22547c4cc4fe ("usb:
hub: Wait for connection to be reestablished after port reset"), which
increases the delay following a reset before a disconnect is
recognized, thus giving the device more time to reconnect.
This patch makes the check more robust. If the device was
disconnected at any time during enumeration, we will now skip the
full-speed handover.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Certain HP keyboards would keep inputting a character automatically which
is the wake-up key after S3 resume
On some AMD platforms USB host fails to respond (by holding resume-K) to
USB device (an HP keyboard) resume request within 1ms (TURSM) and ensures
that resume is signaled for at least 20 ms (TDRSMDN), which is defined in
USB 2.0 spec. The result is that the keyboard is out of function.
In SNPS USB design, the host responds to the resume request only after
system gets back to S0 and the host gets to functional after the internal
HW restore operation that is more than 1 second after the initial resume
request from the USB device.
As a workaround for specific keyboard ID(HP Keyboards), applying port reset
after resume when the keyboard is plugged in.
Signed-off-by: Sandeep Singh <Sandeep.Singh@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
cc: Nehal Shah <Nehal-bakulchandra.Shah@amd.com>
Reviewed-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The numd field of directive receive command takes number of dwords to
transfer. This fix has the correct calculation for numd.
Signed-off-by: Kwan (Hingkwan) Huen-SSI <kwan.huen@samsung.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We need to return an error if a timeout occurs on any NVMe command during
initialization. Without this, the nvme reset work will be stuck. A timeout
will have a negative error code, meaning we need to stop initializing
the controller. All postitive returns mean the controller is still usable.
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196325
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Martin Peres <martin.peres@intel.com>
[jth consolidated cleanup path ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pull networking fixes from David Miller:
1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.
2) Fix stats handling in bcm_sysport_get_stats(), from Florian
Fainelli.
3) Reject 16777215 VNI value in geneve_validate(), from Girish
Moodalbail.
4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.
5) Once a UFO fragmented frame is treated as UFO, we should continue
doing so. Likewise once a frame has been segmented, we should
continue doing that and not try to convert it to a UFO frame. From
Willem de Bruijn.
6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
prevent races. From Willem de Bruijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
packet: fix tp_reserve race in packet_set_ring
udp: consistently apply ufo or fragmentation
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
igmp: Fix regression caused by igmp sysctl namespace code.
geneve: maximum value of VNI cannot be used
net: systemport: Fix software statistics for SYSTEMPORT Lite
tipc: remove premature ESTABLISH FSM event at link synchronization
|
|
Pull sparc updates from David Miller:
1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.
2) Prevent crashes when bringing up sunvdc virtual block devices in
some environments. From Jim Quigley.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
sparc64: recognize and support sparc M8 cpu type
sparc64: properly name the cpu constants
|
|
Currently we create the sysfs entry even if we fail mapping
it. In that case, the unmapping will not remove the sysfs created
file. There is no good reason to create a sysfs entry for a non
working CMB and show his characteristics.
Fixes: f63572dff ("nvme: unmap CMB and remove sysfs file in reset path")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, calls to nvmet_fc_rcv_fcp_req() always copied the
FC-NVME cmd iu to a temporary buffer before returning, allowing
the driver to immediately repost the buffer to the hardware.
To address timing conditions on queue element structures vs async
command reception, the nvmet_fc transport occasionally may need to
hold on to the command iu buffer for a short period. In these cases,
the nvmet_fc_rcv_fcp_req() will return a special return code
(-EOVERFLOW). In these cases, the LLDD must delay until the new
defer_rcv lldd callback is called before recycling the buffer back
to the hw.
This patch adds support for the new nvmet_fc transport defer_rcv
callback and recognition of the new error code when passing commands
to the transport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
At queue creation, the transport allocates a local job struct
(struct nvmet_fc_fcp_iod) for each possible element of the queue.
When a new CMD is received from the wire, a jobs struct is allocated
from the queue and then used for the duration of the command.
The job struct contains buffer space for the wire command iu. Thus,
upon allocation of the job struct, the cmd iu buffer is copied to
the job struct and the LLDD may immediately free/reuse the CMD IU
buffer passed in the call.
However, in some circumstances, due to the packetized nature of FC
and the api of the FC LLDD which may issue a hw command to send the
wire response, but the LLDD may not get the hw completion for the
command and upcall the nvmet_fc layer before a new command may be
asynchronously received on the wire. In other words, its possible
for the initiator to get the response from the wire, thus believe a
command slot free, and send a new command iu. The new command iu
may be received by the LLDD and passed to the transport before the
LLDD had serviced the hw completion and made the teardown calls for
the original job struct. As such, there is no available job struct
available for the new io. E.g. it appears like the host sent more
queue elements than the queue size. It didn't based on it's
understanding.
Rather than treat this as a hard connection failure queue the new
request until the job struct does free up. As the buffer isn't
copied as there's no job struct, a special return value must be
returned to the LLDD to signify to hold off on recycling the cmd
iu buffer. And later, when a job struct is allocated and the
buffer copied, a new LLDD callback is introduced to notify the
LLDD and allow it to recycle it's command iu buffer.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Some broken controllers (such as earlier Linux targets) pad model or
serial fields with 0-bytes rather than spaces. The NVMe spec disallows
0 bytes in "ASCII" fields. Thus strip trailing 0-bytes, too. Also make
sure that we get no underflow for pathological input.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
of values for it would be from 0 to 16777215 (2^24 -1). However, one
cannot create a geneve device with VNI set to 16777215. This patch fixes
this issue.
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With SYSTEMPORT Lite we have holes in our statistics layout that make us
skip over the hardware MIB counters, bcm_sysport_get_stats() was not
taking that into account resulting in reporting 0 for all SW-maintained
statistics, fix this by skipping accordingly.
Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using mpgroup to define multiple paths for a virtual disk causes multiple
virtual-device-port ports to be created for that virtual device.
Each virtual-device-port port then gets a vdisk created for it by the Linux
sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
cannot handle multiple ports for a single vdisk, leading to a kernel panic
at startup.
This fix prevents more than one vdisk per virtual-device-port being created
until full virtual disk multipathing (mpgroup) support is implemented.
Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by
commit 01d4d673558985d9a118e1e05026633c3e2ade9b
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Wed Dec 7 12:55:54 2016 -0800
which originally had the proper list_del_init() usage, but was
dropped during list review as it was thought unnecessary by HCH.
However, list_del_init() usage is required during the special
generate_node_acls = 1 + cache_dynamic_acls = 0 case when
transport_free_session() does a list_del(&se_nacl->acl_list),
followed by target_complete_nacl() doing the same thing.
This was manifesting as a general protection fault as reported
by Justin:
kernel: general protection fault: 0000 [#1] SMP
kernel: Modules linked in:
kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
kernel: task: ffff88026939e800 task.stack: ffffc90007884000
kernel: RIP: 0010:target_put_nacl+0x49/0xb0
kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
kernel: FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: transport_free_session+0x67/0x140
kernel: transport_deregister_session+0x7a/0xc0
kernel: iscsit_close_session+0x92/0x210
kernel: iscsit_close_connection+0x5f9/0x840
kernel: iscsit_take_action_for_connection_exit+0xfe/0x110
kernel: iscsi_target_tx_thread+0x140/0x1e0
kernel: ? wait_woken+0x90/0x90
kernel: kthread+0x124/0x160
kernel: ? iscsit_thread_get_cpumask+0x90/0x90
kernel: ? kthread_create_on_node+0x40/0x40
kernel: ret_from_fork+0x22/0x30
kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
kernel: ---[ end trace f12821adbfd46fed ]---
To address this, go ahead and use proper list_del_list() for all
cases of se_nacl->acl_list deletion.
Reported-by: Justin Maggard <jmaggard01@gmail.com>
Tested-by: Justin Maggard <jmaggard01@gmail.com>
Cc: Justin Maggard <jmaggard01@gmail.com>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
single nouveau regression fix.
* 'linux-4.13' of git://github.com/skeggsb/linux:
drm/nouveau/disp/nv04: avoid creation of output paths
|