Age | Commit message | Author | Files | Lines |
|
All pieces are in place for ODP (on demand paging) to work using HMM.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
This patch adds HMM-specific support for hardware page faulting of
user memory regions.
Changed since v1:
- Adapt to HMM page table changes.
- Turn some sanity tests into BUG_ON().
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
This adds the core HMM callback for the mlx5 device driver and initializes
the HMM device for the mlx5 InfiniBand device driver.
Changed since v1:
- Adapt to new hmm_mirror lifetime rules.
- HMM_ISDIRTY no longer exists.
Changed since v2:
- Adapt to HMM page table changes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
|
|
This adds a new core InfiniBand structure and helpers to implement ODP (on
demand paging) on top of HMM. We need to retain the tree of ib_umem because
some hardware associates a unique identifier with each umem (or mr) and
only allows the hardware page table to be updated using this unique id.
Changed since v1:
- Adapt to new hmm_mirror lifetime rules.
- Fix scan of existing mirror in ib_umem_odp_get().
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
|
|
This is a preparatory patch for the HMM implementation of ODP (on demand
paging). It introduces a new configuration option and adds the proper
build-time conditional code sections. Enabling INFINIBAND_ON_DEMAND_PAGING_HMM
will result in a build error with this patch alone.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
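As a rough illustration only (not taken from the patch), the build-time conditional sections mentioned above take the usual shape of CONFIG-guarded regions:

#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM
/* HMM-backed ODP paths will live here; with this preparatory patch alone
 * they are still empty, which is why enabling the option breaks the build. */
#else
/* existing mmu_notifier-based ODP (or no ODP support at all) */
#endif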
|
|
The mlx5 driver will need this function for its driver-specific bits
of ODP (on demand paging) on HMM (Heterogeneous Memory Management).
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
When using HMM for ODP it will be useful to pass the current mirror
page table iterator to the mlx5_ib_update_mtt() function. Add a
void parameter for this.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
When using HMM for ODP it will be useful to pass the current mirror
page table iterator to the __mlx_ib_populated_pas() function. Add a
void parameter for this.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
This is a dummy driver which fulfills two purposes:
- showcase the HMM API and give a reference on how to use it.
- provide an extensive user space API to stress test HMM.
This is a particularly dangerous module as it allows access to a
mirror of a process address space through its device file. Hence
it should not be enabled by default, and only people actively
developing for HMM should use it.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
This adds documentation with a high-level overview of how HMM works
and a more in-depth view of how it should be used by device driver
writers.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
Do the DMA mapping on behalf of the device, as HMM is a good place
to perform this common task. Moreover, in the future we hope to
add new infrastructure that would make DMA mapping more efficient
(lower overhead per page) by leveraging the HMM data structures.
Changed since v1:
- Adapt to HMM page table changes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
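For readers unfamiliar with the DMA API, the common task being pulled into HMM looks roughly like the sketch below (the helper name and the entry handling are hypothetical, not the patch's code):

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/mm.h>

/* Sketch: map one page for device access and record the dma address in a
 * mirror page table entry. Unmapping later uses the matching
 * dma_unmap_page() call. */
static int example_map_one_page(struct device *dev, struct page *page,
                                dma_addr_t *entry)
{
        dma_addr_t dma;

        dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, dma))
                return -ENOMEM;
        *entry = dma;
        return 0;
}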
|
|
Device drivers must properly toggle the dirty bit inside the mirror page table
so dirtiness is properly accounted for when core mm code needs to know. Provide
a simple helper to toggle that bit for a range of addresses.
Changed since v1:
- Adapt to HMM page table changes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
A common use case is for a device driver to stop caring about a range of
addresses long before said range is munmapped by the userspace program. To
avoid keeping track of such ranges, provide a helper function that will
free HMM resources for a range of addresses.
NOTE THAT THE DEVICE DRIVER MUST MAKE SURE THE HARDWARE WILL NO LONGER
ACCESS THE RANGE BEFORE CALLING THIS HELPER!
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
Once we store the dma mapping inside the secondary page table we can
no longer easily find the page backing an address. Instead use
the CPU page table, which still has the proper information, except for
the invalidate_page() case, which is handled by using the page passed in
by the mmu_notifier layer.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
Because inside the mmu_notifier callbacks we do not have access to the
vma, nor do we know which lock we are holding (the mmap semaphore or
the i_mmap_lock), we cannot rely on the regular page table walk (nor
do we want to, as we have to be careful not to split huge pages).
So this patch introduces a helper to iterate over the CPU page table
contents in a way that is efficient for the situation we are in, which is
that we know none of the page table entries can vanish from under us
and thus it is safe to walk the page table.
The only added value of the iterator is that it keeps the page table
entry level mapping across calls, which fits well with the HMM mirror page
table update code.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
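The "keeps the page table entry level mapping across calls" property can be pictured with the hypothetical sketch below; the real helper in this patch is shaped differently:

#include <linux/mm.h>

/* Hypothetical iterator: cache the pte-level window so that successive
 * lookups within the same window skip the pgd/pud/pmd walk entirely. */
struct example_pt_iter {
        unsigned long   first;  /* start of cached pte window           */
        unsigned long   last;   /* end (exclusive) of cached pte window */
        pte_t           *ptep;  /* pointer to first pte of that window  */
};

static pte_t *example_pt_iter_lookup(struct example_pt_iter *iter,
                                     unsigned long addr)
{
        if (iter->ptep && addr >= iter->first && addr < iter->last)
                return iter->ptep + ((addr - iter->first) >> PAGE_SHIFT);

        /* Outside the cached window: re-walk the upper levels once, refresh
         * first/last/ptep, and take care not to split huge pages. Omitted. */
        return NULL;
}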
|
|
This patch adds a helper for device page faults. The device page fault helper
fills the mirror page table using the CPU page table, all of it synchronized
with any updates to the CPU page table.
Changed since v1:
- Add comment about directory lock.
Changed since v2:
- Check for mirror->hmm in hmm_mirror_fault()
Changed since v3:
- Adapt to HMM page table changes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
|
|
This patch adds the per-mirror page table. It also propagates CPU page
table updates to this per-mirror page table using the mmu_notifier callbacks.
All updates are contextualized with an HMM event structure that conveys
all the information needed by the device driver to take proper action (update
its own mmu to reflect the changes and schedule the proper flushing).
Core HMM is responsible for updating the per-mirror page table once
the device driver is done with its update. Most importantly, HMM will
properly propagate the HMM page table dirty bit to the underlying page.
Changed since v1:
- Removed unused fence code to defer it to later patches.
Changed since v2:
- Use new bit flag helper for mirror page table manipulation.
- Differentiate fork event with HMM_FORK from other events.
Changed since v3:
- Get rid of HMM_ISDIRTY and rely on write protect instead.
- Adapt to HMM page table changes
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
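The "HMM event structure" can be pictured along the lines below. Apart from HMM_FORK, which the changelog names, every identifier here is a hypothetical stand-in for illustration only:

/* Sketch only -- not the patch's definitions. */
enum example_hmm_etype {
        EXAMPLE_HMM_FORK,           /* named HMM_FORK in the changelog above */
        EXAMPLE_HMM_MIGRATE,        /* pages being replaced                  */
        EXAMPLE_HMM_WRITE_PROTECT,  /* downgrade mirror entries to read-only */
        EXAMPLE_HMM_MUNMAP,         /* range going away entirely             */
};

struct example_hmm_event {
        unsigned long           start;  /* first affected address            */
        unsigned long           end;    /* end of affected range (exclusive) */
        enum example_hmm_etype  etype;  /* tells the driver what to do       */
};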
|
|
The main purpose of heterogeneous memory management is to mirror a process
address space. To do so it must maintain a secondary page table that is
used by the device driver to program the device or build a device
specific page table.
A radix tree can't be used to create this secondary page table because
HMM needs more flags than RADIX_TREE_MAX_TAGS (while this could be
increased, we believe HMM will require so many flags that the cost will
become prohibitive for other users of the radix tree).
Moreover the radix tree is built around long, but for HMM we need to
store dma addresses and on some platforms sizeof(dma_addr_t) is bigger
than sizeof(long). Thus the radix tree is unsuitable to fulfill HMM's
requirements, hence this code, which allows creating a
page table that can grow and shrink dynamically.
The design is very close to the CPU page table as it reuses some of its
features, such as the spinlock embedded in struct page.
Changed since v1:
- Use PAGE_SHIFT as the shift value to reserve the low bits for private
device-specific flags. This allows device drivers to use
some of the lower bits for their own device-specific purposes.
- Add a set of helpers for atomically clearing, setting and testing bits
on dma_addr_t pointers. Atomicity is only useful for the dirty bit.
- Differentiate between DMA-mapped entries and non-mapped entries (pfn).
- Split page directory entry and page table entry helpers.
Changed since v2:
- Rename hmm_pt_iter_update() -> hmm_pt_iter_lookup().
- Rename hmm_pt_iter_fault() -> hmm_pt_iter_populate().
- Add hmm_pt_iter_walk()
- Remove hmm_pt_iter_next() (useless now).
- Code simplification and improved comments.
- Fix hmm_pt_fini_directory().
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
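The point of using PAGE_SHIFT as the shift value is that every bit below the page-aligned dma address is free to carry flags. A minimal sketch of the idea (all names hypothetical, not the patch's API):

#include <stdint.h>

typedef uint64_t example_entry_t;      /* dma_addr_t may be wider than long */

#define EXAMPLE_PAGE_SHIFT 12
#define EXAMPLE_PTE_VALID  (1ULL << 0) /* entry holds a mapped page         */
#define EXAMPLE_PTE_WRITE  (1ULL << 1) /* device may write through it       */
#define EXAMPLE_PTE_DIRTY  (1ULL << 2) /* device wrote; fold into core mm   */

/* Pack a page-aligned dma address and its flags into a single entry: the
 * address lives in bits >= EXAMPLE_PAGE_SHIFT, flags in the bits below. */
static inline example_entry_t example_mk_pte(uint64_t dma, int writable)
{
        example_entry_t entry = dma & ~((1ULL << EXAMPLE_PAGE_SHIFT) - 1);

        entry |= EXAMPLE_PTE_VALID;
        if (writable)
                entry |= EXAMPLE_PTE_WRITE;
        return entry;
}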
|
|
This patch only introduces the core HMM functions for registering a new
mirror and stopping a mirror, as well as HMM device registration and
unregistration.
The lifecycle of HMM objects is handled differently than that of
mmu_notifier because, unlike mmu_notifier, there can be concurrent
calls from both mm code to HMM code and/or from device driver code
to HMM code. Moreover the lifetime of HMM can be uncorrelated from the
lifetime of the process that is being mirrored (the GPU might take longer
to clean up).
Changed since v1:
- Updated comment of hmm_device_register().
Changed since v2:
- Expose struct hmm for easy access to mm struct.
- Simplify hmm_mirror_register() arguments.
- Removed the device name.
- Refcount the mirror struct internally to HMM, allowing us to get
rid of the srcu and making the device driver callback error
handling simpler.
- Make it safe to call hmm_mirror_unregister() several times.
- Rework the mmu_notifier unregistration and release callback.
Changed since v3:
- Rework hmm_mirror lifetime rules.
- Synchronize with the mmu_notifier srcu before dropping the mirror's last
reference in hmm_mirror_unregister().
- Use spinlock for device's mirror list.
- Export mirror ref/unref functions.
- English syntax fixes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
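The "refcount the mirror struct internally" point follows the standard kref pattern; a minimal sketch with hypothetical names (the patch exports its own ref/unref helpers, per the v3 notes, and those may look different):

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>

/* Hypothetical mirror object: both mm code and the device driver can hold
 * references, so it may outlive the process being mirrored. */
struct example_mirror {
        struct kref     ref;
        /* device / mm back-pointers, mirror page table, ... */
};

static void example_mirror_release(struct kref *kref)
{
        struct example_mirror *mirror =
                container_of(kref, struct example_mirror, ref);

        kfree(mirror);
}

static void example_mirror_ref(struct example_mirror *mirror)
{
        kref_get(&mirror->ref);
}

static void example_mirror_unref(struct example_mirror *mirror)
{
        kref_put(&mirror->ref, example_mirror_release);
}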
|
|
This patch allows invalidating a range while excluding the call to a specific
mmu_notifier, which allows a subsystem to invalidate a range for everyone
but itself.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
|
|
A listener of mm events might not have an easy way to get the struct page
behind an address invalidated with the mmu_notifier_invalidate_page()
function, as this happens after the CPU page table has been cleared/
updated. This happens for instance if the listener is storing a dma
mapping inside its secondary page table. To avoid a complex reverse
dma mapping lookup, just pass along a pointer to the page being
invalidated.
Changed since v1:
- English syntax fixes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
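For context, the pre-existing mmu_notifier callback only carries the address; this change threads the struct page through as well. A hedged before/after sketch (the final prototype in this series also grows an event argument from the related patch, omitted here):

/* before */
void (*invalidate_page)(struct mmu_notifier *mn,
                        struct mm_struct *mm,
                        unsigned long address);

/* after (sketch) */
void (*invalidate_page)(struct mmu_notifier *mn,
                        struct mm_struct *mm,
                        unsigned long address,
                        struct page *page);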
|
|
The invalidate_range_start() and invalidate_range_end() callbacks can be
considered as forming an "atomic" section from the CPU page table
update point of view. Between these two functions the CPU page
table content is unreliable for the address range being
invalidated.
This patch uses a structure, defined at all the places doing range
invalidation, that is added to a list for the duration
of the update, i.e. added with invalidate_range_start() and removed
with invalidate_range_end().
Helpers allow querying whether a range is valid and waiting for it if
necessary.
For proper synchronization, users must block any new range
invalidation from inside their invalidate_range_start() callback.
Otherwise there is no guarantee that a new range invalidation will
not be added after the call to the helper function that queries for
existing ranges.
Changed since v1:
- Fix a possible deadlock in mmu_notifier_range_wait_valid()
Changed since v2:
- Add the range to invalid range list before calling ->range_start().
- Del the range from invalid range list after calling ->range_end().
- Remove useless list initialization.
Changed since v3:
- Improved commit message.
- Added comments to explain how the helper functions are supposed to be used.
- English syntax fixes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
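Conceptually, the structure and helpers described above amount to an active-invalidation list plus a wait queue, roughly as sketched below (names and locking details are hypothetical, not the patch's implementation):

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct example_range {
        struct list_head list;
        unsigned long    start, end;
};

static LIST_HEAD(example_active);          /* ranges between start()/end() */
static DEFINE_SPINLOCK(example_lock);
static DECLARE_WAIT_QUEUE_HEAD(example_wq);

/* True when no active invalidation overlaps [start, end). */
static bool example_range_is_valid(unsigned long start, unsigned long end)
{
        struct example_range *r;
        bool valid = true;

        spin_lock(&example_lock);
        list_for_each_entry(r, &example_active, list) {
                if (r->start < end && start < r->end) {
                        valid = false;
                        break;
                }
        }
        spin_unlock(&example_lock);
        return valid;
}

/* Sleep until the range is valid; invalidate_range_end() removes its entry
 * from the list and does wake_up(&example_wq). */
static void example_range_wait_valid(unsigned long start, unsigned long end)
{
        wait_event(example_wq, example_range_is_valid(start, end));
}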
|
|
The event information will be useful to new users of the mmu_notifier API.
The event argument differentiates between a vma disappearing, a page
being write protected, or simply a page being unmapped. This allows new
users to take different paths for different events; for instance on unmap
the resources used to track a vma are still valid and should stay around,
while if the event says that a vma is being destroyed, it means that any
resources used to track this vma can be freed.
Changed since v1:
- renamed action into event (updated commit message too).
- simplified the event names and clarified their usage,
also documenting what expectations the listener can have with
respect to each event.
Changed since v2:
- Avoid crazy name.
- Do not move code that does not need to move.
Changed since v3:
- Separate huge page split from mlock/munlock and softdirty.
Changed since v4:
- Rebase (no other changes).
Changed since v5:
- Typo fix.
- Changed zap_page_range from MMU_MUNMAP to MMU_MIGRATE to reflect the
fact that the address range is still valid, just the pages backing it
are no longer.
Changed since v6:
- try_to_unmap_one() only invalidates when doing migration.
- Differentiate fork from the other cases.
Changed since v7:
- Renamed MMU_HUGE_PAGE_SPLIT to MMU_HUGE_PAGE_SPLIT.
- Renamed MMU_ISDIRTY to MMU_CLEAR_SOFT_DIRTY.
- Renamed MMU_WRITE_PROTECT to MMU_KSM_WRITE_PROTECT.
- English syntax fixes.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
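Piecing together the names that appear in the changelog above, the event argument is an enum roughly along these lines (this reconstruction is illustrative and almost certainly incomplete):

enum mmu_event {
        MMU_MIGRATE,            /* pages replaced; the address range stays valid */
        MMU_MUNMAP,             /* vma going away; per-vma tracking can be freed */
        MMU_HUGE_PAGE_SPLIT,    /* huge page split into base pages               */
        MMU_CLEAR_SOFT_DIRTY,   /* soft-dirty bits being cleared                 */
        MMU_KSM_WRITE_PROTECT,  /* KSM write protecting a page before merging    */
        /* ... plus a distinct event for fork, per the v6 note above ... */
};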
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix two bugs in the cpufreq core (including one recent
regression), fix a 4.0 PCI regression related to the ACPI resources
management and quieten an RCU-related lockdep complaint about a
tracepoint in the suspend-to-idle code.
Specifics:
- Fix a recently introduced issue in the cpufreq policy object
reinitialization that leads to CPU offline/online breakage (Viresh
Kumar)
- Make it possible to access frequency tables of offline CPUs which
is needed by thermal management code among other things (Viresh
Kumar)
- Fix an ACPI resource management regression introduced during the
4.0 cycle that may cause incorrect resource validation results to
appear in 32-bit x86 kernels due to silent truncation of 64-bit
values to 32-bit (Jiang Liu)
- Fix up an RCU-related lockdep complaint about suspicious RCU usage
in idle caused by using a suspend tracepoint in the core suspend-
to-idle code (Rafael J Wysocki)"
* tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
cpufreq: Allow freq_table to be obtained for offline CPUs
cpufreq: Initialize the governor again while restoring policy
suspend-to-idle: Prevent RCU from complaining about tick_freeze()
|
|
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Fix SMBIOS call handling and hwswitch state coherency in the
dell-laptop driver. Cleanups for intel_*_ipc drivers. Details:
dell-laptop:
- Do not cache hwswitch state
- Check return value of each SMBIOS call
- Clear buffer before each SMBIOS call
intel_scu_ipc:
- Move local memory initialization out of a mutex
intel_pmc_ipc:
- Update kerneldoc formatting
- Fix compiler casting warnings"
* tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_scu_ipc: move local memory initialization out of a mutex
intel_pmc_ipc: Update kerneldoc formatting
dell-laptop: Do not cache hwswitch state
dell-laptop: Check return value of each SMBIOS call
dell-laptop: Clear buffer before each SMBIOS call
intel_pmc_ipc: Fix compiler casting warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu/coldfire fixes from Greg Ungerer:
"Contains build fixes and updates for the ColdFire defconfigs.
Specifically there are a couple of fixes that address problems building
allnoconfig. Also a fix for enabling the PCI bus on the M54xx family of
ColdFire"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: enable PCI support for m5475evb defconfig
m68k: fix io functions for ColdFire/MMU/PCI case
m68knommu: update defconfig for ColdFire m5475evb
m68knommu: update defconfig for ColdFire m5407c3
m68knommu: update defconfig for ColdFire m5307c3
m68knommu: update defconfig for ColdFire m5275evb
m68knommu: update defconfig for ColdFire m5272c3
m68knommu: update defconfig for ColdFire m5249evb
m68knommu: update defconfig for m5208evb
m68knommu: make ColdFire SoC selection a choice
m68knommu: improve the clock configuration defaults
m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire
|
|
Pull block fixes from Jens Axboe:
"A collection of fixes from the last few weeks that should go into the
current series. This contains:
- Various fixes for the per-blkcg policy data, fixing regressions
since 4.1. From Arianna and Tejun
- Code cleanup for bcache closure macros from me. Really just
flushing this out, it's been sitting in another branch for months
- FIELD_SIZEOF cleanup from Maninder Singh
- bio integrity oops fix from Mike
- Timeout regression fix for blk-mq from Ming Lei"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: set default timeout as 30 seconds
NVMe: Reread partitions on metadata formats
bcache: don't embed 'return' statements in closure macros
blkcg: fix blkcg_policy_data allocation bug
blkcg: implement all_blkcgs list
blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
block: use FIELD_SIZEOF to calculate size of a field
bio integrity: do not assume bio_integrity_pool exists if bioset exists
|
|
Pull jfs fixes from David Kleikamp:
"A couple trivial fixes and an error path fix"
* tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy:
jfs: clean up jfs_rename and fix out of order unlock
jfs: fix indentation on if statement
jfs: removed a prohibited space after opening parenthesis
|
|
* pm-cpuidle:
suspend-to-idle: Prevent RCU from complaining about tick_freeze()
* pm-cpufreq:
cpufreq: Allow freq_table to be obtained for offline CPUs
cpufreq: Initialize the governor again while restoring policy
* acpi-resources:
ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
|
|
It is reasonable to set the default request timeout to 30 seconds instead of
30000 ticks, which may be 300 seconds if HZ is 100; some arm64
based systems, for example, may choose 100 HZ.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()")
Signed-off-by: Jens Axboe <axboe@fb.com>
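To make the arithmetic concrete: the timeout is stored in jiffies, so a raw 30000 means 30000 / HZ seconds, i.e. 300 seconds when HZ is 100 but only 30 when HZ is 1000. A sketch of the idea, not the exact hunk:

/* old: the constant is in jiffies, so the real timeout depends on HZ */
blk_queue_rq_timeout(q, 30000);

/* new (sketch): 30 seconds regardless of the kernel's HZ setting */
blk_queue_rq_timeout(q, 30 * HZ);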
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull TPM bugfixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm, tpm_crb: fail when TPM2 ACPI table contents look corrupted
tpm: Fix initialization of the cdev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Mainly fix-ups for the various 4.2 items"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits)
IB/core: Destroy ocrdma_dev_id IDR on module exit
IB/core: Destroy multcast_idr on module exit
IB/mlx4: Optimize do_slave_init
IB/mlx4: Fix memory leak in do_slave_init
IB/mlx4: Optimize freeing of items on error unwind
IB/mlx4: Fix use of flow-counters for process_mad
IB/ipath: Convert use of __constant_<foo> to <foo>
IB/ipoib: Set MTU to max allowed by mode when mode changes
IB/ipoib: Scatter-Gather support in connected mode
IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES
IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush
IB/ucma: Fix lockdep warning in ucma_lock_files
rds: rds_ib_device.refcount overflow
RDMA/nes: Fix for incorrect recording of the MAC address
RDMA/nes: Fix for resolving the neigh
RDMA/core: Fixes for port mapper client registration
IB/IPoIB: Fix bad error flow in ipoib_add_port()
IB/mlx4: Do not attemp to report HCA clock offset on VFs
IB/cm: Do not queue work to a device that's going away
IB/srp: Avoid using uninitialized variable
...
|
|
This patch has the driver automatically reread partitions if a namespace
has a separate metadata format. Previously revalidating a disk was
sufficient to get the correct capacity set on such formatted drives,
but partitions that may exist would not have been surfaced.
Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Pull file locking updates from Jeff Layton:
"I had thought that I was going to get away without a pull request this
cycle. There was a NFSv4 file locking problem that cropped up that I
tried to fix in the NFSv4 code alone, but that fix has turned out to
be problematic. These patches fix this in the correct way.
Note that this touches some NFSv4 code as well. Ordinarily I'd wait
for Trond to ACK this, but he's on holiday right now and the bug is
rather nasty. So I suggest we merge this and if he raises issues with
it we can sort it out when he gets back"
Acked-by: Bruce Fields <bfields@fieldses.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
[ +1 to this series fixing a 100% reproducible slab corruption +
general protection fault in my nfs-root test environment. - Dan ]
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux:
locks: inline posix_lock_file_wait and flock_lock_file_wait
nfs4: have do_vfs_lock take an inode pointer
locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait
locks: have flock_lock_file take an inode pointer instead of a filp
Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
|
|
Pull KVM fixes from Paolo Bonzini:
- Fix FPU refactoring ("kvm: x86: fix load xsave feature warning")
- Fix eager FPU mode (Cc stable)
- AMD bits of MTRR virtualization
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: fix load xsave feature warning
KVM: x86: apply guest MTRR virtualization on host reserved pages
KVM: SVM: Sync g_pat with guest-written PAT value
KVM: SVM: use NPT page attributes
KVM: count number of assigned devices
KVM: VMX: fix vmwrite to invalid VMCS
KVM: x86: reintroduce kvm_is_mmio_pfn
x86: hyperv: add CPUID bit for crash handlers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Makefile changes (top-level+ARC) reinstate -O3 builds (regression
since 3.16)
- IDU intc related fixes, IRQ affinity
- patch to make bitops safer for ARC
- perf fix from Alexey to remove signed PC braino
- Futex backend gets llock/scond support
* tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARCv2: support HS38 releases
ARC: make sure instruction_pointer() returns unsigned value
ARC: slightly refactor macros for boot logging
ARC: Add llock/scond to futex backend
arc:irqchip: prepare for drivers/irqchip/irqchip.h removal
ARC: Make ARC bitops "safer" (add anti-optimization)
ARCv2: [axs103] bump CPU frequency from 75 to 90 MHZ
ARCv2: intc: IDU: Fix potential race in installing a chained IRQ handler
ARCv2: intc: IDU: support irq affinity
ARC: fix unused var wanring
ARC: Don't memzero twice in dma_alloc_coherent for __GFP_ZERO
ARC: Override toplevel default -O2 with -O3
kbuild: Allow arch Makefiles to override {cpp,ld,c}flags
ARCv2: guard SLC DMA ops with spinlock
ARC: Kconfig: better way to disable ARC_HAS_LLSC for ARC_CPU_750D
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"One improvement for the zcrypt driver, the quality attribute for the
hwrng device has been missing. Without it the kernel entropy seeding
will not happen automatically.
And six bug fixes, the most important one is the fix for the vector
register corruption due to machine checks"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/nmi: fix vector register corruption
s390/process: fix sfpc inline assembly
s390/dasd: fix kernel panic when alias is set offline
s390/sclp: clear upper register halves in _sclp_print_early
s390/oprofile: fix compile error
s390/sclp: fix compile error
s390/zcrypt: enable s390 hwrng to seed kernel entropy
|
|
The end of jfs_rename(), which is also used by the error paths,
included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2
and out3. If we come in through these labels, IWRITE_LOCK() has not
been called yet.
In moving that call to the correct spot, I also moved some
exceptional truncate code earlier as well, since the early error
paths don't need to deal with it, and I renamed out4: to out_tx: so
a future patch by Jan Kara doesn't need to deal with renumbering or
confusing out-of-order labels.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull final init.h/module.h code relocation from Paul Gortmaker:
"With the release of 4.2-rc2 done, we should not be seeing any new code
added that gets upset by this small code move, and we've banked yet
another complete week of testing with this move in place on top of
4.2-rc1 via linux-next to ensure that remained true.
Given that, I'd like to put it in now so that people formulating new
work for 4.3-rc1 will be exposed to the ever so slightly stricter (but
sensible) requirements wrt. whether they are needing init.h vs.
module.h macros, even if they are not using linux-next.
The diffstat of the move is slightly asymmetrical due to needing to
leave behind a couple #ifdef in the old location and add the same ones
to the new location, but other than that, it is a 1:1 move, complete
with the module_init/exit trailing semicolon that we can't fix. That
is, until/unless someone does a tree-wide sed fix of all the
approximately 800 currently in tree users relying on it"
* tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
module: relocate module_init from init.h to module.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fengguang Wu discovered a crash that happened to be because of the
branch tracer (traces unlikely and likely branches) when enabled with
certain debug options.
What happened was that various debug options like lockdep and
DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside
its recursion protection. In fact, part of its recursion protection
used these features that caused the lockup. This cleans up the code a
little and makes the recursion protection a bit more robust"
* tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have branch tracer use recursive field of task struct
|
|
https://github.com/PeterHuewe/linux-tpmdd into for-linus
|
|
'{ }' and memset will both reset the cbuf buffer.
Only once is enough, and this can be done outside of the mutex.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
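The redundancy being removed is of this general shape (a sketch; the stand-in mutex below represents the driver's IPC lock, and the real buffer lives in intel_scu_ipc.c):

#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/types.h>

static DEFINE_MUTEX(example_lock);      /* stands in for the driver's mutex */

static void example_before(void)
{
        u8 cbuf[16] = { };              /* first zeroing                     */

        mutex_lock(&example_lock);
        memset(cbuf, 0, sizeof(cbuf));  /* second, redundant zeroing, done
                                         * needlessly under the mutex        */
        /* ... build and issue the IPC command using cbuf ... */
        mutex_unlock(&example_lock);
}

static void example_after(void)
{
        u8 cbuf[16] = { };              /* zero once, outside the mutex      */

        mutex_lock(&example_lock);
        /* ... build and issue the IPC command using cbuf ... */
        mutex_unlock(&example_lock);
}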
|
|
Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
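Applied to the driver, the generated fix is a one-line addition to the module exit path, roughly as sketched below (the exit function name here is hypothetical; the idr, ocrdma_dev_id, is the one named above):

#include <linux/idr.h>
#include <linux/module.h>

static DEFINE_IDR(ocrdma_dev_id);

static void __exit example_exit(void)
{
        /* ... existing teardown ... */
        idr_destroy(&ocrdma_dev_id);    /* free the IDR's internal memory */
}
module_exit(example_exit);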
|
|
Destroy multcast_idr on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
There is little chance our memory allocation will fail, so we can
combine initializing the work structs with allocating them instead of
looping through all of them once to allocate and again to initialize.
Then when we need to actually find out if our device is up or in the
process of going down, have all of our work structs batched up, take the
spin_lock once and only once, and do all of the batch under the one
spin_lock invocation instead of incurring all of the locked memory cycles
we would otherwise incur to take/release the spin_lock over and over
again.
Signed-off-by: Doug Ledford <dledford@redhat.com>
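The shape of the change, sketched with hypothetical types (the real code is mlx4's do_slave_init() path): initialization is folded into the allocation pass, and the up/going-down check plus queuing happens as one batch under a single spin_lock:

#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct example_slave_work {
        struct work_struct work;
        int                slave;
};

/* Allocate and initialize in the same pass instead of two loops. */
static struct example_slave_work *example_alloc_one(int slave,
                                                    work_func_t handler)
{
        struct example_slave_work *w = kmalloc(sizeof(*w), GFP_KERNEL);

        if (w) {
                INIT_WORK(&w->work, handler);
                w->slave = slave;
        }
        return w;
}

/* Take the lock once for the whole batch: queue everything if the device
 * is up, otherwise free the never-queued items ourselves. */
static void example_queue_batch(struct workqueue_struct *wq, spinlock_t *lock,
                                const bool *going_down,
                                struct example_slave_work **w, int n)
{
        int i;

        spin_lock(lock);
        for (i = 0; i < n; i++) {
                if (*going_down)
                        kfree(w[i]);                 /* never queued: free here */
                else
                        queue_work(wq, &w[i]->work); /* handler frees it later  */
        }
        spin_unlock(lock);
}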
|
|
We create a number of work structs to be queued up to a workqueue, and
on completion of the workqueue handler, the workqueue handler frees the
allocated memory. If, however, we don't queue the work struct because
the device is going down, then we need to free the memory ourselves.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
On failure, we loop through all possible pointers and test them before
calling kfree. But really, why even attempt to free items we didn't
allocate when we can easily loop through exactly and only the devices
for which the original memory allocation succeeded and free just those.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
For IB links, reading HCA flow counters through iboe_process_mad() should
be used when mlx4_ib_process_mad() is invoked only for VFs PMA queries and
exactly nothing else.
Fixes: 7193a141eb74 ('IB/mlx4: Set VF to read from QP counters')
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In the little endian case, the macros be16_to_cpu and cpu_to_be64
unfold to __swab{16,64}, which provide a special case for constants.
In the big endian case, __constant_be16_to_cpu and be16_to_cpu
expand directly to the same expression. The same applies to
__constant_cpu_to_be64 and cpu_to_be64.
So, replace __constant_be16_to_cpu with be16_to_cpu and
__constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
rid of the definition of __constant_be16_to_cpu and
__constant_cpu_to_be64 completely.
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
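The conversion itself is mechanical; a representative hunk would look like the following (the variable and constant names are hypothetical):

#include <linux/types.h>

/* before */
u16    lid  = __constant_be16_to_cpu(wire_lid);
__be64 guid = __constant_cpu_to_be64(NODE_GUID);

/* after: the generic helpers already special-case constant arguments */
u16    lid  = be16_to_cpu(wire_lid);
__be64 guid = cpu_to_be64(NODE_GUID);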
|
|
When switching between modes (datagram / connected), change the MTU
accordingly:
datagram mode up to 4K, connected mode up to (64K - 0x10).
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|