Age | Commit message | Author | Files | Lines
2015-07-17 | IB/mlx5/hmm: enable ODP using HMM. [hmm-v9] | Jérôme Glisse | 2 | -5/+4
All pieces are in place for ODP (on demand paging) to work using HMM. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | IB/mlx5/hmm: add page fault support for ODP on HMM v2. | Jérôme Glisse | 1 | -1/+143
This patch adds HMM-specific support for hardware page faulting of user memory regions. Changed since v1: - Adapt to HMM page table changes. - Turn some sanity tests into BUG_ON(). Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | IB/mlx5/hmm: add mlx5 HMM device initialization and callback v3. | Jérôme Glisse | 7 | -10/+269
This adds the core HMM callback for the mlx5 device driver and initializes the HMM device for the mlx5 InfiniBand device driver. Changed since v1: - Adapt to new hmm_mirror lifetime rules. - HMM_ISDIRTY no longer exists. Changed since v2: - Adapt to HMM page table changes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
2015-07-17 | IB/odp/hmm: add core infiniband structure and helper for ODP with HMM v2. | Jérôme Glisse | 5 | -6/+201
This adds new core InfiniBand structures and helpers to implement ODP (on demand paging) on top of HMM. We need to retain the tree of ib_umem because some hardware associates a unique identifier with each umem (or mr) and only allows the hardware page table to be updated using this unique id. Changed since v1: - Adapt to new hmm_mirror lifetime rules. - Fix scan of existing mirror in ib_umem_odp_get(). Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
2015-07-17 | IB/odp/hmm: add new kernel option to use HMM for ODP. | Jérôme Glisse | 12 | -94/+157
This is a preparatory patch for the HMM implementation of ODP (on demand paging). It introduces a new configuration option and adds the proper build-time conditional code sections. Enabling INFINIBAND_ON_DEMAND_PAGING_HMM will result in a build error with this patch. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | IB/odp: export rbt_ib_umem_for_each_in_range() | Jérôme Glisse | 1 | -0/+1
The mlx5 driver will need this function for its driver specific bit of ODP (on demand paging) on HMM (Heterogeneous Memory Management). Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | IB/mlx5: add a new parameter to mlx5_ib_update_mtt() for ODP with HMM. | Jérôme Glisse | 3 | -6/+8
When using HMM for ODP it will be useful to pass the current mirror page table iterator to mlx5_ib_update_mtt(). Add a void parameter for this. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | IB/mlx5: add a new parameter to __mlx_ib_populated_pas for ODP with HMM. | Jérôme Glisse | 3 | -5/+7
When using HMM for ODP it will be useful to pass the current mirror page table iterator to __mlx_ib_populated_pas(). Add a void parameter for this. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | hmm/dummy: dummy driver for testing and showcasing the HMM API | Jérôme Glisse | 4 | -0/+986
This is a dummy driver which fulfills two purposes: - showcase the HMM API and give references on how to use it. - provide an extensive user space API to stress test HMM. This is a particularly dangerous module as it allows access to a mirror of a process address space through its device file. Hence it should not be enabled by default and only people actively developing for HMM should use it. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: add documentation explaining HMM internals and how to use it. | Jérôme Glisse | 1 | -0/+219
This adds documentation with a high-level overview of how HMM works and a more in-depth view of how it should be used by device driver writers. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: DMA map memory on behalf of device driver v2. | Jérôme Glisse | 2 | -38/+173
Do the DMA mapping on behalf of the device as HMM is a good place to perform this common task. Moreover in the future we hope to add new infrastructure that would make DMA mapping more efficient (lower overhead per page) by leveraging HMM data structure. Changed since v1: - Adapt to HMM page table changes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: add dirty range helper (toggle dirty bit inside mirror page table) v2. | Jérôme Glisse | 2 | -0/+41
Device drivers must properly toggle the dirty bit inside the mirror page table so that dirtiness is properly accounted for when core mm code needs to know. Provide a simple helper to toggle that bit for a range of addresses. Changed since v1: - Adapt to HMM page table changes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: add discard range helper (to clear and free resources for a range). | Jérôme Glisse | 2 | -0/+27
A common use case is for a device driver to stop caring about a range of addresses long before said range is munmapped by the userspace program. To avoid keeping track of such ranges, provide a helper function that will free HMM resources for a range of addresses. NOTE THAT THE DEVICE DRIVER MUST MAKE SURE THE HARDWARE WILL NO LONGER ACCESS THE RANGE BEFORE CALLING THIS HELPER! Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: use CPU page table during invalidation. | Jerome Glisse | 1 | -18/+35
Once we store the DMA mapping inside the secondary page table we can no longer easily find the page backing an address. Instead use the CPU page table, which still has the proper information, except for the invalidate_page() case, which is handled by using the page passed by the mmu_notifier layer. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: add mm page table iterator helpers. | Jérôme Glisse | 1 | -0/+95
Because inside the mmu_notifier callback we do not have access to the vma, nor do we know which lock we are holding (the mmap semaphore or the i_mmap_lock), we cannot rely on the regular page table walk (nor do we want to, as we have to be careful not to split huge pages). So this patch introduces a helper to iterate over the CPU page table content in an efficient way for the situation we are in, which is that we know that none of the page table entries can vanish from under us and thus it is safe to walk the page table. The only added value of the iterator is that it keeps the page table entry level map across calls, which fits well with the HMM mirror page table update code. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | HMM: add device page fault support v4. | Jérôme Glisse | 2 | -1/+384
This patch adds helpers for device page faults. The device page fault helper fills the mirror page table using the CPU page table, all of this synchronized with any update to the CPU page table. Changed since v1: - Add comment about directory lock. Changed since v2: - Check for mirror->hmm in hmm_mirror_fault() Changed since v3: - Adapt to HMM page table changes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Sherry Cheung <SCheung@nvidia.com> Signed-off-by: Subhash Gutti <sgutti@nvidia.com> Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
2015-07-17 | HMM: add per mirror page table v4. | Jérôme Glisse | 2 | -0/+301
This patch adds the per mirror page table. It also propagates CPU page table updates to this per mirror page table using mmu_notifier callbacks. All updates are contextualized with an HMM event structure that conveys all the information needed by the device driver to take proper actions (update its own mmu to reflect changes and schedule proper flushing). Core HMM is responsible for updating the per mirror page table once the device driver is done with its update. Most importantly HMM will properly propagate the HMM page table dirty bit to the underlying page. Changed since v1: - Removed unused fence code to defer it to later patches. Changed since v2: - Use new bit flag helper for mirror page table manipulation. - Differentiate fork event with HMM_FORK from other events. Changed since v3: - Get rid of HMM_ISDIRTY and rely on write protect instead. - Adapt to HMM page table changes Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Sherry Cheung <SCheung@nvidia.com> Signed-off-by: Subhash Gutti <sgutti@nvidia.com> Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
2015-07-17 | HMM: add HMM page table v3. | Jérôme Glisse | 4 | -1/+947
Heterogeneous memory management's main purpose is to mirror a process address space. To do so it must maintain a secondary page table that is used by the device driver to program the device or build a device-specific page table. A radix tree can't be used to create this secondary page table because HMM needs more flags than RADIX_TREE_MAX_TAGS (while this can be increased, we believe HMM will require so many flags that the cost will become prohibitive to other users of the radix tree). Moreover the radix tree is built around long, but for HMM we need to store DMA addresses and on some platforms sizeof(dma_addr_t) is bigger than sizeof(long). Thus the radix tree is unsuitable for HMM's requirements, which is why we introduce this code, which allows creating a page table that can grow and shrink dynamically. The design is very close to the CPU page table as it reuses some of its features, such as the spinlock embedded in struct page. Changed since v1: - Use PAGE_SHIFT as shift value to reserve low bits for private device-specific flags. This is to allow device drivers to use some of the lower bits for their own device-specific purposes. - Add a set of helpers for atomically clearing, setting and testing bits on a dma_addr_t pointer. Atomicity is useful only for the dirty bit. - Differentiate between DMA mapped entries and non-mapped entries (pfn). - Split page directory entry and page table entry helpers. Changed since v2: - Rename hmm_pt_iter_update() -> hmm_pt_iter_lookup(). - Rename hmm_pt_iter_fault() -> hmm_pt_iter_populate(). - Add hmm_pt_iter_walk() - Remove hmm_pt_iter_next() (useless now). - Code simplification and improved comments. - Fix hmm_pt_fini_directory(). Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Sherry Cheung <SCheung@nvidia.com> Signed-off-by: Subhash Gutti <sgutti@nvidia.com> Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
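The idea of shifting by PAGE_SHIFT to keep the low bits free for flags can be illustrated with a small userspace sketch. This is not the HMM code: the type `entry_t`, every `SKETCH_` name and the helper names are hypothetical, a 4 KiB page size is assumed, and the helpers are deliberately non-atomic, whereas the commit message describes atomic helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The real PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_FLAG_MASK  ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)

/* Hypothetical flag bits kept in the low bits of an entry. */
#define SKETCH_FLAG_VALID (UINT64_C(1) << 0)
#define SKETCH_FLAG_WRITE (UINT64_C(1) << 1)
#define SKETCH_FLAG_DIRTY (UINT64_C(1) << 2)

typedef uint64_t entry_t; /* stands in for dma_addr_t in this sketch */

/* Build an entry: page frame number in the high bits, flags in the low bits. */
static entry_t entry_make(uint64_t pfn, uint64_t flags)
{
    return (pfn << SKETCH_PAGE_SHIFT) | (flags & SKETCH_FLAG_MASK);
}

/* Recover the page frame number by dropping the flag bits. */
static uint64_t entry_pfn(entry_t e)
{
    return e >> SKETCH_PAGE_SHIFT;
}

/* Test, set and clear a flag without touching the page frame number. */
static bool entry_test(entry_t e, uint64_t flag)
{
    return (e & flag & SKETCH_FLAG_MASK) != 0;
}

static entry_t entry_set(entry_t e, uint64_t flag)
{
    return e | (flag & SKETCH_FLAG_MASK);
}

static entry_t entry_clear(entry_t e, uint64_t flag)
{
    return e & ~(flag & SKETCH_FLAG_MASK);
}
```

Masking every flag argument with `SKETCH_FLAG_MASK` is a design choice of the sketch: it ensures a stray high bit in a flag value can never corrupt the stored page frame number.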
2015-07-17 | HMM: introduce heterogeneous memory management v4. | Jérôme Glisse | 8 | -0/+603
This patch only introduces core HMM functions for registering a new mirror and stopping a mirror, as well as HMM device registering and unregistering. The lifecycle of HMM objects is handled differently than that of mmu_notifier because, unlike mmu_notifier, there can be concurrent calls from both mm code to HMM code and/or from device driver code to HMM code. Moreover the lifetime of HMM can be uncorrelated from the lifetime of the process that is being mirrored (the GPU might take longer to clean up). Changed since v1: - Updated comment of hmm_device_register(). Changed since v2: - Expose struct hmm for easy access to mm struct. - Simplify hmm_mirror_register() arguments. - Removed the device name. - Refcount the mirror struct internally to HMM, which allows getting rid of the srcu and makes the device driver callback error handling simpler. - Safe to call hmm_mirror_unregister() several times. - Rework the mmu_notifier unregistration and release callback. Changed since v3: - Rework hmm_mirror lifetime rules. - Synchronize with mmu_notifier srcu before dropping mirror last reference in hmm_mirror_unregister() - Use spinlock for device's mirror list. - Export mirror ref/unref functions. - English syntax fixes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Sherry Cheung <SCheung@nvidia.com> Signed-off-by: Subhash Gutti <sgutti@nvidia.com> Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
2015-07-17 | mmu_notifier: allow range invalidation to exclude a specific mmu_notifier | Jérôme Glisse | 2 | -9/+73
This patch allows invalidating a range while excluding calls to a specific mmu_notifier, which lets a subsystem invalidate a range for everyone but itself. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
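A minimal userspace sketch of the "everyone but itself" pattern follows. It is only an illustration under stated assumptions: `struct sketch_notifier`, `sketch_count()` and `sketch_invalidate_excluding()` are hypothetical names, the list is a plain singly linked list, and no locking or RCU is modeled.

```c
#include <stddef.h>

/* Hypothetical listener: a callback plus a counter so the effect is visible. */
struct sketch_notifier {
    void (*invalidate)(struct sketch_notifier *self,
                       unsigned long start, unsigned long end);
    unsigned long calls;            /* number of invalidations received */
    struct sketch_notifier *next;   /* next registered listener */
};

/* Example callback that only counts how often it was invoked. */
static void sketch_count(struct sketch_notifier *self,
                         unsigned long start, unsigned long end)
{
    (void)start;
    (void)end;
    self->calls++;
}

/*
 * Notify every listener on the list about [start, end) except 'exclude'.
 * Passing NULL as 'exclude' notifies all listeners.
 */
static void sketch_invalidate_excluding(struct sketch_notifier *head,
                                        unsigned long start,
                                        unsigned long end,
                                        const struct sketch_notifier *exclude)
{
    for (struct sketch_notifier *n = head; n != NULL; n = n->next) {
        if (n == exclude)
            continue;   /* the caller already knows about its own change */
        n->invalidate(n, start, end);
    }
}
```

A subsystem that triggers the invalidation itself would pass its own listener as `exclude`, so it is not called back for a change it initiated.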
2015-07-17 | mmu_notifier: pass page pointer to mmu_notifier_invalidate_page() v2 | Jérôme Glisse | 8 | -4/+14
Listeners of mm events might not have an easy way to get the struct page behind an address invalidated with the mmu_notifier_invalidate_page() function, as this happens after the CPU page table has been cleared/updated. This happens for instance if the listener is storing a DMA mapping inside its secondary page table. To avoid a complex reverse DMA mapping lookup, just pass along a pointer to the page being invalidated. Changed since v1: - English syntax fixes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
2015-07-17 | mmu_notifier: keep track of active invalidation ranges v4 | Jérôme Glisse | 18 | -259/+350
The invalidate_range_start() and invalidate_range_end() callbacks can be considered as forming an "atomic" section from the CPU page table update point of view. Between these two functions the CPU page table content is unreliable for the address range being invalidated. This patch uses a structure defined at every place doing range invalidation. This structure is added to a list for the duration of the update, i.e. added with invalidate_range_start() and removed with invalidate_range_end(). Helpers allow querying whether a range is valid and waiting for it if necessary. For proper synchronization, users must block any new range invalidation from inside their invalidate_range_start() callback. Otherwise there is no guarantee that a new range invalidation will not be added after the call to the helper function to query for existing ranges. Changed since v1: - Fix a possible deadlock in mmu_notifier_range_wait_valid() Changed since v2: - Add the range to invalid range list before calling ->range_start(). - Del the range from invalid range list after calling ->range_end(). - Remove useless list initialization. Changed since v3: - Improved commit message. - Added comment to explain how helper functions are supposed to be used. - English syntax fixes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com>
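The bookkeeping described above (add a range when invalidation starts, remove it when invalidation ends, and let others ask whether a range is currently valid) can be sketched in plain userspace C. All names here are hypothetical, ranges are half-open `[start, end)`, and the sketch has no locking or waiting, both of which the real helpers would need.

```c
#include <stdbool.h>
#include <stddef.h>

/* One in-flight invalidation, covering the half-open range [start, end). */
struct sketch_range {
    unsigned long start;
    unsigned long end;
    struct sketch_range *next;
};

/* Hypothetical per-address-space state: the list of active invalidations. */
struct sketch_mm {
    struct sketch_range *active;
};

/* Record the range; this models what happens around invalidate_range_start(). */
static void sketch_range_start(struct sketch_mm *mm, struct sketch_range *r)
{
    r->next = mm->active;
    mm->active = r;
}

/* Drop the range; this models what happens around invalidate_range_end(). */
static void sketch_range_end(struct sketch_mm *mm, struct sketch_range *r)
{
    for (struct sketch_range **p = &mm->active; *p != NULL; p = &(*p)->next) {
        if (*p == r) {
            *p = r->next;
            r->next = NULL;
            return;
        }
    }
}

/* A range is valid when no active invalidation overlaps [start, end). */
static bool sketch_range_is_valid(const struct sketch_mm *mm,
                                  unsigned long start, unsigned long end)
{
    for (const struct sketch_range *r = mm->active; r != NULL; r = r->next) {
        if (r->start < end && start < r->end)
            return false;
    }
    return true;
}
```

The query only reflects the list at the moment it runs, which is why the commit message asks users to block new invalidations from inside their own start callback before relying on the answer.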
2015-07-17 | mmu_notifier: add event information to address invalidation v8 | Jérôme Glisse | 20 | -101/+258
The event information will be useful for new users of the mmu_notifier API. The event argument differentiates between a vma disappearing, a page being write protected or simply a page being unmapped. This allows new users to take different paths for different events: for instance, on unmap the resources used to track a vma are still valid and should stay around, while if the event says that a vma is being destroyed it means that any resources used to track this vma can be freed. Changed since v1: - renamed action into event (updated commit message too). - simplified the event names and clarified their usage also documenting what expectation the listener can have in respect to each event. Changed since v2: - Avoid crazy name. - Do not move code that does not need to move. Changed since v3: - Separate huge page split from mlock/munlock and softdirty. Changed since v4: - Rebase (no other changes). Changed since v5: - Typo fix. - Changed zap_page_range from MMU_MUNMAP to MMU_MIGRATE to reflect the fact that the address range is still valid, just the pages backing it are no longer. Changed since v6: - try_to_unmap_one() only invalidates when doing migration. - Differentiate fork from other cases. Changed since v7: - Renamed MMU_HUGE_PAGE_SPLIT to MMU_HUGE_PAGE_SPLIT. - Renamed MMU_ISDIRTY to MMU_CLEAR_SOFT_DIRTY. - Renamed MMU_WRITE_PROTECT to MMU_KSM_WRITE_PROTECT. - English syntax fixes. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com>
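How a listener might take different paths per event can be sketched as follows. The enum values, `struct sketch_tracker` and `sketch_handle_event()` are hypothetical and do not reproduce the MMU_* event names or semantics of the patch; the sketch only shows the distinction the message draws between a page going away and the whole vma going away.

```c
#include <stdbool.h>

/* Hypothetical events, loosely following the three cases in the message. */
enum sketch_event {
    SKETCH_PAGE_UNMAP,      /* a page is unmapped, the vma stays */
    SKETCH_WRITE_PROTECT,   /* a page becomes read-only */
    SKETCH_VMA_DESTROY      /* the vma itself is going away */
};

/* Hypothetical per-vma state kept by a listener. */
struct sketch_tracker {
    bool tracking;  /* tracking resources are still allocated */
    bool mapped;    /* the device currently maps the page */
    bool writable;  /* the device may write to the page */
};

static void sketch_handle_event(struct sketch_tracker *t, enum sketch_event ev)
{
    switch (ev) {
    case SKETCH_PAGE_UNMAP:
        /* Drop the mapping but keep the tracking resources around. */
        t->mapped = false;
        t->writable = false;
        break;
    case SKETCH_WRITE_PROTECT:
        /* Only the write permission is lost. */
        t->writable = false;
        break;
    case SKETCH_VMA_DESTROY:
        /* Nothing will use this vma again, so tracking can be released. */
        t->mapped = false;
        t->writable = false;
        t->tracking = false;
        break;
    }
}
```

Without the event argument, a listener would have to treat every invalidation like the most destructive case and rebuild its tracking state more often than needed.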
2015-07-16 | Merge tag 'pm+acpi-4.2-rc3' of ↵ | Linus Torvalds | 4 | -20/+32
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These fix two bugs in the cpufreq core (including one recent regression), fix a 4.0 PCI regression related to the ACPI resources management and quieten an RCU-related lockdep complaint about a tracepoint in the suspend-to-idle code. Specifics: - Fix a recently introduced issue in the cpufreq policy object reinitialization that leads to CPU offline/online breakage (Viresh Kumar) - Make it possible to access frequency tables of offline CPUs which is needed by thermal management code among other things (Viresh Kumar) - Fix an ACPI resource management regression introduced during the 4.0 cycle that may cause incorrect resource validation results to appear in 32-bit x86 kernels due to silent truncation of 64-bit values to 32-bit (Jiang Liu) - Fix up an RCU-related lockdep complaint about suspicious RCU usage in idle caused by using a suspend tracepoint in the core suspend- to-idle code (Rafael J Wysocki)" * tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel cpufreq: Allow freq_table to be obtained for offline CPUs cpufreq: Initialize the governor again while restoring policy suspend-to-idle: Prevent RCU from complaining about tick_freeze()
2015-07-16 | Merge tag 'platform-drivers-x86-v4.2-3' of ↵ | Linus Torvalds | 4 | -111/+176
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Fix SMBIOS call handling and hwswitch state coherency in the dell-laptop driver. Cleanups for intel_*_ipc drivers. Details: dell-laptop: - Do not cache hwswitch state - Check return value of each SMBIOS call - Clear buffer before each SMBIOS call intel_scu_ipc: - Move local memory initialization out of a mutex intel_pmc_ipc: - Update kerneldoc formatting - Fix compiler casting warnings" * tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_scu_ipc: move local memory initialization out of a mutex intel_pmc_ipc: Update kerneldoc formatting dell-laptop: Do not cache hwswitch state dell-laptop: Check return value of each SMBIOS call dell-laptop: Clear buffer before each SMBIOS call intel_pmc_ipc: Fix compiler casting warnings
2015-07-16 | Merge branch 'for-next' of ↵ | Linus Torvalds | 10 | -128/+45
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu/coldfire fixes from Greg Ungerer: "Contains build fixes and updates for the ColdFire defconfigs. Specifically there is a couple of fixes that address problems building allnoconfig. Also fix for enabling PCI bus on the M54xx family of ColdFire" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: enable PCI support for m5475evb defconfig m68k: fix io functions for ColdFire/MMU/PCI case m68knommu: update defconfig for ColdFire m5475evb m68knommu: update defconfig for ColdFire m5407c3 m68knommu: update defconfig for ColdFire m5307c3 m68knommu: update defconfig for ColdFire m5275evb m68knommu: update defconfig for ColdFire m5272c3 m68knommu: update defconfig for ColdFire m5249evb m68knommu: update defconfig for m5208evb m68knommu: make ColdFire SoC selection a choice m68knommu: improve the clock configuration defaults m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire
2015-07-16 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds | 10 | -79/+113
Pull block fixes from Jens Axboe: "A collection of fixes from the last few weeks that should go into the current series. This contains: - Various fixes for the per-blkcg policy data, fixing regressions since 4.1. From Arianna and Tejun - Code cleanup for bcache closure macros from me. Really just flushing this out, it's been sitting in another branch for months - FIELD_SIZEOF cleanup from Maninder Singh - bio integrity oops fix from Mike - Timeout regression fix for blk-mq from Ming Lei" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: set default timeout as 30 seconds NVMe: Reread partitions on metadata formats bcache: don't embed 'return' statements in closure macros blkcg: fix blkcg_policy_data allocation bug blkcg: implement all_blkcgs list blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods block/blk-cgroup.c: free per-blkcg data when freeing the blkcg block: use FIELD_SIZEOF to calculate size of a field bio integrity: do not assume bio_integrity_pool exists if bioset exists
2015-07-16 | Merge tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy | Linus Torvalds | 3 | -17/+16
Pull jfs fixes from David Kleikamp: "A couple trivial fixes and an error path fix" * tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy: jfs: clean up jfs_rename and fix out of order unlock jfs: fix indentation on if statement jfs: removed a prohibited space after opening parenthesis
2015-07-16 | Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'acpi-resources' | Rafael J. Wysocki | 4 | -20/+32
* pm-cpuidle: suspend-to-idle: Prevent RCU from complaining about tick_freeze() * pm-cpufreq: cpufreq: Allow freq_table to be obtained for offline CPUs cpufreq: Initialize the governor again while restoring policy * acpi-resources: ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
2015-07-16 | blk-mq: set default timeout as 30 seconds | Ming Lei | 1 | -1/+1
It is reasonable to set the default request timeout to 30 seconds instead of 30000 ticks, which may be 300 seconds if HZ is 100. For example, some arm64 based systems may choose 100 HZ. Signed-off-by: Ming Lei <ming.lei@canonical.com> Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()") Signed-off-by: Jens Axboe <axboe@fb.com>
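The arithmetic behind this fix is easy to check with a small sketch. The helper names are hypothetical and plain integer math stands in for the kernel's own tick conversion, with the tick rate passed explicitly as `hz` instead of using the HZ constant.

```c
/* Number of ticks that make up 'secs' seconds at a tick rate of 'hz'. */
static unsigned long sketch_secs_to_ticks(unsigned long secs, unsigned long hz)
{
    return secs * hz;
}

/* Whole seconds represented by 'ticks' ticks at a tick rate of 'hz'. */
static unsigned long sketch_ticks_to_secs(unsigned long ticks, unsigned long hz)
{
    return ticks / hz;
}
```

With these helpers, a fixed 30000 ticks is 30 seconds at a rate of 1000 but 300 seconds at a rate of 100, while 30 seconds expressed through the tick rate is 3000 ticks at 100 and 30000 ticks at 1000, so the timeout stays at 30 seconds on both.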
2015-07-15 | Merge branch 'for-linus' of ↵ | Linus Torvalds | 2 | -1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull TPM bugfixes from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: tpm, tpm_crb: fail when TPM2 ACPI table contents look corrupted tpm: Fix initialization of the cdev
2015-07-15 | Merge tag 'for-linus' of ↵ | Linus Torvalds | 36 | -269/+351
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Mainly fix-ups for the various 4.2 items" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits) IB/core: Destroy ocrdma_dev_id IDR on module exit IB/core: Destroy multcast_idr on module exit IB/mlx4: Optimize do_slave_init IB/mlx4: Fix memory leak in do_slave_init IB/mlx4: Optimize freeing of items on error unwind IB/mlx4: Fix use of flow-counters for process_mad IB/ipath: Convert use of __constant_<foo> to <foo> IB/ipoib: Set MTU to max allowed by mode when mode changes IB/ipoib: Scatter-Gather support in connected mode IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush IB/ucma: Fix lockdep warning in ucma_lock_files rds: rds_ib_device.refcount overflow RDMA/nes: Fix for incorrect recording of the MAC address RDMA/nes: Fix for resolving the neigh RDMA/core: Fixes for port mapper client registration IB/IPoIB: Fix bad error flow in ipoib_add_port() IB/mlx4: Do not attemp to report HCA clock offset on VFs IB/cm: Do not queue work to a device that's going away IB/srp: Avoid using uninitialized variable ...
2015-07-15 | NVMe: Reread partitions on metadata formats | Keith Busch | 1 | -2/+11
This patch has the driver automatically reread partitions if a namespace has a separate metadata format. Previously revalidating a disk was sufficient to get the correct capacity set on such formatted drives, but partitions that may exist would not have been surfaced. Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Tested-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-15 | Merge tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux | Linus Torvalds | 3 | -40/+46
Pull file locking updates from Jeff Layton: "I had thought that I was going to get away without a pull request this cycle. There was a NFSv4 file locking problem that cropped up that I tried to fix in the NFSv4 code alone, but that fix has turned out to be problematic. These patches fix this in the correct way. Note that this touches some NFSv4 code as well. Ordinarily I'd wait for Trond to ACK this, but he's on holiday right now and the bug is rather nasty. So I suggest we merge this and if he raises issues with it we can sort it out when he gets back" Acked-by: Bruce Fields <bfields@fieldses.org> Acked-by: Dan Williams <dan.j.williams@intel.com> [ +1 to this series fixing a 100% reproducible slab corruption + general protection fault in my nfs-root test environment. - Dan ] Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux: locks: inline posix_lock_file_wait and flock_lock_file_wait nfs4: have do_vfs_lock take an inode pointer locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait locks: have flock_lock_file take an inode pointer instead of a filp Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
2015-07-15 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 10 | -21/+165
Pull KVM fixes from Paolo Bonzini: - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning") - Fix eager FPU mode (Cc stable) - AMD bits of MTRR virtualization * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: fix load xsave feature warning KVM: x86: apply guest MTRR virtualization on host reserved pages KVM: SVM: Sync g_pat with guest-written PAT value KVM: SVM: use NPT page attributes KVM: count number of assigned devices KVM: VMX: fix vmwrite to invalid VMCS KVM: x86: reintroduce kvm_is_mmio_pfn x86: hyperv: add CPUID bit for crash handlers
2015-07-15 | Merge tag 'arc-v4.2-rc3-fixes' of ↵ | Linus Torvalds | 16 | -57/+112
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Makefile changes (top-level+ARC) reinstates -O3 builds (regression since 3.16) - IDU intc related fixes, IRQ affinity - patch to make bitops safer for ARC - perf fix from Alexey to remove signed PC braino - Futex backend gets llock/scond support * tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARCv2: support HS38 releases ARC: make sure instruction_pointer() returns unsigned value ARC: slightly refactor macros for boot logging ARC: Add llock/scond to futex backend arc:irqchip: prepare for drivers/irqchip/irqchip.h removal ARC: Make ARC bitops "safer" (add anti-optimization) ARCv2: [axs103] bump CPU frequency from 75 to 90 MHZ ARCv2: intc: IDU: Fix potential race in installing a chained IRQ handler ARCv2: intc: IDU: support irq affinity ARC: fix unused var wanring ARC: Don't memzero twice in dma_alloc_coherent for __GFP_ZERO ARC: Override toplevel default -O2 with -O3 kbuild: Allow arch Makefiles to override {cpp,ld,c}flags ARCv2: guard SLC DMA ops with spinlock ARC: Kconfig: better way to disable ARC_HAS_LLSC for ARC_CPU_750D
2015-07-15 | Merge branch 'for-linus' of ↵ | Linus Torvalds | 10 | -31/+87
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "One improvement for the zcrypt driver, the quality attribute for the hwrng device has been missing. Without it the kernel entropy seeding will not happen automatically. And six bug fixes, the most important one is the fix for the vector register corruption due to machine checks" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/nmi: fix vector register corruption s390/process: fix sfpc inline assembly s390/dasd: fix kernel panic when alias is set offline s390/sclp: clear upper register halves in _sclp_print_early s390/oprofile: fix compile error s390/sclp: fix compile error s390/zcrypt: enable s390 hwrng to seed kernel entropy
2015-07-15 | jfs: clean up jfs_rename and fix out of order unlock | Dave Kleikamp | 1 | -14/+13
The end of jfs_rename(), which is also used by the error paths, included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2 and out3. If we come in through these labels, IWRITE_LOCK() has not been called yet. In moving that call to the correct spot, I also moved some exceptional truncate code earlier as well, since the early error paths don't need to deal with it, and I renamed out4: to out_tx: so a future patch by Jan Kara doesn't need to deal with renumbering or confusing out-of-order labels. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2015-07-15 | Merge tag 'module-final-v4.2-rc1' of ↵ | Linus Torvalds | 2 | -78/+84
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull final init.h/module.h code relocation from Paul Gortmaker: "With the release of 4.2-rc2 done, we should not be seeing any new code added that gets upset by this small code move, and we've banked yet another complete week of testing with this move in place on top of 4.2-rc1 via linux-next to ensure that remained true. Given that, I'd like to put it in now so that people formulating new work for 4.3-rc1 will be exposed to the ever so slightly stricter (but sensible) requirements wrt. whether they are needing init.h vs. module.h macros, even if they are not using linux-next. The diffstat of the move is slightly asymmetrical due to needing to leave behind a couple #ifdef in the old location and add the same ones to the new location, but other than that, it is a 1:1 move, complete with the module_init/exit trailing semicolon that we can't fix. That is, until/unless someone does a tree-wide sed fix of all the approximately 800 currently in tree users relying on it" * tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: module: relocate module_init from init.h to module.h
2015-07-15 | Merge tag 'trace-v4.2-rc1-fix' of ↵ | Linus Torvalds | 2 | -7/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fengguang Wu discovered a crash that happened to be because of the branch tracer (traces unlikely and likely branches) when enabled with certain debug options. What happened was that various debug options like lockdep and DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside its recursion protection. In fact, part of its recursion protection used these features that caused the lockup. This cleans up the code a little and makes the recursion protection a bit more robust" * tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have branch tracer use recursive field of task struct
2015-07-15 | Merge tag 'tpm-fixes-for-4.2-rc2' of ↵ | James Morris | 2 | -1/+10
https://github.com/PeterHuewe/linux-tpmdd into for-linus
2015-07-14 | intel_scu_ipc: move local memory initialization out of a mutex | Christophe JAILLET | 1 | -3/+3
'{ }' and memset will both reset the cbuf buffer. Once is enough, and this can be done outside of the mutex. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2015-07-14 | IB/core: Destroy ocrdma_dev_id IDR on module exit | Johannes Thumshirn | 1 | -0/+1
Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory. This was detected by the following semantic patch (written by Luis Rodriguez <mcgrof@suse.com>) <SmPL> @ defines_module_init @ declarer name module_init, module_exit; declarer name DEFINE_IDR; identifier init; @@ module_init(init); @ defines_module_exit @ identifier exit; @@ module_exit(exit); @ declares_idr depends on defines_module_init && defines_module_exit @ identifier idr; @@ DEFINE_IDR(idr); @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... idr_destroy(&idr); ... } @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... +idr_destroy(&idr); } </SmPL> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/core: Destroy multcast_idr on module exitJohannes Thumshirn1-0/+1
Destroy multcast_idr on module exit, reclaiming the allocated memory. This was detected by the following semantic patch (written by Luis Rodriguez <mcgrof@suse.com>):
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@

module_init(init);

@ defines_module_exit @
identifier exit;
@@

module_exit(exit);

@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@

DEFINE_IDR(idr);

@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 idr_destroy(&idr);
 ...
}

@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@

exit(void)
{
 ...
 +idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Optimize do_slave_initDoug Ledford1-7/+9
There is little chance our memory allocation will fail, so we can combine initializing the work structs with allocating them instead of looping through all of them once to allocate and again to initialize. Then when we need to actually find out if our device is up or in the process of going down, have all of our work structs batched up, take the spin_lock once and only once, and do all of the batch under the one spin_lock invocation instead of incurring all of the locked memory cycles we would otherwise incur to take/release the spin_lock over and over again. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Fix memory leak in do_slave_initDoug Ledford1-0/+2
We create a number of work structs to be queued up to a workqueue, and on completion of the workqueue handler, the workqueue handler frees the allocated memory. If, however, we don't queue the work struct because the device is going down, then we need to free the memory ourselves. Signed-off-by: Doug Ledford <dledford@redhat.com>
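The two do_slave_init changes can be sketched together in userspace C. This is an illustration under stated assumptions, not the mlx4 code: a pthread mutex stands in for the spin_lock, and `demo_slave_init`, `demo_queue`, `demo_work`, `dev_lock` and `going_down` are hypothetical names. Items are allocated and initialized in one pass, the lock is taken once for the whole batch, and any item that is not queued is freed by the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_work {
    int port;
};

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static bool going_down;   /* hypothetical "device is being removed" flag */

/* Stand-in for queue_work(): the handler owns the item and frees it. */
static void demo_queue(struct demo_work *w)
{
    free(w);
}

/* Returns the number of items queued, or -1 on allocation failure. */
int demo_slave_init(int nports)
{
    struct demo_work **items;
    int i, queued = 0;

    assert(nports > 0);
    items = calloc((size_t)nports, sizeof(*items));
    if (!items)
        return -1;

    /* One pass: allocate and initialize each item together. */
    for (i = 0; i < nports; i++) {
        items[i] = malloc(sizeof(*items[i]));
        if (!items[i]) {
            while (--i >= 0)
                free(items[i]);
            free(items);
            return -1;
        }
        items[i]->port = i + 1;
    }

    /* Take the lock once for the whole batch. */
    pthread_mutex_lock(&dev_lock);
    for (i = 0; i < nports; i++) {
        if (!going_down) {
            demo_queue(items[i]);
            queued++;
        } else {
            free(items[i]);   /* never queued, so it must be freed here */
        }
    }
    pthread_mutex_unlock(&dev_lock);

    free(items);
    return queued;
}
```

The `else` branch is the part that corresponds to the leak fix: without it, items created while the device is going down would never be released.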
2015-07-14IB/mlx4: Optimize freeing of items on error unwindManinder Singh1-5/+3
On failure, we loop through all possible pointers and test each one before calling kfree. There is no need to attempt to free items we never allocated: we can loop through exactly the devices for which the original memory allocation succeeded and free just those. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
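A small userspace sketch of this unwind pattern, assuming `malloc`/`free` in place of the kernel allocators. The names `demo_alloc_all`, `demo_alloc` and `alloc_budget` are hypothetical; the budget counter exists only so a failure can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Demo-only hook: -1 means unlimited, otherwise the number of
 * allocations that are still allowed to succeed. */
static int alloc_budget = -1;

static void *demo_alloc(size_t size)
{
    if (alloc_budget == 0)
        return NULL;
    if (alloc_budget > 0)
        alloc_budget--;
    return malloc(size);
}

/* Fills slots[0..n-1]; returns 0 on success, -1 on failure.
 * On failure only the slots that were actually allocated are freed. */
int demo_alloc_all(void **slots, int n, size_t size)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        slots[i] = demo_alloc(size);
        if (!slots[i])
            goto unwind;
    }
    return 0;

unwind:
    /* i is the index that failed; walk back over the successes only. */
    while (--i >= 0) {
        free(slots[i]);
        slots[i] = NULL;
    }
    return -1;
}
```

Because the loop index already records how far allocation got, the unwind path needs no NULL tests and never visits slots past the failure point.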
2015-07-14IB/mlx4: Fix use of flow-counters for process_madOr Gerlitz1-10/+19
For IB links, mlx4_ib_process_mad() should read HCA flow counters through iboe_process_mad() only when it is invoked for VF PMA queries, and in no other case. Fixes: 7193a141eb74 ('IB/mlx4: Set VF to read from QP counters') Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/ipath: Convert use of __constant_<foo> to <foo>Vaishali Thakkar1-2/+2
In little endian cases, the macros be16_to_cpu and cpu_to_be64 unfold to __swab{16,64}, which provide a special case for constants. In big endian cases, __constant_be16_to_cpu and be16_to_cpu expand directly to the same expression. The same applies to __constant_cpu_to_be64 and cpu_to_be64. So, replace __constant_be16_to_cpu with be16_to_cpu and __constant_cpu_to_be64 with cpu_to_be64, with the goal of getting rid of the definitions of __constant_be16_to_cpu and __constant_cpu_to_be64 completely. Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
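To illustrate why a separate constant-only variant is unnecessary, here is a userspace sketch. It does not reproduce the kernel macros: `demo_be16_to_cpu` and `demo_cpu_to_be64` are hypothetical plain functions that work on bytes, so the same call serves literal constants and runtime values on either host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint16_t) == 2, "demo assumes 2-byte uint16_t");

/* Interpret the in-memory bytes of be as a big-endian 16-bit value. */
uint16_t demo_be16_to_cpu(uint16_t be)
{
    unsigned char b[2];

    memcpy(b, &be, sizeof(b));
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Produce a value whose in-memory bytes are v in big-endian order. */
uint64_t demo_cpu_to_be64(uint64_t v)
{
    unsigned char b[8];
    uint64_t be;
    int i;

    for (i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    memcpy(&be, b, sizeof(be));
    return be;
}
```

A call such as `demo_cpu_to_be64(1)` with a literal and a call with a variable go through the same code path, which is the property the commit relies on when it drops the `__constant_` forms.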
2015-07-14IB/ipoib: Set MTU to max allowed by mode when mode changesErez Shitrit1-0/+1
When switching between modes (datagram / connected), change the MTU accordingly: datagram mode allows up to 4K, connected mode up to (64K - 0x10). Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
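The limits quoted in the message work out as follows. This is only an arithmetic sketch: the enum, macros and `demo_max_mtu` are hypothetical names, "4K" is read as 4096 bytes and "64K" as 0x10000, and the real driver may account for headers differently.

```c
#include <assert.h>

enum demo_mode {
    DEMO_DATAGRAM,
    DEMO_CONNECTED
};

#define DEMO_DATAGRAM_MAX_MTU  4096u               /* "up to 4K" (assumed 4096) */
#define DEMO_CONNECTED_MAX_MTU (0x10000u - 0x10u)  /* 64K - 0x10 */

static_assert(DEMO_CONNECTED_MAX_MTU == 65520u, "64K - 0x10 is 65520");

/* Returns the largest MTU the given mode allows in this sketch. */
unsigned int demo_max_mtu(enum demo_mode mode)
{
    return mode == DEMO_CONNECTED ? DEMO_CONNECTED_MAX_MTU
                                  : DEMO_DATAGRAM_MAX_MTU;
}
```

On a mode switch, setting the interface MTU to `demo_max_mtu(new_mode)` would mirror the behavior the commit describes.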