Age | Commit message (Collapse) | Author | Files | Lines |
|
get_device_id() returns an unsigned short device id. It never fails and
it never returns a negative so we can remove this condition.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Preallocate between 4 and 8 apertures when a device gets it
dma_mask. With more apertures we reduce the lock contention
of the domain lock significantly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
First search for a non-contended aperture with trylock
before spinning.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make this pointer percpu so that we start searching for new
addresses in the range we last stopped and which is has a
higher probability of being still in the cache.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Remove the long holding times of the domain->lock and rely
on the bitmap_lock instead.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make sure the aperture range is fully initialized before it
is visible to the address allocator.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This allows to build up the page-tables without holding any
locks. As a consequence it removes the need to pre-populate
dma_ops page-tables.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It really belongs there and not in __map_single.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Don't flush the iommu tlb when we free something behind the
current next_bit pointer. Update the next_bit pointer
instead and let the flush happen on the next wraparound in
the allocation path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The flushing of iommu tlbs is now done on a per-range basis.
So there is no need anymore for domain-wide flush tracking.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This way we don't need to care about the next_index wrapping
around in dma_ops_alloc_addresses.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Instead of setting need_flush, do the flush directly in
dma_ops_free_addresses.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It points to the next aperture index to allocate from. We
don't need the full address anymore because this is now
tracked in struct aperture_range.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Parameter is not needed because the value is part of the
already passed in struct dma_ops_domain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Since the allocator wraparound happens in this function now,
flush the iommu tlb there too.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Instead of skipping to the next aperture, first try again in
the current one.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Moving it before the pte_pages array puts in into the same
cache-line as the spin-lock and the bitmap array pointer.
This should safe a cache-miss.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make this a wrapper around iommu_ops_area_alloc() for now
and add more logic to this function later on.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The page-offset of the aperture must be passed instead of 0.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This allows to keep the bitmap_lock only for a very short
period of time.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There have been present PTEs which in theory could have made
it to the IOMMU TLB. Flush the addresses out on the error
path to make sure no stale entries remain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This lock only protects the address allocation bitmap in one
aperture.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It is only used in this file anyway, so keep it there. Same
with 'struct aperture_range'.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This prevents possible flooding of the kernel log.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
'x86/amd' into next
Conflicts:
drivers/iommu/amd_iommu_types.h
|
|
Set the device_group call-back to pci_device_group() for the
Intel VT-d and the AMD IOMMU driver.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The driver always uses a constant size for these buffers
anyway, so there is no need to waste memory to store the
sizes.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This mostly removes the code to create dev_data structures
for alias device ids. They are not necessary anymore, as
they were only created for device ids which have no struct
pci_dev associated with it. But these device ids are
handled in a simpler way now, so there is no need for this
code anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
With this we don't have to create dev_data entries for
non-existent devices (which only exist as request-ids).
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
These functions rely on being called with IRQs disabled. Add
a WARN_ON to detect early when its not.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This function is already called with IRQs disabled already.
So no need to disable them again.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The alias list is handled aleady by iommu core code. No need
anymore to handle it in this part of the AMD IOMMU code
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The condition in the BUG_ON is an indicator of a BUG, but no
reason to kill the code path. Turn it into a WARN_ON and
bail out if it is hit.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
During device assignment/deassignment the flags in the DTE
get lost, which might cause spurious faults, for example
when the device tries to access the system management range.
Fix this by not clearing the flags with the rest of the DTE.
Reported-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Tested-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When a device group is detached from its domain, the iommu
core code calls into the iommu driver to detach each device
individually.
Before this functionality went into the iommu core code, it
was implemented in the drivers, also in the AMD IOMMU
driver as the device alias handling code.
This code is still present, as there might be aliases that
don't exist as real PCI devices (and are therefore invisible
to the iommu core code).
Unfortunatly it might happen now, that a device is unbound
multiple times from its domain, first by the alias handling
code and then by the iommu core code (or vice verca).
This ends up in the do_detach function which dereferences
the dev_data->domain pointer. When the device is already
detached, this pointer is NULL and we get a kernel oops.
Removing the alias code completly is not an option, as that
would also remove the code which handles invisible aliases.
The code could be simplified, but this is too big of a
change outside the merge window.
For now, just check the dev_data->domain pointer in
do_detach and bail out if it is NULL.
Reported-by: Andreas Hartmann <andihartmann@freenet.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Found by a coccicheck script.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Allocate the irq data only in the loop.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pul IOMMU fixes from Joerg Roedel:
"Four fixes have queued up to fix regressions introduced after v4.1:
- Don't fail IOMMU driver initialization when the add_device
call-back returns -ENODEV, as that just means that the device is
not translated by the IOMMU. This is pretty common on ARM.
- Two fixes for the ARM-SMMU driver for a wrong feature check and to
remove a redundant NULL check.
- A fix for the AMD IOMMU driver to fix a boot panic on systems where
the BIOS requests Unity Mappings in the IVRS table"
* tag 'iommu-fixes-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Introduce protection_domain_init() function
iommu/arm-smmu: Delete an unnecessary check before the function call "free_io_pgtable_ops"
iommu/arm-smmu: Fix broken ATOS check
iommu: Ignore -ENODEV errors from add_device call-back
|
|
This function contains the common parts between the
initialization of dma_ops_domains and usual protection
domains. This also fixes a long-standing bug which was
uncovered by recent changes, in which the api_lock was not
initialized for dma_ops_domains.
Reported-by: George Wang <xuw2015@gmail.com>
Tested-by: George Wang <xuw2015@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time with bigger changes than usual:
- A new IOMMU driver for the ARM SMMUv3.
This IOMMU is pretty different from SMMUv1 and v2 in that it is
configured through in-memory structures and not through the MMIO
register region. The ARM SMMUv3 also supports IO demand paging for
PCI devices with PRI/PASID capabilities, but this is not
implemented in the driver yet.
- Lots of cleanups and device-tree support for the Exynos IOMMU
driver. This is part of the effort to bring Exynos DRM support
upstream.
- Introduction of default domains into the IOMMU core code.
The rationale behind this is to move functionalily out of the IOMMU
drivers to common code to get to a unified behavior between
different drivers. The patches here introduce a default domain for
iommu-groups (isolation groups).
A device will now always be attached to a domain, either the
default domain or another domain handled by the device driver. The
IOMMU drivers have to be modified to make use of that feature. So
long the AMD IOMMU driver is converted, with others to follow.
- Patches for the Intel VT-d drvier to fix DMAR faults that happen
when a kdump kernel boots.
When the kdump kernel boots it re-initializes the IOMMU hardware,
which destroys all mappings from the crashed kernel. As this
happens before the endpoint devices are re-initialized, any
in-flight DMA causes a DMAR fault. These faults cause PCI master
aborts, which some devices can't handle properly and go into an
undefined state, so that the device driver in the kdump kernel
fails to initialize them and the dump fails.
This is now fixed by copying over the mapping structures (only
context tables and interrupt remapping tables) from the old kernel
and keep the old mappings in place until the device driver of the
new kernel takes over. This emulates the the behavior without an
IOMMU to the best degree possible.
- A couple of other small fixes and cleanups"
* tag 'iommu-updates-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (69 commits)
iommu/amd: Handle large pages correctly in free_pagetable
iommu/vt-d: Don't disable IR when it was previously enabled
iommu/vt-d: Make sure copied over IR entries are not reused
iommu/vt-d: Copy IR table from old kernel when in kdump mode
iommu/vt-d: Set IRTA in intel_setup_irq_remapping
iommu/vt-d: Disable IRQ remapping in intel_prepare_irq_remapping
iommu/vt-d: Move QI initializationt to intel_setup_irq_remapping
iommu/vt-d: Move EIM detection to intel_prepare_irq_remapping
iommu/vt-d: Enable Translation only if it was previously disabled
iommu/vt-d: Don't disable translation prior to OS handover
iommu/vt-d: Don't copy translation tables if RTT bit needs to be changed
iommu/vt-d: Don't do early domain assignment if kdump kernel
iommu/vt-d: Allocate si_domain in init_dmars()
iommu/vt-d: Mark copied context entries
iommu/vt-d: Do not re-use domain-ids from the old kernel
iommu/vt-d: Copy translation tables from old kernel
iommu/vt-d: Detect pre enabled translation
iommu/vt-d: Make root entry visible for hardware right after allocation
iommu/vt-d: Init QI before root entry is allocated
iommu/vt-d: Cleanup log messages
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
"There were so many changes in the x86/asm, x86/apic and x86/mm topics
in this cycle that the topical separation of -tip broke down somewhat -
so the result is a more traditional architecture pull request,
collected into the 'x86/core' topic.
The topics were still maintained separately as far as possible, so
bisectability and conceptual separation should still be pretty good -
but there were a handful of merge points to avoid excessive
dependencies (and conflicts) that would have been poorly tested in the
end.
The next cycle will hopefully be much more quiet (or at least will
have fewer dependencies).
The main changes in this cycle were:
* x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
Gleixner)
- This is the second and most intrusive part of changes to the x86
interrupt handling - full conversion to hierarchical interrupt
domains:
[IOAPIC domain] -----
|
[MSI domain] --------[Remapping domain] ----- [ Vector domain ]
| (optional) |
[HPET MSI domain] ----- |
|
[DMAR domain] -----------------------------
|
[Legacy domain] -----------------------------
This now reflects the actual hardware and allowed us to distangle
the domain specific code from the underlying parent domain, which
can be optional in the case of interrupt remapping. It's a clear
separation of functionality and removes quite some duct tape
constructs which plugged the remap code between ioapic/msi/hpet
and the vector management.
- Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
injection into guests (Feng Wu)
* x86/asm changes:
- Tons of cleanups and small speedups, micro-optimizations. This
is in preparation to move a good chunk of the low level entry
code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
Brian Gerst)
- Moved all system entry related code to a new home under
arch/x86/entry/ (Ingo Molnar)
- Removal of the fragile and ugly CFI dwarf debuginfo annotations.
Conversion to C will reintroduce many of them - but meanwhile
they are only getting in the way, and the upstream kernel does
not rely on them (Ingo Molnar)
- NOP handling refinements. (Borislav Petkov)
* x86/mm changes:
- Big PAT and MTRR rework: making the code more robust and
preparing to phase out exposing direct MTRR interfaces to drivers -
in favor of using PAT driven interfaces (Toshi Kani, Luis R
Rodriguez, Borislav Petkov)
- New ioremap_wt()/set_memory_wt() interfaces to support
Write-Through cached memory mappings. This is especially
important for good performance on NVDIMM hardware (Toshi Kani)
* x86/ras changes:
- Add support for deferred errors on AMD (Aravind Gopalakrishnan)
This is an important RAS feature which adds hardware support for
poisoned data. That means roughly that the hardware marks data
which it has detected as corrupted but wasn't able to correct, as
poisoned data and raises an APIC interrupt to signal that in the
form of a deferred error. It is the OS's responsibility then to
take proper recovery action and thus prolonge system lifetime as
far as possible.
- Add support for Intel "Local MCE"s: upcoming CPUs will support
CPU-local MCE interrupts, as opposed to the traditional system-
wide broadcasted MCE interrupts (Ashok Raj)
- Misc cleanups (Borislav Petkov)
* x86/platform changes:
- Intel Atom SoC updates
... and lots of other cleanups, fixlets and other changes - see the
shortlog and the Git log for details"
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
x86/hpet: Use proper hpet device number for MSI allocation
x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
genirq: Prevent crash in irq_move_irq()
genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
iommu, x86: Properly handle posted interrupts for IOMMU hotplug
iommu, x86: Provide irq_remapping_cap() interface
iommu, x86: Setup Posted-Interrupts capability for Intel iommu
iommu, x86: Add cap_pi_support() to detect VT-d PI capability
iommu, x86: Avoid migrating VT-d posted interrupts
iommu, x86: Save the mode (posted or remapped) of an IRTE
iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
iommu: dmar: Provide helper to copy shared irte fields
iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
iommu: Add new member capability to struct irq_remap_ops
x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
...
|
|
'x86/amd', 'default-domains' and 'core' into next
|
|
Make sure that we are skipping over large PTEs while walking
the page-table tree.
Cc: stable@kernel.org
Fixes: 5c34c403b723 ("iommu/amd: Fix memory leak in free_pagetable")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Without this patch only -ENOTSUPP is handled, but there are
other possible errors. Handle them too.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|