Age | Commit message (Collapse) | Author | Files | Lines |
|
This moves intel_iommu_enable_pasid() out of the scope of
CONFIG_INTEL_IOMMU_SVM with more and more features requiring
pasid function.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Intel IOMMU could be turned off with intel_iommu=off. If Intel
IOMMU is off, the intel_iommu struct will not be initialized.
When device drivers call intel_svm_bind_mm(), the NULL pointer
reference will happen there.
Add dmar_disabled check to avoid NULL pointer reference.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 2f26e0a9c9860 ("iommu/vt-d: Add basic SVM PASID support")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The change_pte() interface is tailored for PFN updates, while the
other notifier invalidate_range() should be enough for Intel IOMMU
cache flushing. Actually we've done similar thing for AMD IOMMU
already in 8301da53fbc1 ("iommu/amd: Remove change_pte mmu_notifier
call-back", 2014-07-30) but the Intel IOMMU driver still have it.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
VT-d Rev3.0 has made a few changes to the page request interface,
1. widened PRQ descriptor from 128 bits to 256 bits;
2. removed streaming response type;
3. introduced private data that requires page response even the
request is not last request in group (LPIG).
This is a supplement to commit 1c4f88b7f1f92 ("iommu/vt-d: Shared
virtual address in scalable mode") and makes the svm code compliant
with VT-d Rev3.0.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next
|
|
Deferred invalidation is an ECS specific feature. It will not be
supported when IOMMU works in scalable mode. As we deprecated the
ECS support, remove deferred invalidation and cleanup the code.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This patch enables the current SVA (Shared Virtual Address)
implementation to work in the scalable mode.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Intel vt-d spec rev3.0 requires software to use 256-bit
descriptors in invalidation queue. As the spec reads in
section 6.5.2:
Remapping hardware supporting Scalable Mode Translations
(ECAP_REG.SMTS=1) allow software to additionally program
the width of the descriptors (128-bits or 256-bits) that
will be written into the Queue. Software should setup the
Invalidation Queue for 256-bit descriptors before progra-
mming remapping hardware for scalable-mode translation as
128-bit descriptors are treated as invalid descriptors
(see Table 21 in Section 6.5.2.10) in scalable-mode.
This patch adds 256-bit invalidation descriptor support
if the hardware presents scalable mode capability.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
In scalable mode, pasid structure is a two level table with
a pasid directory table and a pasid table. Any pasid entry
can be identified by a pasid value in below way.
1
9 6 5 0
.-----------------------.-------.
| PASID | |
'-----------------------'-------' .-------------.
| | | |
| | | |
| | | |
| .-----------. | .-------------.
| | | |----->| PASID Entry |
| | | | '-------------'
| | | |Plus | |
| .-----------. | | |
|---->| DIR Entry |-------->| |
| '-----------' '-------------'
.---------. |Plus | |
| Context | | | |
| Entry |------->| |
'---------' '-----------'
This changes the pasid table APIs to support scalable mode
PASID directory and PASID table. It also adds a helper to
get the PASID table entry according to the pasid value.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When handling page request without pasid event, go to "no_pasid"
branch instead of "bad_req". Otherwise, a NULL pointer deference
will happen there.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: a222a7f0bb6c9 'iommu/vt-d: Implement page request handling'
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
callbacks"
Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with
blockable invalidate callbacks").
MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no
longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for
mmu notifiers"). We now have a full support for per range !blocking
behavior so we can drop the stop gap workaround which the per notifier
flag was used for.
Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- PASID table handling updates for the Intel VT-d driver. It implements
a global PASID space now so that applications usings multiple devices
will just have one PASID.
- A new config option to make iommu passthroug mode the default.
- New sysfs attribute for iommu groups to export the type of the
default domain.
- A debugfs interface (for debug only) usable by IOMMU drivers to
export internals to user-space.
- R-Car Gen3 SoCs support for the ipmmu-vmsa driver
- The ARM-SMMU now aborts transactions from unknown devices and devices
not attached to any domain.
- Various cleanups and smaller fixes all over the place.
* tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
iommu/omap: Fix cache flushes on L2 table entries
iommu: Remove the ->map_sg indirection
iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel
iommu/arm-smmu-v3: Prevent any devices access to memory without registration
iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA
iommu/ipmmu-vmsa: Clarify supported platforms
iommu/ipmmu-vmsa: Fix allocation in atomic context
iommu: Add config option to set passthrough as default
iommu: Add sysfs attribyte for domain type
iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register
iommu/arm-smmu: Error out only if not enough context interrupts
iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE
iommu/io-pgtable-arm: Fix pgtable allocation in selftest
iommu/vt-d: Remove the obsolete per iommu pasid tables
iommu/vt-d: Apply per pci device pasid table in SVA
iommu/vt-d: Allocate and free pasid table
iommu/vt-d: Per PCI device pasid table interfaces
iommu/vt-d: Add for_each_device_domain() helper
iommu/vt-d: Move device_domain_info to header
iommu/vt-d: Apply global PASID in SVA
...
|
|
Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.
Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")
In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.
Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The obsolete per iommu pasid tables are no longer used. Hence,
clean up them.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This patch applies the per pci device pasid table in the Shared
Virtual Address (SVA) implementation.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This patch adds the interfaces for per PCI device pasid
table management. Currently we allocate one pasid table
for all PCI devices under the scope of an IOMMU. It's
insecure in some cases where multiple devices under one
single IOMMU unit support PASID features. With per PCI
device pasid table, we can achieve finer protection and
isolation granularity.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This patch applies the global pasid name space in the shared
virtual address (SVA) implementation.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
idr_for_each_entry() is used to iteratte over idr elements
of a given type. It isn't suitable for the globle pasid idr
since the pasid idr consumer could specify different types
of pointers to bind with a pasid.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When drivers call intel_svm_available() to check whether the
Shared Virtual Memory(SVM) is supported by the IOMMU driver,
they will get a warning in the kernel message if the SVM is
not supported by the hardware.
[ 3.790876] WARNING: CPU: 0 PID: 267 at drivers/iommu/intel-svm.c:334 intel_svm_bind_mm+0x292/0x570
[ 3.790877] Modules linked in: dsa(+) vfio_mdev mdev vfio_iommu_type1 soundcore vfio serio_raw parport_pc ppdev lp parport autofs4 psmouse virtio_net pata_acpi
[ 3.790884] CPU: 0 PID: 267 Comm: systemd-udevd Not tainted 4.15.0+ #358
[ 3.790885] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 3.790887] RIP: 0010:intel_svm_bind_mm+0x292/0x570
[ 3.790887] RSP: 0000:ffffac72c08a3a70 EFLAGS: 00010246
[ 3.790889] RAX: 0000000000000000 RBX: ffff90447a5160a0 RCX: 0000000000000000
[ 3.790889] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff90447fc16550
[ 3.790890] RBP: ffff90447a516000 R08: 0000000000000000 R09: 0000000000000178
[ 3.790891] R10: 0000000000000220 R11: 0000000000aaaaaa R12: 0000000000000000
[ 3.790891] R13: 0000000000000002 R14: ffffac72c08a3b18 R15: ffffac72c08a3eb8
[ 3.790893] FS: 00007fb21e85b8c0(0000) GS:ffff90447fc00000(0000) knlGS:0000000000000000
[ 3.790894] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.790894] CR2: 000055c08167d148 CR3: 000000013a6f4000 CR4: 00000000000006f0
[ 3.790903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.790904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
This is caused by a unnecessary WARN_ON() in intel_svm_bind_mm().
Hence, remove it.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Remove unnecessary parentheses to comply with preferred coding
style.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'arm/mediatek', 'arm/exynos', 'arm/renesas', 'arm/smmu' and 'core' into next
|
|
If caching mode is supported, the hardware will cache
none-present or erroneous translation entries. Hence,
software should explicitly invalidate the PASID cache
after a PASID table entry becomes present. We should
issue such invalidation with the PASID value that we
have changed. PASID 0 is not reserved for this case.
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Sankaran Rajesh <rajesh.sankaran@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A memory block was allocated in intel_svm_bind_mm() but never freed
in a failure path. This patch fixes this by free it to avoid memory
leakage.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 2f26e0a9c9860 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
struct efi_uga_draw_protocol *uga = NULL, *first_uga;
efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;;
+ u32 *handles = (u32 *)uga_handle;
efi_status_t status = EFI_INVALID_PARAMETER;
int i;
This patch is the result of the following script:
$ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
... followed by manual review to make sure it's all good.
Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time there are not a lot of changes coming from the IOMMU side.
That is partly because I returned from my parental leave late in the
development process and probably partly because everyone was busy with
Spectre and Meltdown mitigation work and didn't find the time for
IOMMU work. So here are the few changes that queued up for this merge
window:
- 5-level page-table support for the Intel IOMMU.
- error reporting improvements for the AMD IOMMU driver
- additional DT bindings for ipmmu-vmsa (Renesas)
- small fixes and cleanups"
* tag 'iommu-updates-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Clean up of_iommu_init_fn
iommu/ipmmu-vmsa: Remove redundant of_iommu_init_fn hook
iommu/msm: Claim bus ops on probe
iommu/vt-d: Enable 5-level paging mode in the PASID entry
iommu/vt-d: Add a check for 5-level paging support
iommu/vt-d: Add a check for 1GB page support
iommu/vt-d: Enable upto 57 bits of domain address width
iommu/vt-d: Use domain instead of cache fetching
iommu/exynos: Don't unconditionally steal bus ops
iommu/omap: Fix debugfs_create_*() usage
iommu/vt-d: clean up pr_irq if request_threaded_irq fails
iommu: Check the result of iommu_group_get() for NULL
iommu/ipmmu-vmsa: Add r8a779(70|95) DT bindings
iommu/ipmmu-vmsa: Add r8a7796 DT binding
iommu/amd: Set the device table entry PPR bit for IOMMU V2 devices
iommu/amd - Record more information about unknown events
|
|
Commit 4d4bbd8526a8 ("mm, oom_reaper: skip mm structs with mmu
notifiers") prevented the oom reaper from unmapping private anonymous
memory with the oom reaper when the oom victim mm had mmu notifiers
registered.
The rationale is that doing mmu_notifier_invalidate_range_{start,end}()
around the unmap_page_range(), which is needed, can block and the oom
killer will stall forever waiting for the victim to exit, which may not
be possible without reaping.
That concern is real, but only true for mmu notifiers that have
blockable invalidate_range_{start,end}() callbacks. This patch adds a
"flags" field to mmu notifier ops that can set a bit to indicate that
these callbacks do not block.
The implementation is steered toward an expensive slowpath, such as
after the oom reaper has grabbed mm->mmap_sem of a still alive oom
victim.
[rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com
[akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the CPU has support for 5-level paging enabled and the IOMMU also
supports 5-level paging then enable the 5-level paging mode for first-
level translations - used when SVM is enabled.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a check to verify IOMMU 5-level paging support. If the CPU supports
supports 5-level paging but the IOMMU does not support it then disable
SVM by not allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a check to verify IOMMU 1GB page support. If the CPU supports 1GB
pages but the IOMMU does not support it then disable SVM by not
allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It is unlikely request_threaded_irq will fail, but if it does for some
reason we should clear iommu->pr_irq in the error path. Also
intel_svm_finish_prq shouldn't try to clean up the page request
interrupt if pr_irq is 0. Without these, if request_threaded_irq were
to fail the following occurs:
fail with no fixes:
[ 0.683147] ------------[ cut here ]------------
[ 0.683148] NULL pointer, cannot free irq
[ 0.683158] WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1632 irq_domain_free_irqs+0x126/0x140
[ 0.683160] Modules linked in:
[ 0.683163] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #3
[ 0.683165] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017
[ 0.683168] RIP: 0010:irq_domain_free_irqs+0x126/0x140
[ 0.683169] RSP: 0000:ffffc90000037ce8 EFLAGS: 00010292
[ 0.683171] RAX: 000000000000001d RBX: ffff880276283c00 RCX: ffffffff81c5e5e8
[ 0.683172] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[ 0.683174] RBP: ffff880276283c00 R08: 0000000000000000 R09: 000000000000023c
[ 0.683175] R10: 0000000000000007 R11: 0000000000000000 R12: 000000000000007a
[ 0.683176] R13: 0000000000000001 R14: 0000000000000000 R15: 0000010010000000
[ 0.683178] FS: 0000000000000000(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000
[ 0.683180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.683181] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0
[ 0.683182] Call Trace:
[ 0.683189] intel_svm_finish_prq+0x3c/0x60
[ 0.683191] free_dmar_iommu+0x1ac/0x1b0
[ 0.683195] init_dmars+0xaaa/0xaea
[ 0.683200] ? klist_next+0x19/0xc0
[ 0.683203] ? pci_do_find_bus+0x50/0x50
[ 0.683205] ? pci_get_dev_by_id+0x52/0x70
[ 0.683208] intel_iommu_init+0x498/0x5c7
[ 0.683211] pci_iommu_init+0x13/0x3c
[ 0.683214] ? e820__memblock_setup+0x61/0x61
[ 0.683217] do_one_initcall+0x4d/0x1a0
[ 0.683220] kernel_init_freeable+0x186/0x20e
[ 0.683222] ? set_debug_rodata+0x11/0x11
[ 0.683225] ? rest_init+0xb0/0xb0
[ 0.683226] kernel_init+0xa/0xff
[ 0.683229] ret_from_fork+0x1f/0x30
[ 0.683259] Code: 89 ee 44 89 e7 e8 3b e8 ff ff 5b 5d 44 89 e7 44 89 ee 41 5c 41 5d 41 5e e9 a8 84 ff ff 48 c7 c7 a8 71 a7 81 31 c0 e8 6a d3 f9 ff <0f> ff 5b 5d 41 5c 41 5d 41 5
e c3 0f 1f 44 00 00 66 2e 0f 1f 84
[ 0.683285] ---[ end trace f7650e42792627ca ]---
with iommu->pr_irq = 0, but no check in intel_svm_finish_prq:
[ 0.669561] ------------[ cut here ]------------
[ 0.669563] Trying to free already-free IRQ 0
[ 0.669573] WARNING: CPU: 3 PID: 1 at kernel/irq/manage.c:1546 __free_irq+0xa4/0x2c0
[ 0.669574] Modules linked in:
[ 0.669577] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #4
[ 0.669579] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017
[ 0.669581] RIP: 0010:__free_irq+0xa4/0x2c0
[ 0.669582] RSP: 0000:ffffc90000037cc0 EFLAGS: 00010082
[ 0.669584] RAX: 0000000000000021 RBX: 0000000000000000 RCX: ffffffff81c5e5e8
[ 0.669585] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000046
[ 0.669587] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000023c
[ 0.669588] R10: 0000000000000007 R11: 0000000000000000 R12: ffff880276253960
[ 0.669589] R13: ffff8802762538a4 R14: ffff880276253800 R15: ffff880276283600
[ 0.669593] FS: 0000000000000000(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
[ 0.669594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.669596] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0
[ 0.669602] Call Trace:
[ 0.669616] free_irq+0x30/0x60
[ 0.669620] intel_svm_finish_prq+0x34/0x60
[ 0.669623] free_dmar_iommu+0x1ac/0x1b0
[ 0.669627] init_dmars+0xaaa/0xaea
[ 0.669631] ? klist_next+0x19/0xc0
[ 0.669634] ? pci_do_find_bus+0x50/0x50
[ 0.669637] ? pci_get_dev_by_id+0x52/0x70
[ 0.669639] intel_iommu_init+0x498/0x5c7
[ 0.669642] pci_iommu_init+0x13/0x3c
[ 0.669645] ? e820__memblock_setup+0x61/0x61
[ 0.669648] do_one_initcall+0x4d/0x1a0
[ 0.669651] kernel_init_freeable+0x186/0x20e
[ 0.669653] ? set_debug_rodata+0x11/0x11
[ 0.669656] ? rest_init+0xb0/0xb0
[ 0.669658] kernel_init+0xa/0xff
[ 0.669661] ret_from_fork+0x1f/0x30
[ 0.669662] Code: 7a 08 75 0e e9 c3 01 00 00 4c 39 7b 08 74 57 48 89 da 48 8b 5a 18 48 85 db 75 ee 89 ee 48 c7 c7 78 67 a7 81 31 c0 e8 4c 37 fa ff <0f> ff 48 8b 34 24 4c 89 ef e
8 0e 4c 68 00 49 8b 46 40 48 8b 80
[ 0.669688] ---[ end trace 58a470248700f2fc ]---
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
In intel_svm_unbind_mm(), pasid table entry must be cleared during
svm free. Otherwise, hardware may be set up with a wild pointer.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
intel_svm_alloc_pasid_tables() might return an error but never be
checked by the callers. Later when intel_svm_bind_mm() is called,
there are no checks for valid pasid tables before enabling them.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
to preserve the mappings of the kernel so that master aborts for
devices are avoided. Master aborts cause some devices to fail in
the kdump kernel, so this code makes the dump more likely to
succeed when AMD IOMMU is enabled.
- common flush queue implementation for IOVA code users. The code is
still optional, but AMD and Intel IOMMU drivers had their own
implementation which is now unified.
- finish support for iommu-groups. All drivers implement this feature
now so that IOMMU core code can rely on it.
- finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- new functions in the IOMMU-API for explicit IO/TLB flushing. This
will help to reduce the number of IO/TLB flushes when IOMMU drivers
support this interface.
- support for mt2712 in the Mediatek IOMMU driver
- new IOMMU driver for QCOM hardware
- system PM support for ARM-SMMU
- shutdown method for ARM-SMMU-v3
- some constification patches
- various other small improvements and fixes"
* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
iommu/vt-d: Don't be too aggressive when clearing one context entry
iommu: Introduce Interface for IOMMU TLB Flushing
iommu/s390: Constify iommu_ops
iommu/vt-d: Avoid calling virt_to_phys() on null pointer
iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
arm/tegra: Call bus_set_iommu() after iommu_device_register()
iommu/exynos: Constify iommu_ops
iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
iommu/amd: Rename a few flush functions
iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
iommu/mediatek: Fix a build warning of BIT(32) in ARM
iommu/mediatek: Fix a build fail of m4u_type
iommu: qcom: annotate PM functions as __maybe_unused
iommu/pamu: Fix PAMU boot crash
memory: mtk-smi: Degrade SMI init to module_init
iommu/mediatek: Enlarge the validate PA range for 4GB mode
iommu/mediatek: Disable iommu clock when system suspend
iommu/mediatek: Move pgtable allocation into domain_alloc
iommu/mediatek: Merge 2 M4U HWs into one iommu domain
...
|
|
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Page Request from devices that support device-tlb would request translation
to pre-cache them in device to avoid overhead of IOMMU lookups.
IOMMU needs to check for canonicallity of the address before performing
page-fault processing.
To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org>
Cc: iommu@lists.linux-foundation.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reported-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
A driver would need to know if there are any active references to a
a PASID before cleaning up its resources. This function helps check
if there are any active users of a PASID before it can perform any
recovery on that device.
To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org
To: David Woodhouse <dwmw2@infradead.org>
Cc: Jean-Phillipe Brucker <jean-philippe.brucker@arm.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
<linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We already have the helper, we can convert the rest of the kernel
mechanically using:
git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.
Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Somehow I ended up with an off-by-three error in calculating the size of
the PASID and PASID State tables, which triggers allocations failures as
those tables unfortunately have to be physically contiguous.
In fact, even the *correct* maximum size of 8MiB is problematic and is
wont to lead to allocation failures. Since I have extracted a promise
that this *will* be fixed in hardware, I'm happy to limit it on the
current hardware to a maximum of 0x20000 PASIDs, which gives us 1MiB
tables — still not ideal, but better than before.
Reported by Mika Kuoppala <mika.kuoppala@linux.intel.com> and also by
Xunlei Pang <xlpang@redhat.com> who submitted a simpler patch to fix
only the allocation (and not the free) to the "correct" limit... which
was still problematic.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
|
|
We always have vma->vm_mm around.
Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.
This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.
Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
|
|
Holding mm_users works OK for graphics, which was the first user of SVM
with VT-d. However, it works less well for other devices, where we actually
do a mmap() from the file descriptor to which the SVM PASID state is tied.
In this case on process exit we end up with a recursive reference count:
- The MM remains alive until the file is closed and the driver's release()
call ends up unbinding the PASID.
- The VMA corresponding to the mmap() remains intact until the MM is
destroyed.
- Thus the file isn't closed, even when exit_files() runs, because the
VMA is still holding a reference to it. And the MM remains alive…
To address this issue, we *stop* holding mm_users while the PASID is bound.
We already hold mm_count by virtue of the MMU notifier, and that can be
made to be sufficient.
It means that for a period during process exit, the fun part of mmput()
has happened and exit_mmap() has been called so the MM is basically
defunct. But the PGD still exists and the PASID is still bound to it.
During this period, we have to be very careful — exit_mmap() doesn't use
mm->mmap_sem because it doesn't expect anyone else to be touching the MM
(quite reasonably, since mm_users is zero). So we also need to fix the
fault handler to just report failure if mm_users is already zero, and to
temporarily bump mm_users while handling any faults.
Additionally, exit_mmap() calls mmu_notifier_release() *before* it tears
down the page tables, which is too early for us to flush the IOTLB for
this PASID. And __mmu_notifier_release() removes every notifier from the
list, so when exit_mmap() finally *does* tear down the mappings and
clear the page tables, we don't get notified. So we work around this by
clearing the PASID table entry in our MMU notifier release() callback.
That way, the hardware *can't* get any pages back from the page tables
before they get cleared.
Hardware designers have confirmed that the resulting 'PASID not present'
faults should be handled just as gracefully as 'page not present' faults,
the important criterion being that they don't perturb the operation for
any *other* PASID in the system.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
|
|
Not doing so is a bug and might trigger a BUG_ON in
handle_mm_fault(). So add the proper permission checks
before calling into mm code.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This is the downside of using bitfields in the struct definition, rather
than doing all the explicit masking and shifting.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Not entirely clear why, but it seems we need to reserve PASID zero and
flush it when we make a PASID entry present.
Quite we we couldn't use the true PASID value, isn't clear.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Change the 'pages' parameter to 'unsigned long' to avoid overflow.
Fix the device-IOTLB flush parameter calculation — the size of the IOTLB
flush is indicated by the position of the least significant zero bit in
the address field. For example, a value of 0x12345f000 will flush from
0x123440000 to 0x12347ffff (256KiB).
Finally, the cap_pgsel_inv() is not relevant to SVM; the spec says that
*all* implementations must support page-selective invaliation for
"first-level" translations. So don't check for it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
There is an extra semi-colon on this if statement so we always break on
the first iteration.
Fixes: 0204a4960982 ('iommu/vt-d: Add callback to device driver on page faults')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
When flushing kernel-mode PASIDs, we need to flush global pages too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
This really should be VTD_PAGE_SHIFT, not PAGE_SHIFT. Not that we ever
really anticipate seeing this used on IA64, but we should get it right
anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The "req->addr" variable is a bit field declared as "u64 addr:52;".
The "address" variable is a u64. We need to cast "req->addr" to a u64
before the shift or the result is truncated to 52 bits.
Fixes: a222a7f0bb6c ('iommu/vt-d: Implement page request handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|