path: root/arch
Age  Commit message  Author  Files  Lines
2016-08-01MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()James Hogan1-1/+1
kvm_mips_trans_replace() passes a pointer to KVM_GUEST_KSEGX(). This breaks on 64-bit builds due to the cast of that 64-bit pointer to a different sized 32-bit int. Cast the pointer argument to an unsigned long to work around the warning. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: KVM: Sign extend MFC0/RDHWR resultsJames Hogan1-3/+4
When emulating MFC0 instructions to load 32-bit values from guest COP0 registers and the RDHWR instruction to read the CC (Count) register, sign extend the result to comply with the MIPS64 architecture. The result must be in canonical 32-bit form or the guest may malfunction. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
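What "canonical 32-bit form" means can be sketched in plain C. This is only an illustration: the helper name `sign_extend_32` is hypothetical and is not taken from the kernel source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen a 32-bit COP0 value to a 64-bit register
 * value so that bits 63:32 are copies of bit 31, which is the form a
 * MIPS64 guest expects for 32-bit results.
 */
static int64_t sign_extend_32(uint32_t val)
{
	return (int64_t)(int32_t)val;
}
```

A value with bit 31 set, such as 0x80000000, becomes 0xffffffff80000000 rather than 0x0000000080000000.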
2016-08-01MIPS: KVM: Fix 64-bit big endian dynamic translationJames Hogan1-0/+8
The MFC0 and MTC0 instructions in the guest which cause traps can be replaced with 32-bit loads and stores to the commpage, however on big endian 64-bit builds the offset needs to have 4 added so as to load/store the least significant half of the long instead of the most significant half. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
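The offset adjustment can be illustrated with a small user-space sketch. It assumes a 64-bit slot accessed with a 32-bit load; the function names are hypothetical, and the runtime endianness probe stands in for what the kernel would decide at build time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Byte offset of the least significant 32 bits inside a 64-bit slot:
 * 0 on little endian, 4 on big endian.
 */
static size_t low_word_offset(void)
{
	const uint64_t probe = 1;
	uint32_t first;

	memcpy(&first, &probe, sizeof(first));
	return first == 1 ? 0 : 4;
}

/* Read the low 32 bits of a 64-bit slot using a 32-bit access. */
static uint32_t load_low_word(const uint64_t *slot)
{
	uint32_t val;

	memcpy(&val, (const unsigned char *)slot + low_word_offset(),
	       sizeof(val));
	return val;
}
```

Without the adjustment, a big endian 64-bit build would read the most significant half of the slot instead.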
2016-08-01MIPS: KVM: Fail if ebase doesn't fit in CP0_EBaseJames Hogan1-0/+12
Fail if the address of the allocated exception base doesn't fit into the CP0_EBase register. This can happen on MIPS64 if CP0_EBase.WG isn't implemented but RAM is available outside of the range of KSeg0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
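A minimal sketch of such a range check, assuming the exception base is compared by physical address against the 512MB window that KSeg0 maps. The function and macro names are hypothetical and the real kernel check may be phrased differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_PHYS_SIZE 0x20000000ull	/* KSeg0 maps the low 512MB */

/*
 * Hypothetical check: without a write gate (WG) only an exception base
 * inside KSeg0 can be represented in CP0_EBase.
 */
static bool ebase_fits(uint64_t ebase_phys, bool has_ebase_wg)
{
	return has_ebase_wg || ebase_phys < KSEG0_PHYS_SIZE;
}
```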
2016-08-01MIPS: KVM: Use 64-bit CP0_EBase when appropriateJames Hogan1-3/+22
Update the KVM entry point to write CP0_EBase as a 64-bit register when it is 64-bits wide, and to set the WG (write gate) bit if it exists in order to write bits 63:30 (or 31:30 on MIPS32). Prior to MIPS64r6 it was UNDEFINED to perform a 64-bit read or write of a 32-bit COP0 register. Since this is dynamically generated code, generate the right type of access depending on whether the kernel is 64-bit and cpu_has_ebase_wg. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: KVM: Set CP0_Status.KX on MIPS64James Hogan1-2/+8
Update the KVM entry code to set the CP0_Status.KX bit on 64-bit kernels. This is important to allow the entry code, running in kernel mode, to access the full 64-bit address space right up to the point of entering the guest, and immediately after exiting the guest, so it can safely restore & save the guest context from 64-bit segments. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: KVM: Make entry code MIPS64 friendlyJames Hogan1-24/+24
The MIPS KVM entry code (originally kvm_locore.S, later locore.S, and now entry.c) has never quite been right when built for 64-bit, using 32-bit instructions when 64-bit instructions were needed for handling 64-bit registers and pointers. Fix several cases of this now. The changes roughly fall into the following categories.
- COP0 scratch registers contain guest register values and the VCPU pointer, and are themselves full width. Similarly CP0_EPC and CP0_BadVAddr registers are full width (even though technically we don't support 64-bit guest address spaces with trap & emulate KVM). Use MFC0/MTC0 for accessing them.
- Handling of stack pointers and the VCPU pointer must match the pointer size of the kernel ABI (always o32 or n64), so use ADDIU.
- The CPU number in thread_info, and the guest_{user,kernel}_asid arrays in kvm_vcpu_arch are all 32 bit integers, so use lw (instead of LW) to load them.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: KVM: Use kmap instead of CKSEG0ADDR()James Hogan2-7/+17
There are several unportable uses of CKSEG0ADDR() in MIPS KVM, which implicitly assume that a host physical address will be in the low 512MB of the physical address space (accessible in KSeg0). These assumptions don't hold for highmem or on 64-bit kernels. When interpreting the guest physical address when reading or overwriting a trapping instruction, use kmap_atomic() to get a usable virtual address to access guest memory, which is portable to 64-bit and highmem kernels. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: KVM: Use virt_to_phys() to get commpage PFNJames Hogan1-1/+1
Calculate the PFN of the commpage using virt_to_phys() instead of CPHYSADDR(). This is more portable as kzalloc() may allocate from XKPhys instead of KSeg0 on 64-bit kernels, which CPHYSADDR() doesn't handle. This is sufficient for highmem kernels too since kzalloc() will allocate from lowmem in KSeg0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01MIPS: Fix definition of KSEGX() for 64-bitJames Hogan1-1/+1
The KSEGX() macro is defined to 32-bit sign extend the address argument and bitwise AND the result with 0xe0000000, with the final result usually compared against one of the CKSEG macros. However the literal 0xe0000000 is unsigned as the high bit is set, and is therefore zero-extended on 64-bit kernels, resulting in the sign extension bits of the argument being masked to zero. This results in the odd situation where KSEGX(CKSEG) != CKSEG: (0xffffffff80000000 & 0x00000000e0000000) != 0xffffffff80000000. Fix this by 32-bit sign extending the 0xe0000000 literal using _ACAST32_. This will help some MIPS KVM code handling 32-bit guest addresses to work on 64-bit host kernels, but will also affect KSEGX in dec_kn01_be_backend() on a 64-bit DECstation kernel, and the SiByte DMA page ops KSEGX check in clear_page() and copy_page() on 64-bit SB1 kernels, neither of which appear to be designed with 64-bit segments in mind anyway. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: linux-mips@linux-mips.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
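The masking problem can be reproduced outside the kernel. In this sketch `ACAST32` is a stand-in for the kernel's `_ACAST32_`, and `ksegx_old`/`ksegx_new` are hypothetical names for the behaviour before and after the fix; it assumes a compiler where narrowing to `int32_t` wraps, as GCC and Clang do.

```c
#include <stdint.h>

/* Stand-in for _ACAST32_ on a 64-bit build: sign extend from 32 bits. */
#define ACAST32(a) ((int64_t)(int32_t)(a))

/* Old behaviour: the unsigned mask literal is zero-extended to 64 bits. */
static uint64_t ksegx_old(uint64_t a)
{
	return (uint64_t)(ACAST32(a) & 0xe0000000);
}

/* Fixed behaviour: the mask is sign extended like the address. */
static uint64_t ksegx_new(uint64_t a)
{
	return (uint64_t)(ACAST32(a) & ACAST32(0xe0000000));
}
```

For the address 0xffffffff80000000, `ksegx_old()` yields 0x0000000080000000 while `ksegx_new()` returns the address unchanged, so a comparison against the sign-extended segment base only succeeds with the fix.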
2016-08-01KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLDJim Mattson1-11/+15
Kexec needs to know the addresses of all VMCSs that are active on each CPU, so that it can flush them from the VMCS caches. It is safe to record superfluous addresses that are not associated with an active VMCS, but it is not safe to omit an address associated with an active VMCS. After a call to vmcs_load, the VMCS that was loaded is active on the CPU. The VMCS should be added to the CPU's list of active VMCSs before it is loaded. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-01kvm: x86: nVMX: maintain internal copy of current VMCSDavid Matlack1-3/+28
KVM maintains L1's current VMCS in guest memory, at the guest physical page identified by the argument to VMPTRLD. This makes hairy time-of-check to time-of-use bugs possible, as VCPUs can be writing the VMCS page in memory while KVM is emulating VMLAUNCH and VMRESUME. The spec documents that writing to the VMCS page while it is loaded is "undefined". Therefore it is reasonable to load the entire VMCS into an internal cache during VMPTRLD and ignore writes to the VMCS page -- the guest should be using VMREAD and VMWRITE to access the current VMCS. To adhere to the spec, KVM should flush the current VMCS during VMPTRLD, and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR). Since this implementation of VMCS caching only maintains the current VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the current VMCS pointer. KVM will also flush during VMXOFF, which is not mandated by the spec, but also not in conflict with the spec. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-28KVM: PPC: Book3S HV: Save/restore TM state in H_CEDEPaul Mackerras1-0/+13
It turns out that if the guest does a H_CEDE while the CPU is in a transactional state, and the H_CEDE does a nap, and the nap loses the architected state of the CPU (which it is allowed to do), then we lose the checkpointed state of the virtual CPU. In addition, the transactional-memory state recorded in the MSR gets reset back to non-transactional, and when we try to return to the guest, we take a TM bad thing type of program interrupt because we are trying to transition from non-transactional to transactional with a hrfid instruction, which is not permitted. The result of the program interrupt occurring at that point is that the host CPU will hang in an infinite loop with interrupts disabled. Thus this is a denial of service vulnerability in the host which can be triggered by any guest (and depending on the guest kernel, it can potentially be triggered by unprivileged userspace in the guest). This vulnerability has been assigned the ID CVE-2016-5412. To fix this, we save the TM state before napping and restore it on exit from the nap, when handling a H_CEDE in real mode. The case where H_CEDE exits to host virtual mode is already OK (as are other hcalls which exit to host virtual mode) because the exit path saves the TM state. Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28KVM: PPC: Book3S HV: Pull out TM state save/restore into separate proceduresPaul Mackerras1-212/+237
This moves the transactional memory state save and restore sequences out of the guest entry/exit paths into separate procedures. This is so that these sequences can be used in going into and out of nap in a subsequent patch. The only code changes here are (a) saving and restoring LR on the stack, since these new procedures get called with a bl instruction, (b) explicitly saving r1 into the PACA instead of assuming that HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary and redundant setting of MSR[TM] that should have been removed by commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory support", 2013-09-24) but wasn't. Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-22Merge tag 'kvm-arm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into nextRadim Krčmář28-362/+274
KVM/ARM changes for Linux 4.8 - GICv3 ITS emulation - Simpler idmap management that fixes potential TLB conflicts - Honor the kernel protection in HYP mode - Removal of the old vgic implementation
2016-07-18KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controllerAndre Przywara3-0/+8
Now that all ITS emulation functionality is in place, we advertise MSI functionality to userland and also the ITS device to the guest - if userland has configured that. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm64: vgic-its: Introduce new KVM ITS deviceAndre Przywara2-0/+3
Introduce a new KVM device that represents an ARM Interrupt Translation Service (ITS) controller. Since there can be several of these per guest, we can't piggyback on the existing GICv3 distributor device and instead create a new type of KVM device. On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data structure and store the pointer in the kvm_device data. Upon an explicit init ioctl from userland (after having set up the MMIO address) we register the handlers with the kvm_io_bus framework. Any reference to an ITS thus has to go via this interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm/arm64: Extend arch CAP checks to allow per-VM capabilitiesAndre Przywara4-4/+4
KVM capabilities can be a per-VM property, though ARM/ARM64 currently does not pass on the VM pointer to the architecture specific capability handlers. Add a "struct kvm*" parameter to those functions to later allow proper per-VM capability reporting. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: s390: let ptff intercepts result in cc=3David Hildenbrand1-0/+8
We don't emulate ptff subfunctions, therefore react to any attempt at execution by setting cc=3 (Requested function not available). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-18KVM: s390: allow user space to handle instr 0x0000David Hildenbrand3-2/+29
We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints from user space. As it can be enabled dynamically via a capability, let's move setting of ICTL_OPEREXC to the post creation step, so we avoid any races when enabling that capability just while adding new cpus. Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-14arm64: KVM: Clean up a conditionDan Carpenter1-2/+2
My static checker complains that this condition looks like it should be == instead of =. This isn't a fast path, so we don't need to be fancy. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-14KVM: x86: bump KVM_MAX_VCPU_ID to 1023Radim Krčmář3-4/+12
kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and rcu had to be modified to cope with it. The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but a lower value was chosen in case there were bugs. 1023 is a sufficient maximum APIC ID for 288 VCPUs. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: bump MAX_VCPUS to 288Radim Krčmář1-1/+1
288 is in high demand because of Knights Landing CPU. We cannot set the limit to 640k, because that would be wasting space. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: add a flag to disable KVM x2apic broadcast quirkRadim Krčmář3-14/+45
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to KVM_CAP_X2APIC_API. The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode. The enableable capability is needed in order to support standard x2APIC and remain backward compatible. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [Expand kvm_apic_mda comment. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: add KVM_CAP_X2APIC_APIRadim Krčmář5-11/+52
KVM_CAP_X2APIC_API is a capability for features related to x2APIC enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to extend APIC ID in get/set ioctl and MSI addresses to 32 bits. Both are needed to support x2APIC. The feature has to be enableable and disabled by default, because get/set ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol inspired by xAPIC and the change is not backward-compatible. Changes to MSI addresses follow the format used by interrupt remapping unit. The upper address word, that used to be 0, contains upper 24 bits of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as 0. Using the upper address word is not backward-compatible either as we didn't check that userspace zeroed the word. Reserved bits are still not explicitly checked, but non-zero data will affect LAPIC addresses, which will cause a bug. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
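The MSI address layout described above can be sketched as a small decoder. The function name and the `x2apic_format` flag are hypothetical; the sketch assumes the usual x86 MSI layout in which bits 19:12 of the low address word carry the low 8 bits of the destination ID, and follows the text for the upper word.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decoder for the destination APIC ID of an MSI address.
 * Low word bits 19:12 hold ID bits 7:0. With the 32-bit format enabled,
 * the upper 24 bits of the high word hold ID bits 31:8; its lower 8 bits
 * are reserved and ignored here.
 */
static uint32_t msi_dest_id(uint32_t addr_lo, uint32_t addr_hi,
			    bool x2apic_format)
{
	uint32_t dest = (addr_lo >> 12) & 0xff;

	if (x2apic_format)
		dest |= addr_hi & 0xffffff00u;
	return dest;
}
```

With the feature disabled the high word is ignored, which matches the old behaviour where userspace was never required to zero it.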
2016-07-14KVM: pass struct kvm to kvm_set_routing_entryRadim Krčmář3-3/+6
Arch-specific code will use it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: reset lapic base in kvm_lapic_resetRadim Krčmář1-4/+4
LAPIC is reset in xAPIC mode and the surrounding code expects that. KVM never resets after initialization. This patch is just for sanity. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: VMX: optimize APIC ID read with APICvRadim Krčmář1-3/+0
The register is in hardware-compatible format now, so there is no need to intercept. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: reset APIC ID when enabling LAPICRadim Krčmář1-2/+3
APIC ID should be set to the initial APIC ID when enabling LAPIC. This only matters if the guest changes APIC ID. No sane OS does that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: use hardware-compatible format for APIC ID registerRadim Krčmář3-22/+52
We currently always shift APIC ID as if APIC was in xAPIC mode. x2APIC mode wants to use more bits and storing a hardware-compatible value is the sanest option. KVM API to set the lapic expects that bottom 8 bits of APIC ID are in top 8 bits of APIC_ID register, so the register needs to be shifted in x2APIC mode. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
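The two register layouts can be compared with a short sketch. The helper name `apic_id_reg` is hypothetical; it only shows how the same APIC ID is stored in xAPIC mode (8-bit ID in bits 31:24) versus x2APIC mode (full 32-bit ID).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: hardware-format value of the APIC_ID register
 * for a given APIC ID in either mode.
 */
static uint32_t apic_id_reg(uint32_t id, bool x2apic_mode)
{
	return x2apic_mode ? id : (id & 0xff) << 24;
}
```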
2016-07-14KVM: x86: use generic function for MSI parsingRadim Krčmář1-12/+7
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: dynamic kvm_apic_mapRadim Krčmář3-7/+16
x2APIC supports up to 2^32-1 LAPICs, but most guests in coming years will probably have fewer VCPUs. Dynamic size saves memory at the cost of turning one constant into a variable. apic_map mutex had to be moved before allocation to avoid races with cpu hotplug. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: use physical LAPIC array for logical x2APICRadim Krčmář2-38/+41
Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can reuse the physical array for them. This allows us to save space by sizing the logical maps according to the needs of xAPIC. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
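The injective mapping can be made concrete with the architectural formula for the x2APIC logical ID: the cluster goes in bits 31:16 and a one-hot position in bits 15:0. The function names below are hypothetical, and the inverse assumes a well-formed logical ID with exactly one bit set in its low 16 bits.

```c
#include <stdint.h>

/* Logical x2APIC ID derived from the physical x2APIC ID. */
static uint32_t x2apic_logical_id(uint32_t id)
{
	return ((id >> 4) << 16) | (1u << (id & 0xf));
}

/* Inverse mapping: recover the physical ID from a single logical ID. */
static uint32_t x2apic_physical_id(uint32_t ldr)
{
	uint32_t bit = 0;

	/* Find the one-hot position within the 16-bit cluster mask. */
	while (bit < 15 && !((ldr >> bit) & 1))
		bit++;
	return ((ldr >> 16) << 4) | bit;
}
```

Because every logical ID decodes back to exactly one physical ID, a lookup by logical ID can index the physical array instead of needing a separate table.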
2016-07-14KVM: x86: add kvm_apic_map_get_dest_lapicRadim Krčmář1-132/+98
kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both compute the interrupt destination. Factor the code. 'struct kvm_lapic **dst = NULL' had to be added to silence GCC. GCC might complain about potential NULL access in the future, because it missed conditions that avoided uninitialized uses of dst. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240Radim Krčmář1-1/+1
240 has been well tested by Red Hat. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14kvm: vmx: advertise support for ept execute onlyBandan Das1-0/+3
MMU now knows about execute only mappings, so advertise the feature to L1 hypervisors Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14kvm: mmu: track read permission explicitly for shadow EPT page tablesBandan Das3-10/+23
To support execute only mappings on behalf of L1 hypervisors, reuse ACC_USER_MASK to signify if the L1 hypervisor has the R bit set. For the nested EPT case, we assumed that the U bit was always set since there was no equivalent in EPT page tables. Strictly speaking, this was not necessary because handle_ept_violation never set PFERR_USER_MASK in the error code (uf=0 in the parlance of update_permission_bitmask). We now have to set both U and UF correctly, respectively in FNAME(gpte_access) and in handle_ept_violation. Also in handle_ept_violation bit 3 of the exit qualification is not enough to detect a present PTE; all three bits 3-5 have to be checked. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
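The present check on the exit qualification can be sketched as follows. The macro and function names are hypothetical; the bit positions follow the text above, where bits 3, 4 and 5 report whether the faulting translation was readable, writable and executable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for bits 3-5 of the EPT violation exit qualification. */
#define EPT_QUAL_READABLE	(1ull << 3)
#define EPT_QUAL_WRITABLE	(1ull << 4)
#define EPT_QUAL_EXECUTABLE	(1ull << 5)

/*
 * An entry counts as present if any of the three permission bits is set.
 * Testing bit 3 alone would miss execute-only mappings.
 */
static bool ept_entry_present(uint64_t exit_qualification)
{
	return (exit_qualification &
		(EPT_QUAL_READABLE | EPT_QUAL_WRITABLE |
		 EPT_QUAL_EXECUTABLE)) != 0;
}
```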
2016-07-14kvm: mmu: don't set the present bit unconditionallyBandan Das4-9/+11
To support execute only mappings on behalf of L1 hypervisors, we need to teach set_spte() to honor all three of L1's XWR bits. As a start, add a new variable "shadow_present_mask" that will be set for non-EPT shadow paging and clear for EPT. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14kvm: mmu: remove is_present_gpte()Bandan Das4-8/+3
We have two versions of the above function. To prevent confusion and bugs in the future, remove the non-FNAME version entirely and replace all calls with the actual check. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14kvm: mmu: extend the is_present check to 32 bitsBandan Das1-1/+1
This is safe because this function is called on host-controlled page tables and non-present/non-MMIO sptes never use bits 1..31. For the EPT case, this ensures that cases where only the execute bit is set are marked valid. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEADPaolo Bonzini13-9/+407
2016-07-11KVM: VMX: introduce vm_{entry,exit}_control_reset_shadowPaolo Bonzini1-2/+12
There is no reason to read the entry/exit control fields of the VMCS and immediately write back the same value. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11KVM: nVMX: keep preemption timer enabled during L2 executionPaolo Bonzini1-2/+13
Because the vmcs12 preemption timer is emulated through a separate hrtimer, we can keep on using the preemption timer in the vmcs02 to emulate L1's TSC deadline timer. However, the corresponding bit in the pin-based execution control field must be kept consistent between vmcs01 and vmcs02. On vmentry we copy it into the vmcs02; on vmexit the preemption timer must be disabled in the vmcs01 if a preemption timer vmexit happened while in guest mode. The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it need not be considered in prepare_vmcs02. Cc: Yunhong Jiang <yunhong.jiang@intel.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Tested-by: Wanpeng Li <kernellwp@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11KVM: nVMX: avoid incorrect preemption timer vmexit in nested guestWanpeng Li1-0/+2
The preemption timer for nested VMX is emulated by hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook. However, nested_vmx_exit_handled is always returning true for preemption timer vmexit. Then, the L1 preemption timer vmexit is captured and treated as an L2 preemption timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's vmexit handler:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    PGD 0
    Oops: 0010 [#1] SMP
    Call Trace:
     ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
     handle_preemption_timer+0xe/0x20 [kvm_intel]
     vmx_handle_exit+0x169/0x15a0 [kvm_intel]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     ? vcpu_load+0x1c/0x60 [kvm]
     ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
     kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
     do_vfs_ioctl+0x96/0x6a0
     ? __fget_light+0x2a/0x90
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x68/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    Code: Bad RIP value.
    RIP [< (null)>] (null)
     RSP <ffff8800b5263c48>
    CR2: 0000000000000000
    ---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily with the preemption timer enabled on L0 and disabled on L1. Return false since preemption timer vmexits must never be reflected to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-11KVM: VMX: reflect broken preemption timer in vmcs_configPaolo Bonzini1-3/+2
Simplify cpu_has_vmx_preemption_timer. This is consistent with the rest of setup_vmcs_config and preparatory for the next patch. Tested-by: Wanpeng Li <kernellwp@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05MIPS: KVM: Emulate generic QEMU machine on r6 T&EJames Hogan1-1/+7
Default the guest PRId register to represent a generic QEMU machine instead of a 24kc on MIPSr6. 24kc isn't supported by r6 Linux kernels. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05MIPS: KVM: Decode RDHWR more strictlyJames Hogan1-1/+3
When KVM emulates the RDHWR instruction, decode the instruction more strictly. The rs field (bits 25:21) should be zero, as should bits 10:9. Bits 8:6 are the register select field in MIPSr6, so we aren't strict about those bits (no other operations should use that encoding space). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
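The stricter decode can be sketched as a standalone matcher. The names below are hypothetical; the sketch assumes the standard RDHWR encoding (SPECIAL3 major opcode 0x1f, function 0x3b) and adds the two zero-field checks the commit describes while leaving bits 8:6 unchecked.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_SPECIAL3	0x1fu	/* major opcode, bits 31:26 */
#define FN_RDHWR	0x3bu	/* function field, bits 5:0 */

/* Hypothetical strict matcher for the RDHWR instruction word. */
static bool is_rdhwr(uint32_t insn)
{
	return (insn >> 26) == OP_SPECIAL3 &&
	       (insn & 0x3f) == FN_RDHWR &&
	       ((insn >> 21) & 0x1f) == 0 &&	/* rs must be zero */
	       ((insn >> 9) & 0x3) == 0;	/* bits 10:9 must be zero */
}
```

The word 0x7c03e83b (RDHWR with rt=3, rd=29) matches, while the same word with a non-zero rs field or with bit 9 set is rejected.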
2016-07-05MIPS: KVM: Recognise r6 CACHE encodingJames Hogan2-2/+24
Recognise the new MIPSr6 CACHE instruction encoding rather than the pre-r6 one when an r6 kernel is being built. A SPECIAL3 opcode is used and the immediate field is reduced to 9 bits wide since MIPSr6. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05MIPS: KVM: Support r6 compact branch emulationJames Hogan1-6/+46
Add support in KVM for emulation of instructions in the forbidden slot of MIPSr6 compact branches. If we hit an exception on the forbidden slot, then the branch must not have been taken, which makes calculation of the resume PC trivial. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05MIPS: KVM: Don't save/restore lo/hi for r6James Hogan2-12/+10
MIPSr6 doesn't have lo/hi registers, so don't bother saving or restoring them, and don't expose them to userland with the KVM ioctl interface either. In fact the lo/hi registers aren't callee saved in the MIPS ABIs anyway, so there is no need to preserve the host lo/hi values at all when transitioning to and from the guest (which happens via a function call). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>