diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-12-13 15:47:02 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-12-13 15:47:02 -0800 |
commit | 93173b5bf2841da7e3a9b0cb1312ef5c87251524 (patch) | |
tree | 629de2735f465ce0437f8dba85a00f766bbec31c /Documentation/virtual | |
parent | 1c59e1edb13d60b97b7b03b332ceed5d967d4227 (diff) | |
parent | f673b5b2a66332da5358def524dbfb3305c76d8c (diff) |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release, the most interesting stuff is x86 nested virt
improvements.
x86:
- userspace can now hide nested VMX features from guests
- nested VMX can now run Hyper-V in a guest
- support for AVX512_4VNNIW and AVX512_FMAPS in KVM
- infrastructure support for virtual Intel GPUs.
PPC:
- support for KVM guests on POWER9
- improved support for interrupt polling
- optimizations and cleanups.
s390:
- two small optimizations, more stuff is in flight and will be in
4.11.
ARM:
- support for the GICv3 ITS on 32bit platforms"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
KVM: arm/arm64: timer: Check for properly initialized timer on init
KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
KVM: x86: Handle the kthread worker using the new API
KVM: nVMX: invvpid handling improvements
KVM: nVMX: check host CR3 on vmentry and vmexit
KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
KVM: nVMX: propagate errors from prepare_vmcs02
KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
KVM: nVMX: support restore of VMX capability MSRs
KVM: nVMX: generate non-true VMX MSRs based on true versions
KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
KVM: x86: Add kvm_skip_emulated_instruction and use it.
KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
KVM: VMX: Reorder some skip_emulated_instruction calls
KVM: x86: Add a return value to kvm_emulate_cpuid
KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
...
Diffstat (limited to 'Documentation/virtual')
-rw-r--r-- | Documentation/virtual/kvm/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/virtual/kvm/api.txt | 5 | ||||
-rw-r--r-- | Documentation/virtual/kvm/halt-polling.txt | 127 |
3 files changed, 133 insertions, 1 deletions
diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX index fee9f2bf9c64..69fe1a8b7ad1 100644 --- a/Documentation/virtual/kvm/00-INDEX +++ b/Documentation/virtual/kvm/00-INDEX @@ -6,6 +6,8 @@ cpuid.txt - KVM-specific cpuid leaves (x86). devices/ - KVM_CAP_DEVICE_CTRL userspace API. +halt-polling.txt + - notes on halt-polling hypercalls.txt - KVM hypercalls. locking.txt diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6bbceb9a3a19..03145b7cafaa 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2034,6 +2034,8 @@ registers, find a list below: PPC | KVM_REG_PPC_WORT | 64 PPC | KVM_REG_PPC_SPRG9 | 64 PPC | KVM_REG_PPC_DBSR | 32 + PPC | KVM_REG_PPC_TIDR | 64 + PPC | KVM_REG_PPC_PSSCR | 64 PPC | KVM_REG_PPC_TM_GPR0 | 64 ... PPC | KVM_REG_PPC_TM_GPR31 | 64 @@ -2050,6 +2052,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TM_VSCR | 32 PPC | KVM_REG_PPC_TM_DSCR | 64 PPC | KVM_REG_PPC_TM_TAR | 64 + PPC | KVM_REG_PPC_TM_XER | 64 | | MIPS | KVM_REG_MIPS_R0 | 64 ... @@ -2209,7 +2212,7 @@ after pausing the vcpu, but before it is resumed. 4.71 KVM_SIGNAL_MSI Capability: KVM_CAP_SIGNAL_MSI -Architectures: x86 arm64 +Architectures: x86 arm arm64 Type: vm ioctl Parameters: struct kvm_msi (in) Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error diff --git a/Documentation/virtual/kvm/halt-polling.txt b/Documentation/virtual/kvm/halt-polling.txt new file mode 100644 index 000000000000..4a8418318769 --- /dev/null +++ b/Documentation/virtual/kvm/halt-polling.txt @@ -0,0 +1,127 @@ +The KVM halt polling system +=========================== + +The KVM halt polling system provides a feature within KVM whereby the latency +of a guest can, under some circumstances, be reduced by polling in the host +for some time period after the guest has elected to no longer run by cedeing. +That is, when a guest vcpu has ceded, or in the case of powerpc when all of the +vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions +before giving up the cpu to the scheduler in order to let something else run. + +Polling provides a latency advantage in cases where the guest can be run again +very quickly by at least saving us a trip through the scheduler, normally on +the order of a few micro-seconds, although performance benefits are workload +dependant. In the event that no wakeup source arrives during the polling +interval or some other task on the runqueue is runnable the scheduler is +invoked. Thus halt polling is especially useful on workloads with very short +wakeup periods where the time spent halt polling is minimised and the time +savings of not invoking the scheduler are distinguishable. + +The generic halt polling code is implemented in: + + virt/kvm/kvm_main.c: kvm_vcpu_block() + +The powerpc kvm-hv specific case is implemented in: + + arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked() + +Halt Polling Interval +===================== + +The maximum time for which to poll before invoking the scheduler, referred to +as the halt polling interval, is increased and decreased based on the perceived +effectiveness of the polling in an attempt to limit pointless polling. +This value is stored in either the vcpu struct: + + kvm_vcpu->halt_poll_ns + +or in the case of powerpc kvm-hv, in the vcore struct: + + kvmppc_vcore->halt_poll_ns + +Thus this is a per vcpu (or vcore) value. + +During polling if a wakeup source is received within the halt polling interval, +the interval is left unchanged. In the event that a wakeup source isn't +received during the polling interval (and thus schedule is invoked) there are +two options, either the polling interval and total block time[0] were less than +the global max polling interval (see module params below), or the total block +time was greater than the global max polling interval. + +In the event that both the polling interval and total block time were less than +the global max polling interval then the polling interval can be increased in +the hope that next time during the longer polling interval the wake up source +will be received while the host is polling and the latency benefits will be +received. The polling interval is grown in the function grow_halt_poll_ns() and +is multiplied by the module parameter halt_poll_ns_grow. + +In the event that the total block time was greater than the global max polling +interval then the host will never poll for long enough (limited by the global +max) to wakeup during the polling interval so it may as well be shrunk in order +to avoid pointless polling. The polling interval is shrunk in the function +shrink_halt_poll_ns() and is divided by the module parameter +halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0. + +It is worth noting that this adjustment process attempts to hone in on some +steady state polling interval but will only really do a good job for wakeups +which come at an approximately constant rate, otherwise there will be constant +adjustment of the polling interval. + +[0] total block time: the time between when the halt polling function is + invoked and a wakeup source received (irrespective of + whether the scheduler is invoked within that function). + +Module Parameters +================= + +The kvm module has 3 tuneable module parameters to adjust the global max +polling interval as well as the rate at which the polling interval is grown and +shrunk. These variables are defined in include/linux/kvm_host.h and as module +parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the +powerpc kvm-hv case. + +Module Parameter | Description | Default Value +-------------------------------------------------------------------------------- +halt_poll_ns | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT + | which defines the ceiling value | + | of the polling interval for | (per arch value) + | each vcpu. | +-------------------------------------------------------------------------------- +halt_poll_ns_grow | The value by which the halt | 2 + | polling interval is multiplied | + | in the grow_halt_poll_ns() | + | function. | +-------------------------------------------------------------------------------- +halt_poll_ns_shrink | The value by which the halt | 0 + | polling interval is divided in | + | the shrink_halt_poll_ns() | + | function. | +-------------------------------------------------------------------------------- + +These module parameters can be set from the debugfs files in: + + /sys/module/kvm/parameters/ + +Note: that these module parameters are system wide values and are not able to + be tuned on a per vm basis. + +Further Notes +============= + +- Care should be taken when setting the halt_poll_ns module parameter as a +large value has the potential to drive the cpu usage to 100% on a machine which +would be almost entirely idle otherwise. This is because even if a guest has +wakeups during which very little work is done and which are quite far apart, if +the period is shorter than the global max polling interval (halt_poll_ns) then +the host will always poll for the entire block time and thus cpu utilisation +will go to 100%. + +- Halt polling essentially presents a trade off between power usage and latency +and the module parameters should be used to tune the affinity for this. Idle +cpu time is essentially converted to host kernel time with the aim of decreasing +latency when entering the guest. + +- Halt polling will only be conducted by the host when no other tasks are +runnable on that cpu, otherwise the polling will cease immediately and +schedule will be invoked to allow that other task to run. Thus this doesn't +allow a guest to denial of service the cpu. |