Commit Graph

358 Commits (656dff55da19eb889f2b1df5a5f2aa1d28f024fd)

Author SHA1 Message Date
Sean Christopherson af5366bea2 KVM: x86: Drop the now unused KVM_X86_DISABLE_VALID_EXITS
Drop the KVM_X86_DISABLE_VALID_EXITS definition, as it is misleading, and
unused in KVM *because* it is misleading.  The set of exits that can be
disabled is dynamic, i.e. userspace (and KVM) must check KVM's actual
capabilities.

Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241128013424.4096668-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18 14:19:37 -08:00
Sean Christopherson 915d2f0718 KVM: Move KVM_REG_SIZE() definition to common uAPI header
Define KVM_REG_SIZE() in the common kvm.h header, and delete the arm64 and
RISC-V versions.  As evidenced by the surrounding definitions, all aspects
of the register size encoding are generic, i.e. RISC-V should have moved
arm64's definition to common code instead of copy+pasting.

Acked-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20241128005547.4077116-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-17 08:49:48 -08:00
Xianglai Li e785dfacf7 LoongArch: KVM: Add PCHPIC device support
Add device model for PCHPIC interrupt controller, implemente basic
create & destroy interface, and register device model to kvm device
table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li 2e8b9df826 LoongArch: KVM: Add EIOINTC device support
Add device model for EIOINTC interrupt controller, implement basic
create & destroy interfaces, and register device model to kvm device
table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Xianglai Li c532de5a67 LoongArch: KVM: Add IPI device support
Add device model for IPI interrupt controller, implement basic create &
destroy interfaces, and register device model to kvm device table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:27 +08:00
Paolo Bonzini 5dcc1e7614 KVM x86 misc changes for 6.11
- Add a global struct to consolidate tracking of host values, e.g. EFER, and
    move "shadow_phys_bits" into the structure as "maxphyaddr".
 
  - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
    bus frequency, because TDX.
 
  - Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
  - Clean up KVM's handling of vendor specific emulation to consistently act on
    "compatible with Intel/AMD", versus checking for a specific vendor.
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRub0ACgkQOlYIJqCj
 N/2LMxAArGzhcWZ6Qdo2aMRaMIPtSBJHmbEgEuHvHMumgsTZQzDcn9cxDi/hNSrc
 l8ODOwAM2qNcq95YfwjU7F0ae3E+HRzGvKcBnmZWuQeCDp2HhVEoCphFu1sHst+t
 XEJTL02b6OgyJUEU3h40mYk12eiq2S4FCnFYXPCqijwwuL6Y5KQvvTqek3c2/SDn
 c+VneutYGax/S0GiiCkYh4wrwWh9g7qm0IX70ycBwJbW5qBFKgyglvHxvL8JLJC9
 Nkkw/p2657wcOdraH+fOBuRy2dMwE5fv++1tOjWwB5WAAhSOJPZh0BGYvgA2yfN7
 OE+k7APKUQd9Xxtud8H3LrTPoyMA4hz2sdDFyqrrWK9yjpBY7zXNyN50Fxi7VVsm
 T8nTIiKAGyRbjotY+m7krXQPXjfZYhVqrJ/jtxESOZLZ93q2gSWU2p/ZXpUPVHnH
 +YOBAI1owP3wepaYlrthtI4LQx9lF422dnmeSflztfKFGabRbQZxg3uHMCCxIaGc
 lJ6CD546+D45f/uBXRDMqk//qFTqXhKUbDk9sutmU/C2oWufMwW0R8kOyItGPyvk
 9PP1vd8vSsIHj+tpwg+i04jBqYDaAcPBOcTZaHm9SYYP+1e11Uu5Vjep37JL1bkA
 xJWxnDZOCGcfKQi2jkh51HJ/dOAHXY1GQKMfyAoPQOSonYHvGVY=
 =Cf2R
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.11

 - Add a global struct to consolidate tracking of host values, e.g. EFER, and
   move "shadow_phys_bits" into the structure as "maxphyaddr".

 - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.

 - Print the name of the APICv/AVIC inhibits in the relevant tracepoint.

 - Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.

 - Misc cleanups
2024-07-16 09:53:05 -04:00
Paolo Bonzini 86014c1e20 KVM generic changes for 6.11
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
  - Setup empty IRQ routing when creating a VM to avoid having to synchronize
    SRCU when creating a split IRQCHIP on x86.
 
  - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
    that arch code can use for hooking both sched_in() and sched_out().
 
  - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
    truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
  - Mark a vCPU as preempted if and only if it's scheduled out while in the
    KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
    memory when retrieving guest state during live migration blackout.
 
  - A few minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRuOYACgkQOlYIJqCj
 N/1UnQ/8CI5Qfr+/0gzYgtWmtEMczGG+rMNpzD3XVqPjJjXcMcBiQnplnzUVLhha
 vlPdYVK7vgmEt003XGzV55mik46LHL+DX/v4hI3HEdblfyCeNLW3fKEWVRB44qJe
 o+YUQwSK42SORUp9oXuQINxhA//U9EnI7CQxlJ8w8wenv5IJKfIGr01DefmfGPAV
 PKm9t6WLcNqvhZMEyy/zmzM3KVPCJL0NcwI97x6sHxFpQYIDtL0E/VexA4AFqMoT
 QK7cSDC/2US41Zvem/r/GzM/ucdF6vb9suzZYBohwhxtVhwJe2CDeYQZvtNKJ1U7
 GOHPaKL6nBWdZCm/yyWbbX2nstY1lHqxhN3JD0X8wqU5rNcwm2b8Vfyav0Ehc7H+
 jVbDTshOx4YJmIgajoKjgM050rdBK59TdfVL+l+AAV5q/TlHocalYtvkEBdGmIDg
 2td9UHSime6sp20vQfczUEz4bgrQsh4l2Fa/qU2jFwLievnBw0AvEaMximkSGMJe
 b8XfjmdTjlOesWAejANKtQolfrq14+1wYw0zZZ8PA+uNVpKdoovmcqSOcaDC9bT8
 GO/NFUvoG+lkcvJcIlo1SSl81SmGLosijwxWfGvFAqsgpR3/3l3dYp0QtztoCNJO
 d3+HnjgYn5o5FwufuTD3eUOXH4AFjG108DH0o25XrIkb2Kymy0o=
 =BalU
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 6.11

 - Enable halt poll shrinking by default, as Intel found it to be a clear win.

 - Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.

 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().

 - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.

 - Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.

 - A few minor cleanups
2024-07-16 09:51:36 -04:00
Isaku Yamahata bc1a5cd002 KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
memory range and calls the arch-specific function.  The implementation is
optional and enabled by a Kconfig symbol.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-12 11:17:35 -04:00
David Matlack 4b23e0c199 KVM: Ensure new code that references immediate_exit gets extra scrutiny
Ensure that any new KVM code that references immediate_exit gets extra
scrutiny by renaming it to immediate_exit__unsafe in kernel code.

All fields in struct kvm_run are subject to TOCTOU races since they are
mapped into userspace, which may be malicious or buggy. To protect KVM,
introduces a new macro that appends __unsafe to select field names in
struct kvm_run, hinting to developers and reviewers that accessing such
fields must be done carefully.

Apply the new macro to immediate_exit, since userspace can make
immediate_exit inconsistent with vcpu->wants_to_run, i.e. accessing
immediate_exit directly could lead to unexpected bugs in the future.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20240503181734.1467938-3-dmatlack@google.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-18 09:20:53 -07:00
Thomas Prescher 85542adb65 KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
When a vCPU is interrupted by a signal while running a nested guest,
KVM will exit to userspace with L2 state. However, userspace has no
way to know whether it sees L1 or L2 state (besides calling
KVM_GET_STATS_FD, which does not have a stable ABI).

This causes multiple problems:

The simplest one is L2 state corruption when userspace marks the sregs
as dirty. See this mailing list thread [1] for a complete discussion.

Another problem is that if userspace decides to continue by emulating
instructions, it will unknowingly emulate with L2 state as if L1
doesn't exist, which can be considered a weird guest escape.

Introduce a new flag, KVM_RUN_X86_GUEST_MODE, in the kvm_run data
structure, which is set when the vCPU exited while running a nested
guest.  Also introduce a new capability, KVM_CAP_X86_GUEST_MODE, to
advertise the functionality to userspace.

[1] https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@cyberus-technology.de/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55

Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Link: https://lore.kernel.org/r/20240508132502.184428-1-julian.stecklina@cyberus-technology.de
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-11 09:24:31 -07:00
Isaku Yamahata 6fef518594 KVM: x86: Add a capability to configure bus frequency for APIC timer
Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
bus clock frequency for APIC timer emulation.
Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
frequency in nanoseconds. When using this capability, the user space
VMM should configure CPUID leaf 0x15 to advertise the frequency.

Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
frequency but ends up getting interrupts at a significantly higher rate.

The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.

In addition, per Intel SDM:
    "The APIC timer frequency will be the processor’s bus clock or core
     crystal clock frequency (when TSC/core crystal clock ratio is
     enumerated in CPUID leaf 0x15) divided by the value specified in
     the divide configuration register."

The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.

The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
enumerated, the guest kernel uses it as the APIC bus frequency. If not,
the guest kernel measures the frequency based on other known timers like
the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
kernel expects a 25MHz timer frequency but gets timer interrupt more
frequently due to the 1GHz frequency used by KVM.

To ensure that the guest doesn't have a conflicting view of the APIC bus
frequency, allow the userspace to tell KVM to use the same frequency that
TDX mandates instead of the default 1Ghz.

Reported-by: Vishal Annapurve <vannapurve@google.com>
Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-05 06:18:27 -07:00
Joel Stanley 651d61bc8b KVM: PPC: Fix documentation for ppc mmu caps
The documentation mentions KVM_CAP_PPC_RADIX_MMU, but the defines in the
kvm headers spell it KVM_CAP_PPC_MMU_RADIX. Similarly with
KVM_CAP_PPC_MMU_HASH_V3.

Fixes: c927013227 ("KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230411061446.26324-1-joel@jms.id.au
2024-05-08 01:28:00 +10:00
Paolo Bonzini db7d6fbc10 KVM: remove unnecessary #ifdef
KVM_CAP_IRQ_ROUTING is always defined, so there is no need to check if it is.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:06 -05:00
Paolo Bonzini 6bda055d62 KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally
Since all architectures (for historical reasons) have to define
struct kvm_guest_debug_arch, and since userspace has to check
KVM_CHECK_EXTENSION(KVM_CAP_SET_GUEST_DEBUG) anyway, there is
no advantage in masking the capability #define itself.  Remove
the #define __KVM_HAVE_GUEST_DEBUG from architecture-specific
headers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:06 -05:00
Paolo Bonzini 5d9cb71642 KVM: arm64: move ARM-specific defines to uapi/asm/kvm.h
While this in principle breaks userspace code that mentions KVM_ARM_DEV_*
on architectures other than aarch64, this seems unlikely to be
a problem considering that run->s.regs.device_irq_level is only
defined on that architecture.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:05 -05:00
Paolo Bonzini 71cd774ad2 KVM: s390: move s390-specific structs to uapi/asm/kvm.h
While this in principle breaks the appearance of KVM_S390_* ioctls on architectures
other than s390, this seems unlikely to be a problem considering that there are
already many "struct kvm_s390_*" definitions in arch/s390/include/uapi.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:05 -05:00
Paolo Bonzini d750951c9e KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h
While this in principle breaks the appearance of KVM_PPC_* ioctls on architectures
other than powerpc, this seems unlikely to be a problem considering that there are
already many "struct kvm_ppc_*" definitions in arch/powerpc/include/uapi.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:05 -05:00
Paolo Bonzini bcac047727 KVM: x86: move x86-specific structs to uapi/asm/kvm.h
Several capabilities that exist only on x86 nevertheless have their
structs defined in include/uapi/linux/kvm.h.  Move them to
arch/x86/include/uapi/asm/kvm.h for cleanliness.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:04 -05:00
Paolo Bonzini c0a411904e KVM: remove more traces of device assignment UAPI
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:04 -05:00
Paolo Bonzini 3115d2de39 KVM Xen change for 6.8:
To workaround Xen guests that don't expect Xen PV clocks to be marked as being
 based on a stable TSC, add a Xen config knob to allow userspace to opt out of
 KVM setting the "TSC stable" bit in Xen PV clocks.  Note, the "TSC stable" bit
 was added to the PVCLOCK ABI by KVM without an ack from Xen, i.e. KVM isn't
 entirely blameless for the buggy guest behavior.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmWXASsSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5R54P/iQPQBs4dJmNkPiA6uSq1O5/8hN4P59z
 aapJNgDiny/D9/zPbOxGWR31W7lvCgiES/lp3KcHZmwbeAwJpdT6a0cJWGRlGuov
 gccK8AoYcnwSU98sPisnFv7dJ66ogJfXVkPKKaWo+zVW53XUq2XpIie4eWaOweBt
 QsXpTGYpGajv1Bf/MgRtNtlkVAo1w8XL1L0NWRugzCk2CAYezz8IT1874GNZoJbd
 GJfVP+76FdNw+4/CxiaBwxP0gHfBIiAsJzGqbmMPhGG2xJn+KGs5FTEf37Pta8cl
 aMHAq6/JAoabJfP39MexVkopMaFlPbDwIWfkLWf6wSP86KHei+t9kLC0E4/R2NJ+
 GKlrBB6Gj+gzFR4fZ75hIwS/4REMt6zVCbS7uSRrCduqrlEFcY5ED2NesoL9wZrB
 WMDIxIGIVDdRxc9WLypKmBj7KTgL0qXBxnsAcPiDRf1sk6SGajkesWxA1C1Nzo/H
 yNfqq0gjdPZVB2RIGN6DpWQFu3d+ZQnG2ToKIBW7OkvJ5USYiDSo4VozhESgYHRZ
 UJDhJ73QYESynClP6ST+9cxNof3FXCEPDeKr5NcmjVZxlJcdeUDNRqv0LUxQ56BI
 FvHMHtSs4WLYHZZVzsdh+Yhnc9rEGfoL0NwDPBCcOXjuNMvNQmuzSldc/VDGm/qt
 sCtxYMms5n7u
 =3v8F
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-xen-6.8' of https://github.com/kvm-x86/linux into HEAD

KVM Xen change for 6.8:

To workaround Xen guests that don't expect Xen PV clocks to be marked as being
based on a stable TSC, add a Xen config knob to allow userspace to opt out of
KVM setting the "TSC stable" bit in Xen PV clocks.  Note, the "TSC stable" bit
was added to the PVCLOCK ABI by KVM without an ack from Xen, i.e. KVM isn't
entirely blameless for the buggy guest behavior.
2024-01-08 08:10:20 -05:00
Paolo Bonzini a5d3df8ae1 KVM: remove deprecated UAPIs
The deprecated interfaces were removed 15 years ago.  KVM's
device assignment was deprecated in 4.2 and removed 6.5 years
ago; the only interest might be in compiling ancient versions
of QEMU, but QEMU has been using its own imported copy of the
kernel headers since June 2011.  So again we go into archaeology
territory; just remove the cruft.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-08 15:43:33 -05:00
Paul Durrant 6d72283526 KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT
Unless explicitly told to do so (by passing 'clocksource=tsc' and
'tsc=stable:socket', and then jumping through some hoops concerning
potential CPU hotplug) Xen will never use TSC as its clocksource.
Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
in either the primary or secondary pvclock memory areas. This has
led to bugs in some guest kernels which only become evident if
PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
such guests, give the VMM a new Xen HVM config flag to tell KVM to
forcibly clear the bit in the Xen pvclocks.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20231102162128.2353459-1-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-07 15:52:57 -08:00
Sean Christopherson 89ea60c2c7 KVM: x86: Add support for "protected VMs" that can utilize private memory
Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
and testing vehicle for Confidential (CoCo) VMs, and potentially to even
become a "real" product in the distant future, e.g. a la pKVM.

The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
Intel's TDX, but those technologies are extremely complex (understatement),
difficult to debug, don't support running as nested guests, and require
hardware that's isn't universally accessible.  I.e. relying SEV-SNP or TDX
for maintaining guest private memory isn't a realistic option.

At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
selftests for guest_memfd and private memory support without requiring
unique hardware.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-24-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:05 -05:00
Chao Peng 8dd2eee9d5 KVM: x86/mmu: Handle page fault for private memory
Add support for resolving page faults on guest private memory for VMs
that differentiate between "shared" and "private" memory.  For such VMs,
KVM_MEM_GUEST_MEMFD memslots can include both fd-based private memory and
hva-based shared memory, and KVM needs to map in the "correct" variant,
i.e. KVM needs to map the gfn shared/private as appropriate based on the
current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.

For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
shared vs. private via a bit in the guest page tables, i.e. what the guest
wants may conflict with the current memory attributes.  To support such
"implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
to forward the request to userspace.  Add a new flag for memory faults,
KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
map memory as shared vs. private.

Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
exit on missing mappings when handling guest page fault VM-Exits.  In
that case, userspace will want to know RWX information in order to
correctly/precisely resolve the fault.

Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
always come from the host userspace page tables, and private mappings
always come from a guest_memfd instance.

Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:04 -05:00
Sean Christopherson a7800aa80e KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
memory that is tied to a specific KVM virtual machine and whose primary
purpose is to serve guest memory.

A guest-first memory subsystem allows for optimizations and enhancements
that are kludgy or outright infeasible to implement/support in a generic
memory subsystem.  With guest_memfd, guest protections and mapping sizes
are fully decoupled from host userspace mappings.   E.g. KVM currently
doesn't support mapping memory as writable in the guest without it also
being writable in host userspace, as KVM's ABI uses VMA protections to
define the allow guest protection.  Userspace can fudge this by
establishing two mappings, a writable mapping for the guest and readable
one for itself, but that’s suboptimal on multiple fronts.

Similarly, KVM currently requires the guest mapping size to be a strict
subset of the host userspace mapping size, e.g. KVM doesn’t support
creating a 1GiB guest mapping unless userspace also has a 1GiB guest
mapping.  Decoupling the mappings sizes would allow userspace to precisely
map only what is needed without impacting guest performance, e.g. to
harden against unintentional accesses to guest memory.

Decoupling guest and userspace mappings may also allow for a cleaner
alternative to high-granularity mappings for HugeTLB, which has reached a
bit of an impasse and is unlikely to ever be merged.

A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to mmap() guest memory).

More immediately, being able to map memory into KVM guests without mapping
said memory into the host is critical for Confidential VMs (CoCo VMs), the
initial use case for guest_memfd.  While AMD's SEV and Intel's TDX prevent
untrusted software from reading guest private data by encrypting guest
memory with a key that isn't usable by the untrusted host, projects such
as Protected KVM (pKVM) provide confidentiality and integrity *without*
relying on memory encryption.  And with SEV-SNP and TDX, accessing guest
private memory can be fatal to the host, i.e. KVM must be prevent host
userspace from accessing guest memory irrespective of hardware behavior.

Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as
being mappable only by KVM (or a similarly enlightened kernel subsystem).
That approach was abandoned largely due to it needing to play games with
PROT_NONE to prevent userspace from accessing guest memory.

Attempt #2 to was to usurp PG_hwpoison to prevent the host from mapping
guest private memory into userspace, but that approach failed to meet
several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel
wouldn't easily be able to enforce a 1:1 page:guest association, let alone
a 1:1 pfn:gfn mapping.  And using PG_hwpoison does not work for memory
that isn't backed by 'struct page', e.g. if devices gain support for
exposing encrypted memory regions to guests.

Attempt #3 was to extend the memfd() syscall and wrap shmem to provide
dedicated file-based guest memory.  That approach made it as far as v10
before feedback from Hugh Dickins and Christian Brauner (and others) led
to it demise.

Hugh's objection was that piggybacking shmem made no sense for KVM's use
case as KVM didn't actually *want* the features provided by shmem.  I.e.
KVM was using memfd() and shmem to avoid having to manage memory directly,
not because memfd() and shmem were the optimal solution, e.g. things like
read/write/mmap in shmem were dead weight.

Christian pointed out flaws with implementing a partial overlay (wrapping
only _some_ of shmem), e.g. poking at inode_operations or super_operations
would show shmem stuff, but address_space_operations and file_operations
would show KVM's overlay.  Paraphrashing heavily, Christian suggested KVM
stop being lazy and create a proper API.

Link: https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com
Link: https://lore.kernel.org/all/20210416154106.23721-1-kirill.shutemov@linux.intel.com
Link: https://lore.kernel.org/all/20210824005248.200037-1-seanjc@google.com
Link: https://lore.kernel.org/all/20211111141352.26311-1-chao.p.peng@linux.intel.com
Link: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com
Link: https://lore.kernel.org/all/ff5c5b97-acdf-9745-ebe5-c6609dd6322e@google.com
Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner
Link: https://lore.kernel.org/all/ZEM5Zq8oo+xnApW9@google.com
Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey
Link: https://lore.kernel.org/linux-mm/ZII1p8ZHlHaQ3dDl@casper.infradead.org
Cc: Fuad Tabba <tabba@google.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Maciej Szmigiero <mail@maciej.szmigiero.name>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Wang <wei.w.wang@intel.com>
Cc: Liam Merwick <liam.merwick@oracle.com>
Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-17-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:03 -05:00
Chao Peng 5a475554db KVM: Introduce per-page memory attributes
In confidential computing usages, whether a page is private or shared is
necessary information for KVM to perform operations like page fault
handling, page zapping etc. There are other potential use cases for
per-page memory attributes, e.g. to make memory read-only (or no-exec,
or exec-only, etc.) without having to modify memslots.

Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by
KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory
attributes to a guest memory range.

Use an xarray to store the per-page attributes internally, with a naive,
not fully optimized implementation, i.e. prioritize correctness over
performance for the initial implementation.

Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
attributes/protections in the future, e.g. to give userspace fine-grained
control over read, write, and execute protections for guest memory.

Provide arch hooks for handling attribute changes before and after common
code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
relevant mappings, and the "post" hook to track whether or not hugepages
can be used to map the range.

To simplify the implementation wrap the entire sequence with
kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
guaranteed to be an invalidation.  For the initial use case, x86 *will*
always invalidate memory, and preventing arch code from creating new
mappings while the attributes are in flux makes it much easier to reason
about the correctness of consuming attributes.

It's possible that future usages may not require an invalidation, e.g.
if KVM ends up supporting RWX protections and userspace grants _more_
protections, but again opt for simplicity and punt optimizations to
if/when they are needed.

Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Fuad Tabba <tabba@google.com>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:31:38 -05:00
Chao Peng 16f95f3b95 KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace
Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).

KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory.  With guest private memory,
there will be two kind of memory conversions:

  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared)

  - implicit conversion: happens when the guest attempts to access a gfn
    that is configured in the "wrong" state (private vs. shared)

On x86 (first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.

KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
be implicit conversions.

Note!  To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'!  Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low level helper.

Report the gpa+size instead of a single gfn even though the initial usage
is expected to always report single pages.  It's entirely possible, likely
even, that KVM will someday support sub-page granularity faults, e.g.
Intel's sub-page protection feature allows for additional protections at
128-byte granularity.

Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
Link: https://lore.kernel.org/all/ZQ3AmLO2SYv3DszH@google.com
Cc: Anish Moorthy <amoorthy@google.com>
Cc: David Matlack <dmatlack@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-10-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:31:11 -05:00
Sean Christopherson bb58b90b1a KVM: Introduce KVM_SET_USER_MEMORY_REGION2
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
information can be supplied without setting userspace up to fail.  The
padding in the new kvm_userspace_memory_region2 structure will be used to
pass a file descriptor in addition to the userspace_addr, i.e. allow
userspace to point at a file descriptor and map memory into a guest that
is NOT mapped into host userspace.

Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
makes detection of bad flags a bit more robust, e.g. if the new fd field
is guarded only by a flag and not a new ioctl(), then a userspace bug
(setting a "bad" flag) would generate out-of-bounds access instead of an
-EINVAL error.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-9-seanjc@google.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:30:41 -05:00
Paolo Bonzini 45b890f768 KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
    allowing userspace to opt-out of certain vCPU features for its guest
 
  - Optimization for vSGI injection, opportunistically compressing MPIDR
    to vCPU mapping into a table
 
  - Improvements to KVM's PMU emulation, allowing userspace to select
    the number of PMCs available to a VM
 
  - Guest support for memory operation instructions (FEAT_MOPS)
 
  - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
    bugs and getting rid of useless code
 
  - Changes to the way the SMCCC filter is constructed, avoiding wasted
    memory allocations when not in use
 
  - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
    the overhead of errata mitigations
 
  - Miscellaneous kernel and selftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd
 FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834
 fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw=
 =R4M7
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.7

 - Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest

 - Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table

 - Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM

 - Guest support for memory operation instructions (FEAT_MOPS)

 - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code

 - Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use

 - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations

 - Miscellaneous kernel and selftest fixes
2023-10-31 16:37:07 -04:00
Jing Zhang 3f9cd0ca84 KVM: arm64: Allow userspace to get the writable masks for feature ID registers
While the Feature ID range is well defined and pretty large, it isn't
inconceivable that the architecture will eventually grow some other
ranges that will need to similarly be described to userspace.

Add a VM ioctl to allow userspace to get writable masks for feature ID
registers in below system register space:
op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7}
This is used to support mix-and-match userspace and kernels for writable
ID registers, where userspace may want to know upfront whether it can
actually tweak the contents of an idreg or not.

Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that
returns a bitmap of the valid ranges, which can subsequently be
retrieved, one at a time by setting the index of the set bit as the
range identifier.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04 17:09:39 +00:00
Tianrui Zhao b37e6b680e LoongArch: KVM: Add kvm related header files
Add LoongArch KVM related header files, including kvm.h, kvm_host.h and
kvm_types.h. All of those are about LoongArch virtualization features
and kvm interfaces.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-10-02 10:01:20 +08:00
Yi Liu dcc31ea60b kvm/vfio: Accept vfio device file from userspace
This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:18:42 -06:00
Paolo Bonzini d74669ebae Common KVM changes for 6.5:
- Fix unprotected vcpu->pid dereference via debugfs
 
  - Fix KVM_BUG() and KVM_BUG_ON() macros with 64-bit conditionals
 
  - Refactor failure path in kvm_io_bus_unregister_dev() to simplify the code
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaFkUSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5BYQP/j3BOjShMl1q7jOf+eaE6ro48ffHmPxk
 ujMLS1bMXY1PT937gG1uHokNp4GaotKJXILVwSFUYcrkR8LKtfou31/X9OPhUIMT
 nf0BBRS2KpvZSHfeqrYc0eUMgZKSkAjmtwU5Hob+EdisUZ0yT9kO3Sr4fvSqDd65
 2fvcCcOdKccKfmuUQQ6a3zJBV7FDJeGAZ3GSF7eb6MzpPyRMbMu+w4K0i6AKc4uA
 F2U3a4DB8uH++JVwZVfBla0Rz8wvarGiIE5FRonistDTJYLzhY+VypiBHFc+mcyp
 KqX3TxXClndlZolqOyvFFkiIcNBFOfPJ0O5gk4whFRR3dS70Y9Ji1/Gzm9S8w52h
 3Q16wLbUzyFONxznB2THTU6W+9ZKKRAPVP5R6/xkqZgnr+qWxfMQD7cNGgHuK3Nv
 QyNF2K1/FUrxdIkPzKa/UWDRX8rwjTDdASxLA9Zl8uP7qmfJGSWIRfjljgJYuAkr
 D37lDNzeoxkmuMeAQ2Rcpn1ghYOG6tm5OdZdkzjQdBjlugOTQOsZ05c/Ab2mSThy
 h7K6okw5a4gjsfuxCepnbmbUJ3IDedbRYzKtzwhoYuLZ/qf2F5NqbXMXpiyvHYTJ
 fAZpkKkF5adgagz2kU57nqzb1W0uWZdASQCXXIdEN8QdyQe/FGqgxMWKEOaywB32
 C/fdKRFQoCyL
 =EGXd
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-6.5' of https://github.com/kvm-x86/linux into HEAD

Common KVM changes for 6.5:

 - Fix unprotected vcpu->pid dereference via debugfs

 - Fix KVM_BUG() and KVM_BUG_ON() macros with 64-bit conditionals

 - Refactor failure path in kvm_io_bus_unregister_dev() to simplify the code

 - Misc cleanups
2023-07-01 07:07:55 -04:00
Paolo Bonzini cc744042d9 KVM/arm64 updates for 6.5
- Eager page splitting optimization for dirty logging, optionally
    allowing for a VM to avoid the cost of block splitting in the stage-2
    fault path.
 
  - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
    services that live in the Secure world. pKVM intervenes on FF-A calls
    to guarantee the host doesn't misuse memory donated to the hyp or a
    pKVM guest.
 
  - Support for running the split hypervisor with VHE enabled, known as
    'hVHE' mode. This is extremely useful for testing the split
    hypervisor on VHE-only systems, and paves the way for new use cases
    that depend on having two TTBRs available at EL2.
 
  - Generalized framework for configurable ID registers from userspace.
    KVM/arm64 currently prevents arbitrary CPU feature set configuration
    from userspace, but the intent is to relax this limitation and allow
    userspace to select a feature set consistent with the CPU.
 
  - Enable the use of Branch Target Identification (FEAT_BTI) in the
    hypervisor.
 
  - Use a separate set of pointer authentication keys for the hypervisor
    when running in protected mode, as the host is untrusted at runtime.
 
  - Ensure timer IRQs are consistently released in the init failure
    paths.
 
  - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
    (FEAT_EVT), as it is a register commonly read from userspace.
 
  - Erratum workaround for the upcoming AmpereOne part, which has broken
    hardware A/D state management.
 
 As a consequence of the hVHE series reworking the arm64 software
 features framework, the for-next/module-alloc branch from the arm64 tree
 comes along for the ride.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZJWrhwAKCRCivnWIJHzd
 Fs07AP9xliv5yIoQtRgPZXc0QDPyUm7zg8f5KDgqCVJtyHXcvAEAmmerBr39nbPc
 XoMXALKx6NGt836P0C+bhvRcHdFPGwE=
 =c/Xh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.5

 - Eager page splitting optimization for dirty logging, optionally
   allowing for a VM to avoid the cost of block splitting in the stage-2
   fault path.

 - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
   services that live in the Secure world. pKVM intervenes on FF-A calls
   to guarantee the host doesn't misuse memory donated to the hyp or a
   pKVM guest.

 - Support for running the split hypervisor with VHE enabled, known as
   'hVHE' mode. This is extremely useful for testing the split
   hypervisor on VHE-only systems, and paves the way for new use cases
   that depend on having two TTBRs available at EL2.

 - Generalized framework for configurable ID registers from userspace.
   KVM/arm64 currently prevents arbitrary CPU feature set configuration
   from userspace, but the intent is to relax this limitation and allow
   userspace to select a feature set consistent with the CPU.

 - Enable the use of Branch Target Identification (FEAT_BTI) in the
   hypervisor.

 - Use a separate set of pointer authentication keys for the hypervisor
   when running in protected mode, as the host is untrusted at runtime.

 - Ensure timer IRQs are consistently released in the init failure
   paths.

 - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
   (FEAT_EVT), as it is a register commonly read from userspace.

 - Erratum workaround for the upcoming AmpereOne part, which has broken
   hardware A/D state management.

As a consequence of the hVHE series reworking the arm64 software
features framework, the for-next/module-alloc branch from the arm64 tree
comes along for the ride.
2023-07-01 07:04:29 -04:00
Anup Patel 89d01306e3 RISC-V: KVM: Implement device interface for AIA irqchip
We implement KVM device interface for in-kernel AIA irqchip so that
user-space can use KVM device ioctls to create, configure, and destroy
in-kernel AIA irqchip.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-06-18 21:24:43 +05:30
Binbin Wu 22725266bd KVM: Fix comment for KVM_ENABLE_CAP
Fix comment for vcpu ioctl version of KVM_ENABLE_CAP.

KVM provides ioctl KVM_ENABLE_CAP to allow userspace to enable an
extension which is not enabled by default. For vcpu ioctl version,
it is available with the capability KVM_CAP_ENABLE_CAP. For vm ioctl
version, it is available with the capability KVM_CAP_ENABLE_CAP_VM.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20230518091339.1102-2-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-05 12:38:22 -07:00
Ricardo Koller 2f440b72e8 KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
Add a capability for userspace to specify the eager split chunk size.
The chunk size specifies how many pages to break at a time, using a
single allocation. Bigger the chunk size, more pages need to be
allocated ahead of time.

Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230426172330.1439644-6-ricarkol@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-05-16 17:39:18 +00:00
Paolo Bonzini 4f382a79a6 KVM/arm64 updates for 6.4
- Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 - New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 - Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 - A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 - The usual selftest fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmRCZIwPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoZ8P/ioXAdDbAE4hTuyD2YdKJ3IGWN3pg52Z7xc2
 rBXXFrbK9+n9FEc3AVdHoGsRPDP0Ynl+apj+aB0Klr/Fl0KKqac+W0ARX9rn1mI1
 HjeygFPaGnXjMUp0BjeSLS+g3b0gebELJ6R1QEe1/MIPb8Se7M1y3ZpMWdhe0PPL
 vyzw3LZq2OAlLgWKZhAfhh03qdr2kqJxypYs6nMrcexfn8dXT78dsYKW1nXmqKcE
 61Gg23MDPUoexYpUhm+ym5t8hltoI1di8faPmxEpaFzpSDyAg8V5vo6LiW9jn3cf
 RX0Sikk1laiRAhVbbIFCKC148vFyKxum3scpKyb91Qc+sK1kmIcxvEqlc6SfG9je
 +5ndZwAfXtW6SMSOyX8y5fXbee7M0sx3n3le9BNgwXfmLWg/GHXJ544dJgVIlf/e
 0Z+8QnP1IUDfARR/b2FlW7A7XLzNHQzO379ekcAdUptbGwlS9CrW6SJ83QR7K6fB
 bh0aSSELKsD7pX8wnNyNACvmz2zL12ITlDKdZWUr8MSxyTjgVy7s0BDsQT3sbrA1
 1sH++RvUWfC2k7tVT3vjZFzUDlPw3bnZmo5YMWRTMbXEdr1V5rDw5F5IXit13KeT
 8bk0hnJgnLmyoX2A17v5dkFMIKD7p13tqDRdfFcn0ru63HIKxgkS3ITkDmsAQELK
 DHT7RBE0
 =Bhta
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.4

- Numerous fixes for the pathological lock inversion issue that
  plagued KVM/arm64 since... forever.

- New framework allowing SMCCC-compliant hypercalls to be forwarded
  to userspace, hopefully paving the way for some more features
  being moved to VMMs rather than be implemented in the kernel.

- Large rework of the timer code to allow a VM-wide offset to be
  applied to both virtual and physical counters as well as a
  per-timer, per-vcpu offset that complements the global one.
  This last part allows the NV timer code to be implemented on
  top.

- A small set of fixes to make sure that we don't change anything
  affecting the EL1&0 translation regime just after having having
  taken an exception to EL2 until we have executed a DSB. This
  ensures that speculative walks started in EL1&0 have completed.

- The usual selftest fixes and improvements.
2023-04-26 15:46:52 -04:00
Marc Zyngier 6dcf7316e0 Merge branch kvm-arm64/smccc-filtering into kvmarm-master/next
* kvm-arm64/smccc-filtering:
  : .
  : SMCCC call filtering and forwarding to userspace, courtesy of
  : Oliver Upton. From the cover letter:
  :
  : "The Arm SMCCC is rather prescriptive in regards to the allocation of
  : SMCCC function ID ranges. Many of the hypercall ranges have an
  : associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some
  : room for vendor-specific implementations.
  :
  : The ever-expanding SMCCC surface leaves a lot of work within KVM for
  : providing new features. Furthermore, KVM implements its own
  : vendor-specific ABI, with little room for other implementations (like
  : Hyper-V, for example). Rather than cramming it all into the kernel we
  : should provide a way for userspace to handle hypercalls."
  : .
  KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC"
  KVM: arm64: Test that SMC64 arch calls are reserved
  KVM: arm64: Prevent userspace from handling SMC64 arch range
  KVM: arm64: Expose SMC/HVC width to userspace
  KVM: selftests: Add test for SMCCC filter
  KVM: selftests: Add a helper for SMCCC calls with SMC instruction
  KVM: arm64: Let errors from SMCCC emulation to reach userspace
  KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version
  KVM: arm64: Introduce support for userspace SMCCC filtering
  KVM: arm64: Add support for KVM_EXIT_HYPERCALL
  KVM: arm64: Use a maple tree to represent the SMCCC filter
  KVM: arm64: Refactor hvc filtering to support different actions
  KVM: arm64: Start handling SMCs from EL1
  KVM: arm64: Rename SMC/HVC call handler to reflect reality
  KVM: arm64: Add vm fd device attribute accessors
  KVM: arm64: Add a helper to check if a VM has ran once
  KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21 09:44:32 +01:00
Oliver Upton e65733b5c5 KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL
The 'longmode' field is a bit annoying as it blows an entire __u32 to
represent a boolean value. Since other architectures are looking to add
support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it
up.

Redefine the field (and the remaining padding) as a set of flags.
Preserve the existing ABI by using bit 0 to indicate if the guest was in
long mode and requiring that the remaining 31 bits must be zero.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-2-oliver.upton@linux.dev
2023-04-05 12:07:41 +01:00
Marc Zyngier 30ec7997d1 KVM: arm64: timers: Allow userspace to set the global counter offset
And this is the moment you have all been waiting for: setting the
counter offset from userspace.

We expose a brand new capability that reports the ability to set
the offset for both the virtual and physical sides.

In keeping with the architecture, the offset is expressed as
a delta that is substracted from the physical counter value.

Once this new API is used, there is no going back, and the counters
cannot be written to to set the offsets implicitly (the writes
are instead ignored).

Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-8-maz@kernel.org
2023-03-30 19:01:10 +01:00
Thomas Huth c5edd753a0 KVM: x86: Remove the KVM_GET_NR_MMU_PAGES ioctl
The KVM_GET_NR_MMU_PAGES ioctl is quite questionable on 64-bit hosts
since it fails to return the full 64 bits of the value that can be
set with the corresponding KVM_SET_NR_MMU_PAGES call. Its "long" return
value is truncated into an "int" in the kvm_arch_vm_ioctl() function.

Since this ioctl also never has been used by userspace applications
(QEMU, Google's internal VMM, kvmtool and CrosVM have been checked),
it's likely the best if we remove this badly designed ioctl before
anybody really tries to use it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208140105.655814-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16 10:18:06 -04:00
Paolo Bonzini e4922088f8 * Two more V!=R patches
* The last part of the cmpxchg patches
 * A few fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmPkwH0ACgkQ41TmuOI4
 ufhrshAAmv9OlCNVsGTmQLpEnGdnxGM2vBPDEygdi+oVHtpMBFn27R3fu295aUR0
 v0o3xsSImhaOU03OxWrsLqPanEL5BqnicLwkL4xou3NXXD4Wo0Zrstd3ykfaODhq
 bTDx7zC2zMQ5J+LPuwDaYUat5R0bHv7cULv1CKLdyISnPGafy0kpUPvC30nymJZi
 nV7/DjvDYbuOFfhdTEOklGRXvMSEBPLGhIJk/cYZzJECNeNJFUeSs+00uNJ8P6WO
 BQD/FLWie+Fn6lTGIUhulZCPf65KI4bHHLB6WFXA5Jy+O08urdtLiZwlBC4iNsFV
 NFIwangpJ/RnupJoOMwQfw31op5SZuiOYn91njaGIiLpHgvA9+iaERsqXtjp8NW7
 /ne1TZqtrGbYY71XvZ/yPQU5VGc/MG1CyCGX1CPNSQO7v4yl27BNChxdkBHzzm2u
 C0IuLZuXl25XwAt8xbdi65fb84pJOeWRU4Zoe4cUZ3drBy5cZsmFXe3lhEAqs7nf
 MB9XekTLpZ6pCqTE1u/BOrobVg5es/lDQiDeLCvDe1I3I5inSD6ehjJz7qjK0w8o
 3pn0rb+Kb4Ijzfi4RNbgJXmBNzkwwSSPPwYt4THHOZtr8p0fZMBeGHqq1wTJmKcq
 M/+9w4cZqgFpdyNqitj8NyTayX1Lj4LWayexCBYaGkLuHTD6cCk=
 =HOly
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-6.3-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

* Two more V!=R patches
* The last part of the cmpxchg patches
* A few fixes
2023-02-15 12:35:26 -05:00
Janis Schoetterl-Glausch 3fd49805d1 KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
User space can use the MEM_OP ioctl to make storage key checked reads
and writes to the guest, however, it has no way of performing atomic,
key checked, accesses to the guest.
Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
op. For now, support this op for absolute accesses only.

This op can be used, for example, to set the device-state-change
indicator and the adapter-local-summary indicator atomically.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230206164602.138068-13-scgl@linux.ibm.com
Message-Id: <20230206164602.138068-13-scgl@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07 18:06:00 +01:00
Aaron Lewis 14329b825f KVM: x86/pmu: Introduce masked events to the pmu event filter
When building a list of filter events, it can sometimes be a challenge
to fit all the events needed to adequately restrict the guest into the
limited space available in the pmu event filter.  This stems from the
fact that the pmu event filter requires each event (i.e. event select +
unit mask) be listed, when the intention might be to restrict the
event select all together, regardless of it's unit mask.  Instead of
increasing the number of filter events in the pmu event filter, add a
new encoding that is able to do a more generalized match on the unit mask.

Introduce masked events as another encoding the pmu event filter
understands.  Masked events has the fields: mask, match, and exclude.
When filtering based on these events, the mask is applied to the guest's
unit mask to see if it matches the match value (i.e. umask & mask ==
match).  The exclude bit can then be used to exclude events from that
match.  E.g. for a given event select, if it's easier to say which unit
mask values shouldn't be filtered, a masked event can be set up to match
all possible unit mask values, then another masked event can be set up to
match the unit mask values that shouldn't be filtered.

Userspace can query to see if this feature exists by looking for the
capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS.

This feature is enabled by setting the flags field in the pmu event
filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS.

Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY().

It is an error to have a bit set outside the valid bits for a masked
event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in
such cases, including the high bits of the event select (35:32) if
called on Intel.

With these updates the filter matching code has been updated to match on
a common event.  Masked events were flexible enough to handle both event
types, so they were used as the common event.  This changes how guest
events get filtered because regardless of the type of event used in the
uAPI, they will be converted to masked events.  Because of this there
could be a slight performance hit because instead of matching the filter
event with a lookup on event select + unit mask, it does a lookup on event
select then walks the unit masks to find the match.  This shouldn't be a
big problem because I would expect the set of common event selects to be
small, and if they aren't the set can likely be reduced by using masked
events to generalize the unit mask.  Using one type of event when
filtering guest events allows for a common code path to be used.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:12 -08:00
David Woodhouse b0305c1e0e KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi
These are (uint64_t)-1 magic values are a userspace ABI, allowing the
shared info pages and other enlightenments to be disabled. This isn't
a Xen ABI because Xen doesn't let the guest turn these off except with
the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is
expected to handle soft reset, and tear down the kernel parts of the
enlightenments accordingly.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27 06:01:49 -05:00
Paolo Bonzini 9352e7470a Merge remote-tracking branch 'kvm/queue' into HEAD
x86 Xen-for-KVM:

* Allow the Xen runstate information to cross a page boundary

* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

* add support for 32-bit guests in SCHEDOP_poll

x86 fixes:

* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
   years back when eliminating unnecessary barriers when switching between
   vmcs01 and vmcs02.

* Clean up the MSR filter docs.

* Clean up vmread_error_trampoline() to make it more obvious that params
  must be passed on the stack, even for x86-64.

* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
  of the current guest CPUID.

* Fudge around a race with TSC refinement that results in KVM incorrectly
  thinking a guest needs TSC scaling when running on a CPU with a
  constant TSC, but no hardware-enumerated TSC frequency.

* Advertise (on AMD) that the SMM_CTL MSR is not supported

* Remove unnecessary exports

Selftests:

* Fix an inverted check in the access tracking perf test, and restore
  support for asserting that there aren't too many idle pages when
  running on bare metal.

* Fix an ordering issue in the AMX test introduced by recent conversions
  to use kvm_cpu_has(), and harden the code to guard against similar bugs
  in the future.  Anything that tiggers caching of KVM's supported CPUID,
  kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
  the caching occurs before the test opts in via prctl().

* Fix build errors that occur in certain setups (unsure exactly what is
  unique about the problematic setup) due to glibc overriding
  static_assert() to a variant that requires a custom message.

* Introduce actual atomics for clear/set_bit() in selftests

Documentation:

* Remove deleted ioctls from documentation

* Various fixes
2022-12-12 15:54:07 -05:00
Paolo Bonzini eb5618911a KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 - Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
   option, which multi-process VMMs such as crosvm rely on.
 
 - Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 - Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 - Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
   stage-2 faults and access tracking. You name it, we got it, we
   probably broke it.
 
 - Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 As a side effect, this tag also drags:
 
 - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
   series
 
 - A shared branch with the arm64 tree that repaints all the system
   registers to match the ARM ARM's naming, and resulting in
   interesting conflicts
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz
 rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i
 KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX
 wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM
 AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk
 pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d
 1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc
 4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT
 H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR
 qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF
 K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC
 aWa6oGVC
 =iIPT
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  no-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, and resulting in
  interesting conflicts
2022-12-09 09:12:12 +01:00
Javier Martinez Canillas 30ee198ce4 KVM: Reference to kvm_userspace_memory_region in doc and comments
There are still references to the removed kvm_memory_region data structure
but the doc and comments should mention struct kvm_userspace_memory_region
instead, since that is what's used by the ioctl that replaced the old one
and this data structure support the same set of flags.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-4-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:40 -05:00
Javier Martinez Canillas 66a9221d73 KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-3-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:40 -05:00