Merge branch kvm-arm64/pkvm-protected-guest into kvmarm-master/next

* kvm-arm64/pkvm-protected-guest: (41 commits)
  : .
  : pKVM support for protected guests, implementing the very long
  : awaited support for anonymous memory, as the elusive guestmem
  : has failed to deliver on its promises despite a multi-year
  : effort. Patches courtesy of Will Deacon. From the initial cover
  : letter:
  :
  : "[...] this patch series implements support for protected guest
  : memory with pKVM, where pages are unmapped from the host as they are
  : faulted into the guest and can be shared back from the guest using pKVM
  : hypercalls. Protected guests are created using a new machine type
  : identifier and can be booted to a shell using the kvmtool patches
  : available at [2], which finally means that we are able to test the pVM
  : logic in pKVM. Since this is an incremental step towards full isolation
  : from the host (for example, the CPU register state and DMA accesses are
  : not yet isolated), creating a pVM requires a developer Kconfig option to
  : be enabled in addition to booting with 'kvm-arm.mode=protected' and
  : results in a kernel taint."
  : .
  KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim
  KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM
  KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm'
  drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
  KVM: arm64: Rename PKVM_PAGE_STATE_MASK
  KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
  KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
  KVM: arm64: Register 'selftest_vm' in the VM table
  KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
  KVM: arm64: Add some initial documentation for pKVM
  KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
  KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
  KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
  KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
  KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
  KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  KVM: arm64: Introduce hypercall to force reclaim of a protected page
  KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
  KVM: arm64: Change 'pkvm_handle_t' to u16
  KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit 83a3980750

@@ -3247,8 +3247,8 @@ Kernel parameters
 			for the host. To force nVHE on VHE hardware, add
 			"arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
 			command-line.
-			"nested" is experimental and should be used with
-			extreme caution.
+			"nested" and "protected" are experimental and should be
+			used with extreme caution.
 
 	kvm-arm.vgic_v3_group0_trap=
 			[KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0

@@ -10,6 +10,7 @@ ARM
    fw-pseudo-registers
    hyp-abi
    hypercalls
+   pkvm
    pvtime
    ptp_kvm
    vcpu-features

@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Protected KVM (pKVM)
+====================
+
+**NOTE**: pKVM is currently an experimental, development feature and
+subject to breaking changes as new isolation features are implemented.
+Please reach out to the developers at kvmarm@lists.linux.dev if you have
+any questions.
+
+Overview
+========
+
+Booting a host kernel with '``kvm-arm.mode=protected``' enables
+"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
+map page-table for the host and uses it to isolate the hypervisor
+running at EL2 from the rest of the host running at EL1/0.
+
+pKVM permits creation of protected virtual machines (pVMs) by passing
+the ``KVM_VM_TYPE_ARM_PROTECTED`` machine type identifier to the
+``KVM_CREATE_VM`` ioctl(). The hypervisor isolates pVMs from the host by
+unmapping pages from the stage-2 identity map as they are accessed by a
+pVM. Hypercalls are provided for a pVM to share specific regions of its
+IPA space back with the host, allowing for communication with the VMM.
+A Linux guest must be configured with ``CONFIG_ARM_PKVM_GUEST=y`` in
+order to issue these hypercalls.
+
+See hypercalls.rst for more details.
+
+Isolation mechanisms
+====================
+
+pKVM relies on a number of mechanisms to isolate PVMs from the host:
+
+CPU memory isolation
+--------------------
+
+Status: Isolation of anonymous memory and metadata pages.
+
+Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
+are donated from the host to the hypervisor during pVM creation and
+are consequently unmapped from the stage-2 identity map until the pVM is
+destroyed.
+
+Similarly to regular KVM, pages are lazily mapped into the guest in
+response to stage-2 page faults handled by the host. However, when
+running a pVM, these pages are first pinned and then unmapped from the
+stage-2 identity map as part of the donation procedure. This gives rise
+to some user-visible differences when compared to non-protected VMs,
+largely due to the lack of MMU notifiers:
+
+* Memslots cannot be moved or deleted once the pVM has started running.
+* Read-only memslots and dirty logging are not supported.
+* With the exception of swap, file-backed pages cannot be mapped into a
+  pVM.
+* Donated pages are accounted against ``RLIMIT_MLOCK`` and so the VMM
+  must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
+  The lack of a runtime reclaim mechanism means that memory locked for
+  a pVM will remain locked until the pVM is destroyed.
+* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
+  mapping associated with a memslot) are not reflected in the guest and
+  may lead to loss of coherency.
+* Accessing pVM memory that has not been shared back will result in the
+  delivery of a SIGSEGV.
+* If a system call accesses pVM memory that has not been shared back
+  then it will either return ``-EFAULT`` or forcefully reclaim the
+  memory pages. Reclaimed memory is zeroed by the hypervisor and a
+  subsequent attempt to access it in the pVM will return ``-EFAULT``
+  from the ``VCPU_RUN`` ioctl().
+
+CPU state isolation
+-------------------
+
+Status: **Unimplemented.**
+
+DMA isolation using an IOMMU
+----------------------------
+
+Status: **Unimplemented.**
+
+Proxying of Trustzone services
+------------------------------
+
+Status: FF-A and PSCI calls from the host are proxied by the pKVM
+hypervisor.
+
+The FF-A proxy ensures that the host cannot share pVM or hypervisor
+memory with Trustzone as part of a "confused deputy" attack.
+
+The PSCI proxy ensures that CPUs always have the stage-2 identity map
+installed when they are executing in the host.
+
+Protected VM firmware (pvmfw)
+-----------------------------
+
+Status: **Unimplemented.**
+
+Resources
+=========
+
+Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
+technical deep dive" remains a good resource for learning more about
+pKVM, despite some of the details having changed in the meantime:
+
+https://www.youtube.com/watch?v=9npebeVFbFw

@@ -51,7 +51,7 @@
 #include <linux/mm.h>
 
 enum __kvm_host_smccc_func {
-	/* Hypercalls available only prior to pKVM finalisation */
+	/* Hypercalls that are unavailable once pKVM has finalised. */
 	/* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */
 	__KVM_HOST_SMCCC_FUNC___pkvm_init = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1,
 	__KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping,
@@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
 	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
+	__KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
 
-	/* Hypercalls available after pKVM finalisation */
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
+	/* Hypercalls that are always available and common to [nh]VHE/pKVM. */
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
@@ -83,11 +76,27 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v5_save_apr,
 	__KVM_HOST_SMCCC_FUNC___vgic_v5_restore_vmcr_apr,
+	__KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v5_restore_vmcr_apr,
+
+	/* Hypercalls that are available only when pKVM has finalised. */
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
-	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_in_poison_fault,
+	__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,

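The enum above is now split into three ranges, and handle_host_hcall() in the hyp-main.c hunks bounds the hypercall ID from below with __KVM_HOST_SMCCC_FUNC_MIN_PKVM once pKVM has finalised, and from above with __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM otherwise. The sketch below models that check with a hypothetical, cut-down enum so the cases can be exercised in user space; the DEMO_* names are placeholders rather than kernel identifiers, and DEMO_NR_HCALLS plays the role of ARRAY_SIZE(host_hcall).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down model of enum __kvm_host_smccc_func. */
enum demo_hcall {
	/* Unavailable once pKVM has finalised. */
	DEMO_pkvm_init,
	DEMO_vgic_v3_get_gic_config,
	DEMO_pkvm_prot_finalize,
	DEMO_MIN_PKVM = DEMO_pkvm_prot_finalize,

	/* Always available. */
	DEMO_kvm_vcpu_run,
	DEMO_vgic_v5_restore_vmcr_apr,
	DEMO_MAX_NO_PKVM = DEMO_vgic_v5_restore_vmcr_apr,

	/* Available only once pKVM has finalised. */
	DEMO_pkvm_host_donate_guest,
	DEMO_pkvm_tlb_flush_vmid,
	DEMO_NR_HCALLS,
};

/* Mirrors the hcall_min/hcall_max bounds computed by handle_host_hcall(). */
static inline bool demo_hcall_allowed(unsigned long id, bool pkvm_finalised)
{
	unsigned long hcall_min = 0, hcall_max = -1;

	if (pkvm_finalised)
		hcall_min = DEMO_MIN_PKVM;
	else
		hcall_max = DEMO_MAX_NO_PKVM;

	return !(id < hcall_min || id > hcall_max || id >= DEMO_NR_HCALLS);
}
```

Because MIN_PKVM aliases __pkvm_prot_finalize, that one call stays reachable after finalisation; the comment in handle_host_hcall() explains that it then returns -EPERM.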
@@ -251,7 +251,7 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap_2; /* Function numbers 64-127 */
 };
 
-typedef unsigned int pkvm_handle_t;
+typedef u16 pkvm_handle_t;
 
 struct kvm_protected_vm {
 	pkvm_handle_t handle;
@@ -259,6 +259,13 @@ struct kvm_protected_vm {
 	struct kvm_hyp_memcache stage2_teardown_mc;
 	bool is_protected;
 	bool is_created;
+
+	/*
+	 * True when the guest is being torn down. When in this state, the
+	 * guest's vCPUs can't be loaded anymore, but its pages can be
+	 * reclaimed by the host.
+	 */
+	bool is_dying;
 };
 
 struct kvm_mpidr_data {

@@ -99,14 +99,30 @@ typedef u64 kvm_pte_t;
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
-#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID		1
+/* pKVM invalid pte encodings */
+#define KVM_INVALID_PTE_TYPE_MASK	GENMASK(63, 60)
+#define KVM_INVALID_PTE_ANNOT_MASK	~(KVM_PTE_VALID | \
+					  KVM_INVALID_PTE_TYPE_MASK)
 
-/*
- * Used to indicate a pte for which a 'break-before-make' sequence is in
- * progress.
- */
-#define KVM_INVALID_PTE_LOCKED		BIT(10)
+enum kvm_invalid_pte_type {
+	/*
+	 * Used to indicate a pte for which a 'break-before-make'
+	 * sequence is in progress.
+	 */
+	KVM_INVALID_PTE_TYPE_LOCKED = 1,
+
+	/*
+	 * pKVM has unmapped the page from the host due to a change of
+	 * ownership.
+	 */
+	KVM_HOST_INVALID_PTE_TYPE_DONATION,
+
+	/*
+	 * The page has been forcefully reclaimed from the guest by the
+	 * host.
+	 */
+	KVM_GUEST_INVALID_PTE_TYPE_POISONED,
+};
 
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
@@ -658,14 +674,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   void *mc, enum kvm_pgtable_walk_flags flags);
 
 /**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- *				    track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ *				   to track ownership (and more).
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Base intermediate physical address to annotate.
  * @size:	Size of the annotated range.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
- * @owner_id:	Unique identifier for the owner of the page.
+ * @type:	The type of the annotation, determining its meaning and format.
+ * @annotation:	A 59-bit value that will be stored in the page tables.
+ *		@annotation[0] and @annotation[63:60] must be 0.
+ *		@annotation[59:1] is stored in the page tables, along
+ *		with @type.
  *
  * By default, all page-tables are owned by identifier 0. This function can be
  * used to mark portions of the IPA space as owned by other entities. When a
@@ -674,8 +694,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, enum kvm_invalid_pte_type type,
+				kvm_pte_t annotation);
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.

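The kernel-doc for kvm_pgtable_stage2_annotate() fixes the layout of an annotated invalid PTE: bit 0 (valid) clear, the annotation type in bits [63:60] (KVM_INVALID_PTE_TYPE_MASK), and the caller's payload in bits [59:1]. The user-space sketch below packs and unpacks that layout. DEMO_BIT(), DEMO_GENMASK() and the demo_* functions are local stand-ins written for illustration; they are not the kernel's helpers, nor its implementation of kvm_pgtable_stage2_annotate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t kvm_pte_t;

/* Local stand-ins for the kernel's BIT()/GENMASK() (64-bit only). */
#define DEMO_BIT(n)		(1ULL << (n))
#define DEMO_GENMASK(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define KVM_PTE_VALID			DEMO_BIT(0)
#define KVM_INVALID_PTE_TYPE_MASK	DEMO_GENMASK(63, 60)
#define KVM_INVALID_PTE_ANNOT_MASK	(~(KVM_PTE_VALID | KVM_INVALID_PTE_TYPE_MASK))

/* Same enumerators and values as the kvm_pgtable.h hunk above. */
enum kvm_invalid_pte_type {
	KVM_INVALID_PTE_TYPE_LOCKED = 1,
	KVM_HOST_INVALID_PTE_TYPE_DONATION,
	KVM_GUEST_INVALID_PTE_TYPE_POISONED,
};

/*
 * Build an invalid PTE carrying type in [63:60] and annotation[59:1].
 * Returns false if annotation has bit 0 or any of bits [63:60] set,
 * which the kernel-doc says must be 0.
 */
static inline bool demo_make_invalid_pte(enum kvm_invalid_pte_type type,
					 kvm_pte_t annotation, kvm_pte_t *pte)
{
	if (annotation & ~KVM_INVALID_PTE_ANNOT_MASK)
		return false;
	*pte = ((kvm_pte_t)type << 60) | annotation;
	return true;
}

static inline enum kvm_invalid_pte_type demo_invalid_pte_type(kvm_pte_t pte)
{
	return (enum kvm_invalid_pte_type)((pte & KVM_INVALID_PTE_TYPE_MASK) >> 60);
}

static inline kvm_pte_t demo_invalid_pte_annotation(kvm_pte_t pte)
{
	return pte & KVM_INVALID_PTE_ANNOT_MASK;
}
```

The narrower u16 pkvm_handle_t in the kvm_host.h hunk is presumably what lets a VM handle and a gfn share this 59-bit payload for KVM_HOST_INVALID_PTE_TYPE_DONATION entries, as the "Annotate guest donations with handle and gfn" commit title suggests; the exact field layout is not part of this diff.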
@@ -17,7 +17,7 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
-int pkvm_init_host_vm(struct kvm *kvm);
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type);
 int pkvm_create_hyp_vm(struct kvm *kvm);
 bool pkvm_hyp_vm_is_created(struct kvm *kvm);
 void pkvm_destroy_hyp_vm(struct kvm *kvm);
@@ -40,8 +40,6 @@ static inline bool kvm_pkvm_ext_allowed(struct kvm *kvm, long ext)
 	case KVM_CAP_MAX_VCPU_ID:
 	case KVM_CAP_MSI_DEVID:
 	case KVM_CAP_ARM_VM_IPA_SIZE:
-	case KVM_CAP_ARM_PMU_V3:
-	case KVM_CAP_ARM_SVE:
 	case KVM_CAP_ARM_PTRAUTH_ADDRESS:
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
 		return true;

@@ -94,6 +94,15 @@ static inline bool is_pkvm_initialized(void)
 		static_branch_likely(&kvm_protected_mode_initialized);
 }
 
+#ifdef CONFIG_KVM
+bool pkvm_force_reclaim_guest_page(phys_addr_t phys);
+#else
+static inline bool pkvm_force_reclaim_guest_page(phys_addr_t phys)
+{
+	return false;
+}
+#endif
+
 /* Reports the availability of HYP mode */
 static inline bool is_hyp_mode_available(void)
 {

@@ -208,6 +208,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
+	if (type & ~KVM_VM_TYPE_ARM_MASK)
+		return -EINVAL;
+
 	mutex_init(&kvm->arch.config_lock);
 
 #ifdef CONFIG_LOCKDEP
@@ -239,9 +242,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		 * If any failures occur after this is successful, make sure to
 		 * call __pkvm_unreserve_vm to unreserve the VM in hyp.
 		 */
-		ret = pkvm_init_host_vm(kvm);
+		ret = pkvm_init_host_vm(kvm, type);
 		if (ret)
-			goto err_free_cpumask;
+			goto err_uninit_mmu;
+	} else if (type & KVM_VM_TYPE_ARM_PROTECTED) {
+		ret = -EINVAL;
+		goto err_uninit_mmu;
 	}
 
 	kvm_vgic_early_init(kvm);
@@ -257,6 +263,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	return 0;
 
+err_uninit_mmu:
+	kvm_uninit_stage2_mmu(kvm);
 err_free_cpumask:
 	free_cpumask_var(kvm->arch.supported_cpus);
 err_unshare_kvm:

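The arm.c hunks validate the VM type before any state is set up and, once the stage-2 MMU has been initialised, route both pkvm_init_host_vm() failures and non-pKVM requests for a protected VM through the new err_uninit_mmu label. The sketch below reproduces that goto-unwinding shape in user space. Everything in it is hypothetical: the struct, the helpers, and in particular DEMO_TYPE_PROTECTED, whose bit position is an assumption. Only the 0xff IPA-size field mirrors the uapi KVM_VM_TYPE_ARM_IPA_SIZE_MASK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * DEMO_TYPE_IPA_SIZE_MASK mirrors the uapi KVM_VM_TYPE_ARM_IPA_SIZE_MASK
 * (0xff); DEMO_TYPE_PROTECTED is a placeholder bit chosen for this sketch.
 */
#define DEMO_TYPE_IPA_SIZE_MASK	0xffUL
#define DEMO_TYPE_PROTECTED	(1UL << 31)
#define DEMO_TYPE_MASK		(DEMO_TYPE_IPA_SIZE_MASK | DEMO_TYPE_PROTECTED)

/* Hypothetical stand-in for the parts of struct kvm this sketch tracks. */
struct demo_vm {
	bool cpumask_allocated;
	bool stage2_mmu_initialised;
	bool protected_vm;
};

/* Hypothetical pkvm_init_host_vm() analogue that can be told to fail. */
static inline int demo_pkvm_init_host_vm(struct demo_vm *vm,
					 unsigned long type, bool fail)
{
	if (fail)
		return -ENOMEM;
	vm->protected_vm = type & DEMO_TYPE_PROTECTED;
	return 0;
}

static inline int demo_init_vm(struct demo_vm *vm, unsigned long type,
			       bool pkvm_enabled, bool pkvm_setup_fails)
{
	int ret;

	/* Unknown type bits are rejected before any state exists. */
	if (type & ~DEMO_TYPE_MASK)
		return -EINVAL;

	vm->cpumask_allocated = true;
	vm->stage2_mmu_initialised = true;

	if (pkvm_enabled) {
		ret = demo_pkvm_init_host_vm(vm, type, pkvm_setup_fails);
		if (ret)
			goto err_uninit_mmu;
	} else if (type & DEMO_TYPE_PROTECTED) {
		/* A protected VM cannot be created without pKVM. */
		ret = -EINVAL;
		goto err_uninit_mmu;
	}

	return 0;

err_uninit_mmu:
	/* Unwind in reverse order: stage-2 MMU first, then the cpumask. */
	vm->stage2_mmu_initialised = false;
	vm->cpumask_allocated = false;
	return ret;
}
```

The kernel keeps separate err_uninit_mmu and err_free_cpumask labels that fall through into each other; the sketch collapses them into one label, which gives the same reverse-order teardown.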
@@ -27,16 +27,22 @@ extern struct host_mmu host_mmu;
 enum pkvm_component_id {
 	PKVM_ID_HOST,
 	PKVM_ID_HYP,
 	PKVM_ID_FFA,
+	PKVM_ID_GUEST,
 };
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu);
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
@@ -68,6 +74,8 @@ static __always_inline void __load_host_stage2(void)
 
 #ifdef CONFIG_NVHE_EL2_DEBUG
 void pkvm_ownership_selftest(void *base);
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt);
+void teardown_selftest_vm(void);
 #else
 static inline void pkvm_ownership_selftest(void *base) { }
 #endif

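The mem_protect.h declarations add donation, guest-initiated sharing and unsharing, and forced reclaim for protected guests. The model below is a deliberately simplified, hypothetical sketch of how the host's and the guest's views of a single page might change across those calls. The state names are borrowed from enum pkvm_page_state, but the preconditions and the -EPERM returns are assumptions, not a transcription of the hypervisor code. The outcomes follow the pkvm.rst text: a donated page leaves the host, and a force-reclaimed page is zeroed, returned to the host, and poisoned for the guest.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified per-page views; names echo enum pkvm_page_state. */
enum demo_state { OWNED, SHARED_OWNED, SHARED_BORROWED, NOPAGE, POISON };

/* Hypothetical model of one page as seen by the host and by one pVM. */
struct demo_page {
	enum demo_state host;
	enum demo_state guest;
	bool zeroed;
};

/* Host faults the page into the pVM: it leaves the host's identity map. */
static inline int demo_host_donate_guest(struct demo_page *p)
{
	if (p->host != OWNED || p->guest != NOPAGE)
		return -EPERM;
	p->host = NOPAGE;
	p->guest = OWNED;
	return 0;
}

/* MEM_SHARE from the guest: the host may access the page again. */
static inline int demo_guest_share_host(struct demo_page *p)
{
	if (p->guest != OWNED || p->host != NOPAGE)
		return -EPERM;
	p->guest = SHARED_OWNED;
	p->host = SHARED_BORROWED;
	return 0;
}

/* MEM_UNSHARE from the guest: the host loses access again. */
static inline int demo_guest_unshare_host(struct demo_page *p)
{
	if (p->guest != SHARED_OWNED || p->host != SHARED_BORROWED)
		return -EPERM;
	p->guest = OWNED;
	p->host = NOPAGE;
	return 0;
}

/* Forced reclaim of a private page: zeroed, returned, guest view poisoned. */
static inline int demo_host_force_reclaim(struct demo_page *p)
{
	if (p->host != NOPAGE || p->guest != OWNED)
		return -EPERM;
	p->zeroed = true;
	p->host = OWNED;
	p->guest = POISON;	/* later guest access: -EFAULT from VCPU_RUN */
	return 0;
}
```

Keeping both views in one record makes the pairing explicit: every transition changes the host and guest sides together, which is the invariant the hypervisor's ownership checks exist to protect.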
@@ -30,8 +30,14 @@ enum pkvm_page_state {
 	 * struct hyp_page.
 	 */
 	PKVM_NOPAGE			= BIT(0) | BIT(1),
+
+	/*
+	 * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
+	 * the hyp_vmemmap entry for the host)
+	 */
+	PKVM_POISON			= BIT(2),
 };
-#define PKVM_PAGE_STATE_MASK		(BIT(0) | BIT(1))
+#define PKVM_PAGE_STATE_VMEMMAP_MASK	(BIT(0) | BIT(1))
 
 #define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
 static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
@@ -108,12 +114,12 @@ static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state
 
 static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
 {
-	return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
+	return p->__hyp_state_comp ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
 }
 
 static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state)
 {
-	p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
+	p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
 }
 
 /*

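The memory.h hunks rename the mask used for struct hyp_page to PKVM_PAGE_STATE_VMEMMAP_MASK, and get_hyp_state()/set_hyp_state() keep storing the state XORed with it. One consequence, shown in the sketch below, is that a zero-filled entry decodes as PKVM_NOPAGE; the new PKVM_POISON meta-state lies outside the mask, consistent with the comment that it is not encoded in the hyp_vmemmap. The values of PKVM_PAGE_OWNED, PKVM_PAGE_SHARED_OWNED and PKVM_PAGE_SHARED_BORROWED are assumed from memory.h (they do not appear in this diff), and demo_hyp_page is a cut-down stand-in for struct hyp_page.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BIT(n)	(1U << (n))

/*
 * PKVM_NOPAGE and PKVM_POISON match the hunk above; the OWNED/SHARED_*
 * values are assumed from memory.h.
 */
enum pkvm_page_state {
	PKVM_PAGE_OWNED			= 0,
	PKVM_PAGE_SHARED_OWNED		= DEMO_BIT(0),
	PKVM_PAGE_SHARED_BORROWED	= DEMO_BIT(1),
	PKVM_NOPAGE			= DEMO_BIT(0) | DEMO_BIT(1),
	/* Meta-state: outside the mask, never stored in the vmemmap. */
	PKVM_POISON			= DEMO_BIT(2),
};

#define PKVM_PAGE_STATE_VMEMMAP_MASK	(DEMO_BIT(0) | DEMO_BIT(1))

/* Cut-down stand-in for struct hyp_page (mirrors __hyp_state_comp). */
struct demo_hyp_page {
	unsigned char hyp_state_comp;
};

static inline enum pkvm_page_state get_hyp_state(const struct demo_hyp_page *p)
{
	return p->hyp_state_comp ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
}

static inline void set_hyp_state(struct demo_hyp_page *p,
				 enum pkvm_page_state state)
{
	p->hyp_state_comp = state ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
}
```

Storing the complement means freshly zeroed vmemmap memory needs no initialisation pass to read back as "no page"; the rename to VMEMMAP_MASK makes clear that this encoding is specific to the vmemmap rather than the PTE software bits.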
@@ -73,8 +73,12 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 		   unsigned long pgd_hva);
 int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 		     unsigned long vcpu_hva);
-int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
 
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle);
 struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 					 unsigned int vcpu_idx);
 void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
@@ -84,6 +88,7 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
 struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle);
 void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
 
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
 bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
 bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
 void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);

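The single __pkvm_teardown_vm() call is replaced by a start, reclaim and finalize sequence, and the new is_dying flag in struct kvm_protected_vm documents the intermediate state: vCPUs can no longer be loaded, but pages can be reclaimed by the host. The model below is a hypothetical sketch of those rules, not pKVM code. The reference-count check on finalisation follows the "Prevent teardown finalisation of referenced 'hyp_vm'" commit title; the error codes and the refusal to start teardown while a vCPU is loaded are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the hypervisor-side VM teardown lifecycle. */
struct demo_hyp_vm {
	bool is_dying;
	unsigned int refcount;		/* outstanding references to the VM */
	unsigned int loaded_vcpus;
	unsigned int guest_pages;	/* pages still owned by the guest */
};

static inline int demo_vcpu_load(struct demo_hyp_vm *vm)
{
	/* Per the is_dying comment: no vCPU loads once teardown started. */
	if (vm->is_dying)
		return -EBUSY;
	vm->loaded_vcpus++;
	return 0;
}

static inline void demo_vcpu_put(struct demo_hyp_vm *vm)
{
	if (vm->loaded_vcpus)
		vm->loaded_vcpus--;
}

static inline int demo_start_teardown(struct demo_hyp_vm *vm)
{
	/* Assumption: teardown cannot start while a vCPU is loaded. */
	if (vm->loaded_vcpus)
		return -EBUSY;
	vm->is_dying = true;
	return 0;
}

static inline int demo_reclaim_dying_page(struct demo_hyp_vm *vm)
{
	/* Pages may only be pulled back from a dying VM. */
	if (!vm->is_dying)
		return -EPERM;
	if (!vm->guest_pages)
		return -ENOENT;
	vm->guest_pages--;
	return 0;
}

static inline int demo_finalize_teardown(struct demo_hyp_vm *vm)
{
	/* A VM that is still referenced must not be freed. */
	if (!vm->is_dying || vm->refcount)
		return -EBUSY;
	return 0;
}
```

Splitting teardown this way lets the host reclaim guest pages one at a time between the start and finalize calls, which fits the "Don't hold 'vm_table_lock' across guest page reclaim" change in the same series.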
@@ -16,4 +16,6 @@
 	__always_unused int ___check_reg_ ## reg;			\
 	type name = (type)cpu_reg(ctxt, (reg))
 
+void inject_host_exception(u64 esr);
+
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */

@@ -173,9 +173,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
 	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 
-	if (!is_protected_kvm_enabled())
-		return;
-
 	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
 	if (!hyp_vcpu)
 		return;
@@ -192,12 +189,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
 
 static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
 {
-	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
-	if (!is_protected_kvm_enabled())
-		return;
-
-	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (hyp_vcpu)
 		pkvm_put_hyp_vcpu(hyp_vcpu);
 }
@@ -252,6 +245,26 @@ static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
 				     &host_vcpu->arch.pkvm_memcache);
 }
 
+static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || !pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		goto out;
+
+	ret = pkvm_refill_memcache(hyp_vcpu);
+	if (ret)
+		goto out;
+
+	ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(u64, pfn, host_ctxt, 1);
@@ -261,9 +274,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -285,9 +295,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -305,9 +312,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -325,9 +329,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -347,9 +348,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -366,9 +364,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -428,12 +423,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
-	struct pkvm_hyp_vm *hyp_vm;
+	struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
 
-	if (!is_protected_kvm_enabled())
-		return;
-
-	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		return;
 
@@ -584,11 +575,42 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
 }
 
-static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_vcpu_in_poison_fault(struct kvm_cpu_context *host_ctxt)
+{
+	int ret;
+	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+
+	ret = hyp_vcpu ? __pkvm_vcpu_in_poison_fault(hyp_vcpu) : -EINVAL;
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
+static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_host_force_reclaim_page_guest(phys);
+}
+
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
+}
+
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
 
-	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+	cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
 }
 
 static void handle___tracing_load(struct kvm_cpu_context *host_ctxt)
@@ -678,14 +700,6 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_get_gic_config),
 	HANDLE_FUNC(__pkvm_prot_finalize),
 
-	HANDLE_FUNC(__pkvm_host_share_hyp),
-	HANDLE_FUNC(__pkvm_host_unshare_hyp),
-	HANDLE_FUNC(__pkvm_host_share_guest),
-	HANDLE_FUNC(__pkvm_host_unshare_guest),
-	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
-	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
-	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
-	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
@@ -699,11 +713,25 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
 	HANDLE_FUNC(__vgic_v5_save_apr),
 	HANDLE_FUNC(__vgic_v5_restore_vmcr_apr),
+
+	HANDLE_FUNC(__pkvm_host_share_hyp),
+	HANDLE_FUNC(__pkvm_host_unshare_hyp),
+	HANDLE_FUNC(__pkvm_host_donate_guest),
+	HANDLE_FUNC(__pkvm_host_share_guest),
+	HANDLE_FUNC(__pkvm_host_unshare_guest),
+	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
+	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
 	HANDLE_FUNC(__pkvm_reserve_vm),
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
-	HANDLE_FUNC(__pkvm_teardown_vm),
+	HANDLE_FUNC(__pkvm_vcpu_in_poison_fault),
+	HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
+	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
+	HANDLE_FUNC(__pkvm_start_teardown_vm),
+	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
 	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
@@ -720,7 +748,7 @@ static const hcall_t host_hcall[] = {
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(unsigned long, id, host_ctxt, 0);
-	unsigned long hcall_min = 0;
+	unsigned long hcall_min = 0, hcall_max = -1;
 	hcall_t hfn;
 
 	/*
@@ -732,14 +760,19 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 	 * basis. This is all fine, however, since __pkvm_prot_finalize
 	 * returns -EPERM after the first call for a given CPU.
 	 */
-	if (static_branch_unlikely(&kvm_protected_mode_initialized))
-		hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+	if (static_branch_unlikely(&kvm_protected_mode_initialized)) {
+		hcall_min = __KVM_HOST_SMCCC_FUNC_MIN_PKVM;
+	} else {
+		hcall_max = __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM;
+	}
 
 	id &= ~ARM_SMCCC_CALL_HINTS;
 	id -= KVM_HOST_SMCCC_ID(0);
 
-	if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
+	if (unlikely(id < hcall_min || id > hcall_max ||
+		     id >= ARRAY_SIZE(host_hcall))) {
 		goto inval;
+	}
 
 	hfn = host_hcall[id];
 	if (unlikely(!hfn))
@@ -777,43 +810,52 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
	kvm_skip_host_instr();
}

/*
 * Inject an Undefined Instruction exception into the host.
 *
 * This is open-coded to allow control over PSTATE construction without
 * complicating the generic exception entry helpers.
 */
static void inject_undef64(void)
void inject_host_exception(u64 esr)
{
	u64 spsr_mask, vbar, sctlr, old_spsr, new_spsr, esr, offset;
	u64 sctlr, spsr_el1, spsr_el2, exc_offset = except_type_sync;
	const u64 spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT |
			      PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;

	spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
	spsr_el1 = spsr_el2 = read_sysreg_el2(SYS_SPSR);
	switch (spsr_el1 & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
	case PSR_MODE_EL0t:
		exc_offset += LOWER_EL_AArch64_VECTOR;
		break;
	case PSR_MODE_EL0t | PSR_MODE32_BIT:
		exc_offset += LOWER_EL_AArch32_VECTOR;
		break;
	default:
		exc_offset += CURRENT_EL_SP_ELx_VECTOR;
	}

	spsr_el2 &= spsr_mask;
	spsr_el2 |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
		    PSR_MODE_EL1h;

	vbar = read_sysreg_el1(SYS_VBAR);
	sctlr = read_sysreg_el1(SYS_SCTLR);
	old_spsr = read_sysreg_el2(SYS_SPSR);

	new_spsr = old_spsr & spsr_mask;
	new_spsr |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;
	new_spsr |= PSR_MODE_EL1h;

	if (!(sctlr & SCTLR_EL1_SPAN))
		new_spsr |= PSR_PAN_BIT;
		spsr_el2 |= PSR_PAN_BIT;

	if (sctlr & SCTLR_ELx_DSSBS)
		new_spsr |= PSR_SSBS_BIT;
		spsr_el2 |= PSR_SSBS_BIT;

	if (system_supports_mte())
		new_spsr |= PSR_TCO_BIT;
		spsr_el2 |= PSR_TCO_BIT;

	esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) | ESR_ELx_IL;
	offset = CURRENT_EL_SP_ELx_VECTOR + except_type_sync;
	if (esr_fsc_is_translation_fault(esr))
		write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);

	write_sysreg_el1(esr, SYS_ESR);
	write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
	write_sysreg_el1(old_spsr, SYS_SPSR);
	write_sysreg_el2(vbar + offset, SYS_ELR);
	write_sysreg_el2(new_spsr, SYS_SPSR);
	write_sysreg_el1(spsr_el1, SYS_SPSR);
	write_sysreg_el2(read_sysreg_el1(SYS_VBAR) + exc_offset, SYS_ELR);
	write_sysreg_el2(spsr_el2, SYS_SPSR);
}

static void inject_host_undef64(void)
{
	inject_host_exception((ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) |
			      ESR_ELx_IL);
}

static bool handle_host_mte(u64 esr)

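`inject_host_exception()` makes two decisions: which synchronous entry of the host's EL1 vector table to branch to, based on the mode recorded in SPSR_EL2, and what PSTATE the host handler starts with. The sketch below reproduces both decisions as pure functions so the bit manipulation can be checked in isolation. The PSTATE and SCTLR bit positions are the architectural ones used by arm64 Linux; `has_mte` is a hypothetical stand-in for `system_supports_mte()`, and `except_type_sync` is taken to be 0 so the sync offset equals the vector-group base.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_MODE_EL0t   0x0ULL
#define PSR_MODE_EL1h   0x5ULL
#define PSR_MODE_MASK   0xfULL
#define PSR_MODE32_BIT  0x10ULL
#define PSR_F_BIT       (1ULL << 6)
#define PSR_I_BIT       (1ULL << 7)
#define PSR_A_BIT       (1ULL << 8)
#define PSR_D_BIT       (1ULL << 9)
#define PSR_SSBS_BIT    (1ULL << 12)
#define PSR_PAN_BIT     (1ULL << 22)
#define PSR_DIT_BIT     (1ULL << 24)
#define PSR_TCO_BIT     (1ULL << 25)
#define PSR_V_BIT       (1ULL << 28)
#define PSR_C_BIT       (1ULL << 29)
#define PSR_Z_BIT       (1ULL << 30)
#define PSR_N_BIT       (1ULL << 31)

#define SCTLR_EL1_SPAN   (1ULL << 23)
#define SCTLR_ELx_DSSBS  (1ULL << 44)

/* Vector-group offsets within VBAR_EL1. */
#define CURRENT_EL_SP_ELx_VECTOR 0x200ULL
#define LOWER_EL_AArch64_VECTOR  0x400ULL
#define LOWER_EL_AArch32_VECTOR  0x600ULL

/* Select the sync vector from the mode the exception was taken from. */
static uint64_t sync_vector_offset(uint64_t spsr)
{
	switch (spsr & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
	case PSR_MODE_EL0t:
		return LOWER_EL_AArch64_VECTOR;
	case PSR_MODE_EL0t | PSR_MODE32_BIT:
		return LOWER_EL_AArch32_VECTOR;
	default:
		return CURRENT_EL_SP_ELx_VECTOR;
	}
}

/* PSTATE the host handler starts with: masked DAIF, EL1h, selected flags kept. */
static uint64_t target_spsr(uint64_t old, uint64_t sctlr, int has_mte)
{
	const uint64_t keep = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT |
			      PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
	uint64_t spsr = old & keep;

	spsr |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL1h;
	if (!(sctlr & SCTLR_EL1_SPAN))
		spsr |= PSR_PAN_BIT;
	if (sctlr & SCTLR_ELx_DSSBS)
		spsr |= PSR_SSBS_BIT;
	if (has_mte)
		spsr |= PSR_TCO_BIT;
	return spsr;
}
```

As in the patch, everything that is not EL0 (AArch64 or AArch32) falls through to the current-EL SP_ELx vector, which is the only EL1 mode the host kernel runs in.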
@@ -836,7 +878,7 @@ static bool handle_host_mte(u64 esr)
		return false;
	}

	inject_undef64();
	inject_host_undef64();
	return true;
}

@@ -18,6 +18,7 @@
#include <nvhe/memory.h>
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
#include <nvhe/trap_handler.h>

#define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_AS_S1 | KVM_PGTABLE_S2_IDMAP)

@@ -461,8 +462,15 @@ static bool range_is_memory(u64 start, u64 end)
static inline int __host_stage2_idmap(u64 start, u64 end,
				      enum kvm_pgtable_prot prot)
{
	/*
	 * We don't make permission changes to the host idmap after
	 * initialisation, so we can squash -EAGAIN to save callers
	 * having to treat it like success in the case that they try to
	 * map something that is already mapped.
	 */
	return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
				      prot, &host_s2_pool, 0);
				      prot, &host_s2_pool,
				      KVM_PGTABLE_WALK_IGNORE_EAGAIN);
}

/*

@@ -504,7 +512,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
		return ret;

	if (kvm_pte_valid(pte))
		return -EAGAIN;
		return -EEXIST;

	if (pte) {
		WARN_ON(addr_is_memory(addr) &&

@@ -541,24 +549,99 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
		set_host_state(page, state);
}

int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
#define KVM_HOST_DONATION_PTE_OWNER_MASK GENMASK(3, 1)
#define KVM_HOST_DONATION_PTE_EXTRA_MASK GENMASK(59, 4)
static int host_stage2_set_owner_metadata_locked(phys_addr_t addr, u64 size,
						 u8 owner_id, u64 meta)
{
	kvm_pte_t annotation;
	int ret;

	if (owner_id == PKVM_ID_HOST)
		return -EINVAL;

	if (!range_is_memory(addr, addr + size))
		return -EPERM;

	ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
			      addr, size, &host_s2_pool, owner_id);
	if (ret)
		return ret;
	if (!FIELD_FIT(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id))
		return -EINVAL;

	/* Don't forget to update the vmemmap tracking for the host */
	if (owner_id == PKVM_ID_HOST)
		__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
	else
	if (!FIELD_FIT(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta))
		return -EINVAL;

	annotation = FIELD_PREP(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id) |
		     FIELD_PREP(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta);
	ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
			      addr, size, &host_s2_pool,
			      KVM_HOST_INVALID_PTE_TYPE_DONATION, annotation);
	if (!ret)
		__host_update_page_state(addr, size, PKVM_NOPAGE);

	return ret;
}

int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
	int ret = -EINVAL;

	switch (owner_id) {
	case PKVM_ID_HOST:
		if (!range_is_memory(addr, addr + size))
			return -EPERM;

		ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
		if (!ret)
			__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
		break;
	case PKVM_ID_HYP:
		ret = host_stage2_set_owner_metadata_locked(addr, size,
							    owner_id, 0);
		break;
	}

	return ret;
}

#define KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK GENMASK(15, 0)
/* We need 40 bits for the GFN to cover a 52-bit IPA with 4k pages and LPA2 */
#define KVM_HOST_PTE_OWNER_GUEST_GFN_MASK GENMASK(55, 16)
static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
{
	pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;

	BUILD_BUG_ON((pkvm_handle_t)-1 > KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK);
	WARN_ON(!FIELD_FIT(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn));

	return FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, handle) |
	       FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
}

static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
				       u64 *gfn)
{
	pkvm_handle_t handle;
	u64 meta;

	if (WARN_ON(kvm_pte_valid(pte)))
		return -EINVAL;

	if (FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) !=
	    KVM_HOST_INVALID_PTE_TYPE_DONATION) {
		return -EINVAL;
	}

	if (FIELD_GET(KVM_HOST_DONATION_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
		return -EPERM;

	meta = FIELD_GET(KVM_HOST_DONATION_PTE_EXTRA_MASK, pte);
	handle = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, meta);
	*vm = get_vm_by_handle(handle);
	if (!*vm) {
		/* We probably raced with teardown; try again */
		return -EAGAIN;
	}

	*gfn = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, meta);
	return 0;
}

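The donation annotation packs three things into an invalid host stage-2 PTE: the owner ID in bits [3:1], and in bits [59:4] a metadata word that itself holds the 16-bit VM handle in [15:0] and a 40-bit GFN in [55:16]. The round trip can be checked in user space with local equivalents of `GENMASK()`, `FIELD_PREP()`, `FIELD_GET()` and `FIELD_FIT()`. This is a sketch: the macro names are hypothetical, it relies on the GCC/Clang builtin `__builtin_ctzll`, and the owner value 3 is an arbitrary placeholder rather than the real `PKVM_ID_GUEST`. The invalid-PTE type field is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for GENMASK_ULL()/FIELD_PREP()/FIELD_GET()/FIELD_FIT(). */
#define GENMASK64(h, l)  ((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define MASK_SHIFT(m)    ((unsigned int)__builtin_ctzll(m))
#define PREP(m, v)       ((((uint64_t)(v)) << MASK_SHIFT(m)) & (m))
#define GET(m, r)        (((r) & (m)) >> MASK_SHIFT(m))
#define FITS(m, v)       ((((uint64_t)(v)) & ~((m) >> MASK_SHIFT(m))) == 0)

#define DONATION_OWNER_MASK  GENMASK64(3, 1)
#define DONATION_EXTRA_MASK  GENMASK64(59, 4)
#define GUEST_HANDLE_MASK    GENMASK64(15, 0)
#define GUEST_GFN_MASK       GENMASK64(55, 16)

/* Build the annotation; bit 0 (the valid bit) is always left clear. */
static uint64_t encode_donation(uint8_t owner, uint16_t handle, uint64_t gfn)
{
	uint64_t meta;

	assert(FITS(DONATION_OWNER_MASK, owner));
	assert(FITS(GUEST_GFN_MASK, gfn));
	meta = PREP(GUEST_HANDLE_MASK, handle) | PREP(GUEST_GFN_MASK, gfn);
	assert(FITS(DONATION_EXTRA_MASK, meta));

	return PREP(DONATION_OWNER_MASK, owner) | PREP(DONATION_EXTRA_MASK, meta);
}

static void decode_donation(uint64_t pte, uint8_t *owner, uint16_t *handle,
			    uint64_t *gfn)
{
	uint64_t meta = GET(DONATION_EXTRA_MASK, pte);

	*owner = (uint8_t)GET(DONATION_OWNER_MASK, pte);
	*handle = (uint16_t)GET(GUEST_HANDLE_MASK, meta);
	*gfn = GET(GUEST_GFN_MASK, meta);
}
```

The largest GFN, 2^40 - 1, corresponds to a 52-bit IPA with 4K pages (52 - 12 = 40 bits), and the 56-bit metadata word ends exactly at bit 59 once shifted into the extra field.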
@@ -605,11 +688,43 @@ unlock:
	return ret;
}

static void host_inject_mem_abort(struct kvm_cpu_context *host_ctxt)
{
	u64 ec, esr, spsr;

	esr = read_sysreg_el2(SYS_ESR);
	spsr = read_sysreg_el2(SYS_SPSR);

	/* Repaint the ESR to report a same-level fault if taken from EL1 */
	if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
		ec = ESR_ELx_EC(esr);
		if (ec == ESR_ELx_EC_DABT_LOW)
			ec = ESR_ELx_EC_DABT_CUR;
		else if (ec == ESR_ELx_EC_IABT_LOW)
			ec = ESR_ELx_EC_IABT_CUR;
		else
			WARN_ON(1);
		esr &= ~ESR_ELx_EC_MASK;
		esr |= ec << ESR_ELx_EC_SHIFT;
	}

	/*
	 * Since S1PTW should only ever be set for stage-2 faults, we're pretty
	 * much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
	 * let's use that bit to allow the host abort handler to differentiate
	 * this abort from normal userspace faults.
	 *
	 * Note: although S1PTW is RES0 at EL1, it is guaranteed by the
	 * architecture to be backed by flops, so it should be safe to use.
	 */
	esr |= ESR_ELx_S1PTW;
	inject_host_exception(esr);
}

void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
	struct kvm_vcpu_fault_info fault;
	u64 esr, addr;
	int ret = 0;

	esr = read_sysreg_el2(SYS_ESR);
	if (!__get_fault_info(esr, &fault)) {

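`host_inject_mem_abort()` rewrites the exception class of an abort taken from host EL1 from the "lower EL" encoding to the "current EL" one, then sets S1PTW as a marker the host can recognise. The pure-function sketch below shows the ESR arithmetic. The EC values, the EC field at bits [31:26] and S1PTW at bit 7 are the architectural ESR_ELx encodings; the `from_el0` flag is a simplification standing in for the `spsr & PSR_MODE_MASK` test, and the patch's `WARN_ON(1)` for unexpected classes is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      (0x3fULL << ESR_EC_SHIFT)
#define ESR_EC_IABT_LOW  0x20ULL
#define ESR_EC_IABT_CUR  0x21ULL
#define ESR_EC_DABT_LOW  0x24ULL
#define ESR_EC_DABT_CUR  0x25ULL
#define ESR_S1PTW        (1ULL << 7)

/* Return the ESR value that would be handed to the host's EL1 handler. */
static uint64_t repaint_host_abort_esr(uint64_t esr, int from_el0)
{
	if (!from_el0) {
		uint64_t ec = (esr & ESR_EC_MASK) >> ESR_EC_SHIFT;

		/* An abort from EL1 is a same-level abort from the host's view. */
		if (ec == ESR_EC_DABT_LOW)
			ec = ESR_EC_DABT_CUR;
		else if (ec == ESR_EC_IABT_LOW)
			ec = ESR_EC_IABT_CUR;

		esr = (esr & ~ESR_EC_MASK) | (ec << ESR_EC_SHIFT);
	}

	/* Tag the abort so it can be told apart from ordinary faults. */
	return esr | ESR_S1PTW;
}
```

The fault-status bits in the low part of the ESR pass through untouched, so the host still sees the original fault type.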
@@ -628,8 +743,16 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
	BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
	addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;

	ret = host_stage2_idmap(addr);
	BUG_ON(ret && ret != -EAGAIN);
	switch (host_stage2_idmap(addr)) {
	case -EPERM:
		host_inject_mem_abort(host_ctxt);
		fallthrough;
	case -EEXIST:
	case 0:
		break;
	default:
		BUG();
	}
}

struct check_walk_data {

@@ -707,8 +830,20 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
	return 0;
}

static bool guest_pte_is_poisoned(kvm_pte_t pte)
{
	if (kvm_pte_valid(pte))
		return false;

	return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
	       KVM_GUEST_INVALID_PTE_TYPE_POISONED;
}

static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
{
	if (guest_pte_is_poisoned(pte))
		return PKVM_POISON;

	if (!kvm_pte_valid(pte))
		return PKVM_NOPAGE;

@@ -727,6 +862,77 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
	return check_page_state_range(&vm->pgt, addr, size, &d);
}

static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
{
	kvm_pte_t pte;
	u64 phys;
	s8 level;
	int ret;

	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
	if (ret)
		return ret;
	if (guest_pte_is_poisoned(pte))
		return -EHWPOISON;
	if (!kvm_pte_valid(pte))
		return -ENOENT;
	if (level != KVM_PGTABLE_LAST_LEVEL)
		return -E2BIG;

	phys = kvm_pte_to_phys(pte);
	ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
	if (WARN_ON(ret))
		return ret;

	*ptep = pte;
	*physp = phys;

	return 0;
}

int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
{
	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
	kvm_pte_t pte;
	s8 level;
	u64 ipa;
	int ret;

	switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
	case ESR_ELx_EC_DABT_LOW:
	case ESR_ELx_EC_IABT_LOW:
		if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
			break;
		fallthrough;
	default:
		return -EINVAL;
	}

	/*
	 * The host has the faulting IPA when it calls us from the guest
	 * fault handler but we retrieve it ourselves from the FAR so as
	 * to avoid exposing an "oracle" that could reveal data access
	 * patterns of the guest after initial donation of its pages.
	 */
	ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
	ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(&hyp_vcpu->vcpu));

	guest_lock_component(vm);
	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
	if (ret)
		goto unlock;

	if (level != KVM_PGTABLE_LAST_LEVEL) {
		ret = -EINVAL;
		goto unlock;
	}

	ret = guest_pte_is_poisoned(pte);
unlock:
	guest_unlock_component(vm);
	return ret;
}

int __pkvm_host_share_hyp(u64 pfn)
{
	u64 phys = hyp_pfn_to_phys(pfn);

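In `__pkvm_vcpu_in_poison_fault()` the hypervisor rebuilds the faulting IPA itself: HPFAR_EL2 supplies the IPA at page granularity and the low bits of FAR supply the byte offset within the page. The sketch below shows that composition under stated assumptions: 4K pages, and a FIPA field taken to occupy HPFAR_EL2 bits [47:4] with the NS bit at 63. The `HPFAR_*` and `FAR_TO_PAGE_OFFSET` names are local to this example and are not the kernel's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HPFAR_EL2 layout for this sketch: FIPA in [47:4], NS in bit 63. */
#define HPFAR_FIPA_SHIFT  4
#define HPFAR_FIPA_MASK   (((1ULL << 44) - 1) << HPFAR_FIPA_SHIFT)
#define HPFAR_NS          (1ULL << 63)

/* Byte offset of the faulting address within a 4K page. */
#define FAR_TO_PAGE_OFFSET(far)  ((far) & 0xfffULL)

static uint64_t fault_ipa(uint64_t hpfar, uint64_t far)
{
	/* HPFAR gives the faulting IPA as a page frame number... */
	uint64_t ipa = ((hpfar & HPFAR_FIPA_MASK) >> HPFAR_FIPA_SHIFT) << 12;

	/* ...and FAR contributes the offset within that page. */
	return ipa | FAR_TO_PAGE_OFFSET(far);
}
```

Only the page offset is taken from FAR, since FAR holds a guest virtual address whose upper bits are unrelated to the IPA.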
@@ -753,6 +959,72 @@ unlock:
	return ret;
}

int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
{
	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
	u64 phys, ipa = hyp_pfn_to_phys(gfn);
	kvm_pte_t pte;
	int ret;

	host_lock_component();
	guest_lock_component(vm);

	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
	if (ret)
		goto unlock;

	ret = -EPERM;
	if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_OWNED)
		goto unlock;
	if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE))
		goto unlock;

	ret = 0;
	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED),
				       &vcpu->vcpu.arch.pkvm_memcache, 0));
	WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
unlock:
	guest_unlock_component(vm);
	host_unlock_component();

	return ret;
}

int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
{
	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
	u64 meta, phys, ipa = hyp_pfn_to_phys(gfn);
	kvm_pte_t pte;
	int ret;

	host_lock_component();
	guest_lock_component(vm);

	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
	if (ret)
		goto unlock;

	ret = -EPERM;
	if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_SHARED_OWNED)
		goto unlock;
	if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED))
		goto unlock;

	ret = 0;
	meta = host_stage2_encode_gfn_meta(vm, gfn);
	WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
						      PKVM_ID_GUEST, meta));
	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
				       &vcpu->vcpu.arch.pkvm_memcache, 0));
unlock:
	guest_unlock_component(vm);
	host_unlock_component();

	return ret;
}

int __pkvm_host_unshare_hyp(u64 pfn)
{
	u64 phys = hyp_pfn_to_phys(pfn);

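`__pkvm_guest_share_host()` and `__pkvm_guest_unshare_host()` are two halves of one state transition on a single page, checked from both the guest's stage-2 and the host's tracking. The table-free model below captures just the preconditions and resulting states, and its checks mirror the corresponding selftest sequence in this patch. It is a sketch: the enum and struct are hypothetical simplifications of `enum pkvm_page_state`, and page-table updates, locking and the non-last-level `-E2BIG` case are not modelled.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value, for libcs that lack it */
#endif

enum page_state { NOPAGE, OWNED, SHARED_OWNED, SHARED_BORROWED, POISON };

/* One physical page as seen by the guest stage-2 and by the host. */
struct page_pair {
	enum page_state guest;
	enum page_state host;
};

/* Guest shares a page it owns back with the host. */
static int guest_share_host(struct page_pair *p)
{
	if (p->guest == POISON)
		return -EHWPOISON;
	if (p->guest == NOPAGE)
		return -ENOENT;
	if (p->guest != OWNED || p->host != NOPAGE)
		return -EPERM;
	p->guest = SHARED_OWNED;
	p->host = SHARED_BORROWED;
	return 0;
}

/* Guest revokes the share; the host loses access again. */
static int guest_unshare_host(struct page_pair *p)
{
	if (p->guest == POISON)
		return -EHWPOISON;
	if (p->guest == NOPAGE)
		return -ENOENT;
	if (p->guest != SHARED_OWNED || p->host != SHARED_BORROWED)
		return -EPERM;
	p->guest = OWNED;
	p->host = NOPAGE;
	return 0;
}
```

Repeating either call fails with `-EPERM` because the precondition no longer holds, which is exactly what the selftests assert after each successful transition.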
@@ -960,6 +1232,176 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
	return 0;
}

static void hyp_poison_page(phys_addr_t phys)
{
	void *addr = hyp_fixmap_map(phys);

	memset(addr, 0, PAGE_SIZE);
	/*
	 * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
	 * here as the latter may elide the CMO under the assumption that FWB
	 * will be enabled on CPUs that support it. This is incorrect for the
	 * host stage-2 and would otherwise lead to a malicious host potentially
	 * being able to read the contents of newly reclaimed guest pages.
	 */
	kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
	hyp_fixmap_unmap();
}

static int host_stage2_get_guest_info(phys_addr_t phys, struct pkvm_hyp_vm **vm,
				      u64 *gfn)
{
	enum pkvm_page_state state;
	kvm_pte_t pte;
	s8 level;
	int ret;

	if (!addr_is_memory(phys))
		return -EFAULT;

	state = get_host_state(hyp_phys_to_page(phys));
	switch (state) {
	case PKVM_PAGE_OWNED:
	case PKVM_PAGE_SHARED_OWNED:
	case PKVM_PAGE_SHARED_BORROWED:
		/* The access should no longer fault; try again. */
		return -EAGAIN;
	case PKVM_NOPAGE:
		break;
	default:
		return -EPERM;
	}

	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, &level);
	if (ret)
		return ret;

	if (WARN_ON(level != KVM_PGTABLE_LAST_LEVEL))
		return -EINVAL;

	return host_stage2_decode_gfn_meta(pte, vm, gfn);
}

int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys)
{
	struct pkvm_hyp_vm *vm;
	u64 gfn, ipa, pa;
	kvm_pte_t pte;
	int ret;

	phys &= PAGE_MASK;

	hyp_spin_lock(&vm_table_lock);
	host_lock_component();

	ret = host_stage2_get_guest_info(phys, &vm, &gfn);
	if (ret)
		goto unlock_host;

	ipa = hyp_pfn_to_phys(gfn);
	guest_lock_component(vm);
	ret = get_valid_guest_pte(vm, ipa, &pte, &pa);
	if (ret)
		goto unlock_guest;

	WARN_ON(pa != phys);
	if (guest_get_page_state(pte, ipa) != PKVM_PAGE_OWNED) {
		ret = -EPERM;
		goto unlock_guest;
	}

	/* We really shouldn't be allocating, so don't pass a memcache */
	ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE, NULL,
					  KVM_GUEST_INVALID_PTE_TYPE_POISONED,
					  0);
	if (ret)
		goto unlock_guest;

	hyp_poison_page(phys);
	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
unlock_guest:
	guest_unlock_component(vm);
unlock_host:
	host_unlock_component();
	hyp_spin_unlock(&vm_table_lock);

	return ret;
}

int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
{
	u64 ipa = hyp_pfn_to_phys(gfn);
	kvm_pte_t pte;
	u64 phys;
	int ret;

	host_lock_component();
	guest_lock_component(vm);

	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
	if (ret)
		goto unlock;

	switch (guest_get_page_state(pte, ipa)) {
	case PKVM_PAGE_OWNED:
		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
		hyp_poison_page(phys);
		break;
	case PKVM_PAGE_SHARED_OWNED:
		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
		break;
	default:
		ret = -EPERM;
		goto unlock;
	}

	WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));

unlock:
	guest_unlock_component(vm);
	host_unlock_component();

	/*
	 * -EHWPOISON implies that the page was forcefully reclaimed already
	 * so return success for the GUP pin to be dropped.
	 */
	return ret && ret != -EHWPOISON ? ret : 0;
}

int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
{
	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
	u64 phys = hyp_pfn_to_phys(pfn);
	u64 ipa = hyp_pfn_to_phys(gfn);
	u64 meta;
	int ret;

	host_lock_component();
	guest_lock_component(vm);

	ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
	if (ret)
		goto unlock;

	ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
	if (ret)
		goto unlock;

	meta = host_stage2_encode_gfn_meta(vm, gfn);
	WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
						      PKVM_ID_GUEST, meta));
	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
				       &vcpu->vcpu.arch.pkvm_memcache, 0));

unlock:
	guest_unlock_component(vm);
	host_unlock_component();

	return ret;
}

int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
			    enum kvm_pgtable_prot prot)
{

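During teardown, `__pkvm_host_reclaim_page_guest()` picks an action from the guest-side page state and deliberately reports a page that was already force-reclaimed (`-EHWPOISON`) as success, so the host can drop its GUP pin. The decision logic alone looks like the sketch below. The `reclaim_action` enum and function name are hypothetical; in the patch the actions correspond to `hyp_poison_page()` followed by the unmap, or the unmap alone.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value, for libcs that lack it */
#endif

enum page_state { NOPAGE, OWNED, SHARED_OWNED, SHARED_BORROWED, POISON };

enum reclaim_action {
	RECLAIM_NONE,           /* nothing to do */
	RECLAIM_UNMAP,          /* page was shared: host already has the data */
	RECLAIM_ZERO_AND_UNMAP, /* page was private: scrub before returning it */
};

static int reclaim_dying_page(enum page_state guest, enum reclaim_action *act)
{
	int ret = 0;

	*act = RECLAIM_NONE;
	switch (guest) {
	case POISON:
		ret = -EHWPOISON;	/* already force-reclaimed */
		break;
	case NOPAGE:
		ret = -ENOENT;
		break;
	case OWNED:
		*act = RECLAIM_ZERO_AND_UNMAP;
		break;
	case SHARED_OWNED:
		*act = RECLAIM_UNMAP;
		break;
	default:
		ret = -EPERM;
		break;
	}

	/* Same mapping as the patch: -EHWPOISON is reported as success. */
	return ret && ret != -EHWPOISON ? ret : 0;
}
```

Only guest-private pages are zeroed, because a page in the shared state was already readable by the host.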
@@ -1206,53 +1648,18 @@ struct pkvm_expected_state {

static struct pkvm_expected_state selftest_state;
static struct hyp_page *selftest_page;

static struct pkvm_hyp_vm selftest_vm = {
	.kvm = {
		.arch = {
			.mmu = {
				.arch = &selftest_vm.kvm.arch,
				.pgt = &selftest_vm.pgt,
			},
		},
	},
};

static struct pkvm_hyp_vcpu selftest_vcpu = {
	.vcpu = {
		.arch = {
			.hw_mmu = &selftest_vm.kvm.arch.mmu,
		},
		.kvm = &selftest_vm.kvm,
	},
};

static void init_selftest_vm(void *virt)
{
	struct hyp_page *p = hyp_virt_to_page(virt);
	int i;

	selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
	WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));

	for (i = 0; i < pkvm_selftest_pages(); i++) {
		if (p[i].refcount)
			continue;
		p[i].refcount = 1;
		hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
	}
}
static struct pkvm_hyp_vcpu *selftest_vcpu;

static u64 selftest_ipa(void)
{
	return BIT(selftest_vm.pgt.ia_bits - 1);
	return BIT(selftest_vcpu->vcpu.arch.hw_mmu->pgt->ia_bits - 1);
}

static void assert_page_state(void)
{
	void *virt = hyp_page_to_virt(selftest_page);
	u64 size = PAGE_SIZE << selftest_page->order;
	struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
	struct pkvm_hyp_vcpu *vcpu = selftest_vcpu;
	u64 phys = hyp_virt_to_phys(virt);
	u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
	struct pkvm_hyp_vm *vm;

@@ -1267,10 +1674,10 @@ static void assert_page_state(void)
	WARN_ON(__hyp_check_page_state_range(phys, size, selftest_state.hyp));
	hyp_unlock_component();

	guest_lock_component(&selftest_vm);
	guest_lock_component(vm);
	WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
	WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
	guest_unlock_component(&selftest_vm);
	guest_unlock_component(vm);
}

#define assert_transition_res(res, fn, ...) \

@@ -1283,14 +1690,15 @@ void pkvm_ownership_selftest(void *base)
{
	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX;
	void *virt = hyp_alloc_pages(&host_s2_pool, 0);
	struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
	struct pkvm_hyp_vm *vm = &selftest_vm;
	struct pkvm_hyp_vcpu *vcpu;
	u64 phys, size, pfn, gfn;
	struct pkvm_hyp_vm *vm;

	WARN_ON(!virt);
	selftest_page = hyp_virt_to_page(virt);
	selftest_page->refcount = 0;
	init_selftest_vm(base);
	selftest_vcpu = vcpu = init_selftest_vm(base);
	vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);

	size = PAGE_SIZE << selftest_page->order;
	phys = hyp_virt_to_phys(virt);

@@ -1309,6 +1717,7 @@ void pkvm_ownership_selftest(void *base)
	assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);

	selftest_state.host = PKVM_PAGE_OWNED;
	selftest_state.hyp = PKVM_NOPAGE;

@@ -1328,6 +1737,7 @@ void pkvm_ownership_selftest(void *base)
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);

	assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
	assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);

@@ -1340,6 +1750,7 @@ void pkvm_ownership_selftest(void *base)
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);

	hyp_unpin_shared_mem(virt, virt + size);
	assert_page_state();

@@ -1359,6 +1770,7 @@ void pkvm_ownership_selftest(void *base)
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);

	selftest_state.host = PKVM_PAGE_OWNED;

@@ -1375,6 +1787,7 @@ void pkvm_ownership_selftest(void *base)
	assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);

	selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;

@@ -1388,10 +1801,70 @@ void pkvm_ownership_selftest(void *base)
	selftest_state.host = PKVM_PAGE_OWNED;
	assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, 1, vm);

	selftest_state.host = PKVM_NOPAGE;
	selftest_state.guest[0] = PKVM_PAGE_OWNED;
	assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);

	selftest_state.host = PKVM_PAGE_SHARED_BORROWED;
	selftest_state.guest[0] = PKVM_PAGE_SHARED_OWNED;
	assert_transition_res(0, __pkvm_guest_share_host, vcpu, gfn);
	assert_transition_res(-EPERM, __pkvm_guest_share_host, vcpu, gfn);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);

	selftest_state.host = PKVM_NOPAGE;
	selftest_state.guest[0] = PKVM_PAGE_OWNED;
	assert_transition_res(0, __pkvm_guest_unshare_host, vcpu, gfn);
	assert_transition_res(-EPERM, __pkvm_guest_unshare_host, vcpu, gfn);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
	assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
	assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
	assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);

	selftest_state.host = PKVM_PAGE_OWNED;
	selftest_state.guest[0] = PKVM_POISON;
	assert_transition_res(0, __pkvm_host_force_reclaim_page_guest, phys);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
	assert_transition_res(-EHWPOISON, __pkvm_guest_share_host, vcpu, gfn);
	assert_transition_res(-EHWPOISON, __pkvm_guest_unshare_host, vcpu, gfn);

	selftest_state.host = PKVM_NOPAGE;
	selftest_state.guest[1] = PKVM_PAGE_OWNED;
	assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);

	selftest_state.host = PKVM_PAGE_OWNED;
	selftest_state.guest[1] = PKVM_NOPAGE;
	assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn + 1, vm);
	assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
	assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);

	selftest_state.host = PKVM_NOPAGE;
	selftest_state.hyp = PKVM_PAGE_OWNED;
	assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1);

	teardown_selftest_vm();
	selftest_page->refcount = 1;
	hyp_put_page(&host_s2_pool, virt);
}

@@ -4,6 +4,8 @@
 * Author: Fuad Tabba <tabba@google.com>
 */

#include <kvm/arm_hypercalls.h>

#include <linux/kvm_host.h>
#include <linux/mm.h>

@@ -222,6 +224,7 @@ static struct pkvm_hyp_vm **vm_table;

void pkvm_hyp_vm_table_init(void *tbl)
{
	BUILD_BUG_ON((u64)HANDLE_OFFSET + KVM_MAX_PVMS > (pkvm_handle_t)-1);
	WARN_ON(vm_table);
	vm_table = tbl;
}

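The new `BUILD_BUG_ON()` in `pkvm_hyp_vm_table_init()` matters because `pkvm_handle_t` is now a `u16` and the handle is stored in the 16-bit handle field of the host donation annotation. The sketch below reproduces the handle/index mapping and the compile-time check with `_Static_assert`. The values `HANDLE_OFFSET = 0x1000` and `KVM_MAX_PVMS = 255`, and the helper bodies, are assumptions for illustration; the real definitions live outside this diff.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t pkvm_handle_t;

/* Assumed values for this sketch only. */
#define HANDLE_OFFSET  0x1000u
#define KVM_MAX_PVMS   255u

/* Compile-time analogue of the BUILD_BUG_ON() in the patch. */
_Static_assert((uint64_t)HANDLE_OFFSET + KVM_MAX_PVMS <= (pkvm_handle_t)-1,
	       "every VM handle must fit in pkvm_handle_t");

static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
{
	return (pkvm_handle_t)(idx + HANDLE_OFFSET);
}

/* Handles below HANDLE_OFFSET wrap to large unsigned indices. */
static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
{
	return handle - HANDLE_OFFSET;
}

/* Mirrors the idx >= KVM_MAX_PVMS rejection in get_vm_by_handle(). */
static int handle_is_in_range(pkvm_handle_t handle)
{
	return vm_handle_to_idx(handle) < KVM_MAX_PVMS;
}
```

With these assumed values the largest handle is 0x10fe, comfortably inside 16 bits, and a stray handle such as 0 is rejected by the range check rather than indexing the table.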
@@ -229,10 +232,12 @@ void pkvm_hyp_vm_table_init(void *tbl)
/*
 * Return the hyp vm structure corresponding to the handle.
 */
static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
{
	unsigned int idx = vm_handle_to_idx(handle);

	hyp_assert_lock_held(&vm_table_lock);

	if (unlikely(idx >= KVM_MAX_PVMS))
		return NULL;

@@ -255,7 +260,10 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,

	hyp_spin_lock(&vm_table_lock);
	hyp_vm = get_vm_by_handle(handle);
	if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
		goto unlock;

	if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
		goto unlock;

	hyp_vcpu = hyp_vm->vcpus[vcpu_idx];

@@ -719,6 +727,55 @@ void __pkvm_unreserve_vm(pkvm_handle_t handle)
	hyp_spin_unlock(&vm_table_lock);
}

#ifdef CONFIG_NVHE_EL2_DEBUG
static struct pkvm_hyp_vm selftest_vm = {
	.kvm = {
		.arch = {
			.mmu = {
				.arch = &selftest_vm.kvm.arch,
				.pgt = &selftest_vm.pgt,
			},
		},
	},
};

static struct pkvm_hyp_vcpu selftest_vcpu = {
	.vcpu = {
		.arch = {
			.hw_mmu = &selftest_vm.kvm.arch.mmu,
		},
		.kvm = &selftest_vm.kvm,
	},
};

struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
{
	struct hyp_page *p = hyp_virt_to_page(virt);
	int i;

	selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
	WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));

	for (i = 0; i < pkvm_selftest_pages(); i++) {
		if (p[i].refcount)
			continue;
		p[i].refcount = 1;
		hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
	}

	selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
	insert_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle, &selftest_vm);
	return &selftest_vcpu;
}

void teardown_selftest_vm(void)
{
	hyp_spin_lock(&vm_table_lock);
	remove_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle);
	hyp_spin_unlock(&vm_table_lock);
}
#endif /* CONFIG_NVHE_EL2_DEBUG */

/*
 * Initialize the hypervisor copy of the VM state using host-donated memory.
 *

@@ -859,7 +916,54 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
	unmap_donated_memory_noclear(addr, size);
}

int __pkvm_teardown_vm(pkvm_handle_t handle)
int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
{
	struct pkvm_hyp_vm *hyp_vm = get_pkvm_hyp_vm(handle);
	int ret = -EINVAL;

	if (!hyp_vm)
		return ret;

	if (hyp_vm->kvm.arch.pkvm.is_dying)
		ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);

	put_pkvm_hyp_vm(hyp_vm);
	return ret;
}

static struct pkvm_hyp_vm *get_pkvm_unref_hyp_vm_locked(pkvm_handle_t handle)
{
	struct pkvm_hyp_vm *hyp_vm;

	hyp_assert_lock_held(&vm_table_lock);

	hyp_vm = get_vm_by_handle(handle);
	if (!hyp_vm || hyp_page_count(hyp_vm))
		return NULL;

	return hyp_vm;
}

int __pkvm_start_teardown_vm(pkvm_handle_t handle)
{
	struct pkvm_hyp_vm *hyp_vm;
	int ret = 0;

	hyp_spin_lock(&vm_table_lock);
	hyp_vm = get_pkvm_unref_hyp_vm_locked(handle);
	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying) {
		ret = -EINVAL;
		goto unlock;
	}

	hyp_vm->kvm.arch.pkvm.is_dying = true;
unlock:
	hyp_spin_unlock(&vm_table_lock);

	return ret;
}

int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
{
	struct kvm_hyp_memcache *mc, *stage2_mc;
	struct pkvm_hyp_vm *hyp_vm;

@ -869,14 +973,9 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
|
|||
int err;
|
||||
|
||||
hyp_spin_lock(&vm_table_lock);
|
||||
hyp_vm = get_vm_by_handle(handle);
|
||||
if (!hyp_vm) {
|
||||
err = -ENOENT;
|
||||
goto err_unlock;
|
||||
}
|
||||
|
||||
if (WARN_ON(hyp_page_count(hyp_vm))) {
|
||||
err = -EBUSY;
|
||||
hyp_vm = get_pkvm_unref_hyp_vm_locked(handle);
|
||||
if (!hyp_vm || !hyp_vm->kvm.arch.pkvm.is_dying) {
|
||||
err = -EINVAL;
|
||||
goto err_unlock;
|
||||
}
|
||||
|
||||
|
|
@ -922,3 +1021,121 @@ err_unlock:
|
|||
hyp_spin_unlock(&vm_table_lock);
|
||||
return err;
|
||||
}
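Teardown is now split into three hypercalls, each with its own precondition. __pkvm_start_teardown_vm() marks an unreferenced VM as dying, and only once. __pkvm_reclaim_dying_guest_page() reclaims pages only from a dying VM, holding a temporary reference while it does so. __pkvm_finalize_teardown_vm() proceeds only for a VM that is both dying and unreferenced. The sketch below models those checks as a plain state machine. It assumes a simple integer refcount in place of hyp_page_count() and a page counter in place of the real stage-2 reclaim. `struct model_vm` and the helper names are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the teardown-relevant state of a hyp_vm. */
struct model_vm {
	unsigned int refcount;		/* stands in for hyp_page_count(hyp_vm) */
	bool is_dying;
	bool freed;
	unsigned int guest_pages;	/* pages still owned by the guest */
};

/* Mirrors get_pkvm_unref_hyp_vm_locked(): only hand out an unreferenced VM. */
static struct model_vm *get_unref(struct model_vm *vm)
{
	if (!vm || vm->freed || vm->refcount)
		return NULL;
	return vm;
}

static int start_teardown(struct model_vm *vm)
{
	vm = get_unref(vm);
	if (!vm || vm->is_dying)
		return -EINVAL;
	vm->is_dying = true;
	return 0;
}

static int reclaim_dying_page(struct model_vm *vm)
{
	int ret = -EINVAL;

	if (!vm || vm->freed)
		return ret;
	vm->refcount++;			/* get_pkvm_hyp_vm(), allowed while dying */
	if (vm->is_dying && vm->guest_pages) {
		vm->guest_pages--;	/* stands in for __pkvm_host_reclaim_page_guest() */
		ret = 0;
	}
	vm->refcount--;			/* put_pkvm_hyp_vm() */
	return ret;
}

static int finalize_teardown(struct model_vm *vm)
{
	vm = get_unref(vm);
	if (!vm || !vm->is_dying)
		return -EINVAL;
	vm->freed = true;
	return 0;
}
```

The reference taken by the reclaim path is what get_pkvm_unref_hyp_vm_locked() guards against: finalisation cannot run while a reclaim is still operating on the VM.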
|
||||
|
||||
static u64 __pkvm_memshare_page_req(struct kvm_vcpu *vcpu, u64 ipa)
|
||||
{
|
||||
u64 elr;
|
||||
|
||||
/* Fake up a data abort (level 3 translation fault on write) */
|
||||
vcpu->arch.fault.esr_el2 = (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT) |
|
||||
ESR_ELx_WNR | ESR_ELx_FSC_FAULT |
|
||||
FIELD_PREP(ESR_ELx_FSC_LEVEL, 3);
|
||||
|
||||
/* Shuffle the IPA around into the HPFAR */
|
||||
vcpu->arch.fault.hpfar_el2 = (HPFAR_EL2_NS | (ipa >> 8)) & HPFAR_MASK;
|
||||
|
||||
/* This is a virtual address. 0's good. Let's go with 0. */
|
||||
vcpu->arch.fault.far_el2 = 0;
|
||||
|
||||
/* Rewind the ELR so we return to the HVC once the IPA is mapped */
|
||||
elr = read_sysreg(elr_el2);
|
||||
elr -= 4;
|
||||
write_sysreg(elr, elr_el2);
|
||||
|
||||
return ARM_EXCEPTION_TRAP;
|
||||
}
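When __pkvm_guest_share_host() returns -ENOENT, __pkvm_memshare_page_req() fabricates a level-3 translation fault on write and rewinds ELR_EL2 by one instruction. The host then maps the page, and the guest re-issues the HVC. The sketch below reproduces the ESR and HPFAR arithmetic in userspace.

The constants reflect our reading of the arm64 encodings:

- EC 0x24 for a data abort from a lower EL.
- WNR at bit 6.
- Translation-fault FSC values 0x04 to 0x07, with the level in the low two bits.
- NS at HPFAR bit 63.

The FIPA field is assumed to span bits [43:4], i.e. a 48-bit IPA. The original ORs in the NS bit and then applies HPFAR_MASK. This variant masks only the FIPA field, which gives the same result for in-range IPAs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed arm64 encodings; check against the Arm ARM before relying on them. */
#define EC_DABT_LOW	0x24ULL		/* data abort from a lower EL */
#define EC_SHIFT	26
#define ESR_WNR		(1ULL << 6)	/* write, not read */
#define FSC_FAULT	0x04ULL		/* translation fault, level in FSC[1:0] */
#define FSC_LEVEL_MASK	0x03ULL
#define HPFAR_NS	(1ULL << 63)
#define HPFAR_FIPA_MASK	(((1ULL << 44) - 1) & ~0xfULL)	/* bits [43:4], 48-bit IPA */

/* ESR value for "translation fault at the given level, on a write". */
static uint64_t fake_write_abort_esr(unsigned int level)
{
	return (EC_DABT_LOW << EC_SHIFT) | ESR_WNR | FSC_FAULT |
	       (level & FSC_LEVEL_MASK);
}

/* FIPA holds IPA[47:12] in bits [43:4], hence the shift by 8. */
static uint64_t ipa_to_hpfar(uint64_t ipa)
{
	return HPFAR_NS | ((ipa >> 8) & HPFAR_FIPA_MASK);
}

/* Inverse transform, as a host-side fault handler would recover the IPA. */
static uint64_t hpfar_to_ipa(uint64_t hpfar)
{
	return (hpfar & HPFAR_FIPA_MASK) << 8;
}
```

Because FIPA only records the page frame, the page offset of the IPA is lost in the round trip. That is harmless here, since pkvm_memshare_call() rejects IPAs that are not page-aligned.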
|
||||
|
||||
static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
|
||||
{
|
||||
struct pkvm_hyp_vcpu *hyp_vcpu;
|
||||
u64 ipa = smccc_get_arg1(vcpu);
|
||||
|
||||
if (!PAGE_ALIGNED(ipa))
|
||||
goto out_guest;
|
||||
|
||||
hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
|
||||
switch (__pkvm_guest_share_host(hyp_vcpu, hyp_phys_to_pfn(ipa))) {
|
||||
case 0:
|
||||
ret[0] = SMCCC_RET_SUCCESS;
|
||||
goto out_guest;
|
||||
case -ENOENT:
|
||||
/*
|
||||
* Convert the exception into a data abort so that the page
|
||||
* being shared is mapped into the guest next time.
|
||||
*/
|
||||
*exit_code = __pkvm_memshare_page_req(vcpu, ipa);
|
||||
goto out_host;
|
||||
}
|
||||
|
||||
out_guest:
|
||||
return true;
|
||||
out_host:
|
||||
return false;
|
||||
}
|
||||
|
||||
static void pkvm_memunshare_call(u64 *ret, struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct pkvm_hyp_vcpu *hyp_vcpu;
|
||||
u64 ipa = smccc_get_arg1(vcpu);
|
||||
|
||||
if (!PAGE_ALIGNED(ipa))
|
||||
return;
|
||||
|
||||
hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
|
||||
if (!__pkvm_guest_unshare_host(hyp_vcpu, hyp_phys_to_pfn(ipa)))
|
||||
ret[0] = SMCCC_RET_SUCCESS;
|
||||
}
|
||||
|
||||
/*
|
||||
* Handler for protected VM HVC calls.
|
||||
*
|
||||
* Returns true if the hypervisor has handled the exit (and control
|
||||
* should return to the guest) or false if it hasn't (and the handling
|
||||
* should be performed by the host).
|
||||
*/
|
||||
bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
|
||||
{
|
||||
u64 val[4] = { SMCCC_RET_INVALID_PARAMETER };
|
||||
bool handled = true;
|
||||
|
||||
switch (smccc_get_function(vcpu)) {
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
|
||||
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
|
||||
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
|
||||
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
|
||||
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
|
||||
break;
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
|
||||
if (smccc_get_arg1(vcpu) ||
|
||||
smccc_get_arg2(vcpu) ||
|
||||
smccc_get_arg3(vcpu)) {
|
||||
break;
|
||||
}
|
||||
|
||||
val[0] = PAGE_SIZE;
|
||||
break;
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
|
||||
if (smccc_get_arg2(vcpu) ||
|
||||
smccc_get_arg3(vcpu)) {
|
||||
break;
|
||||
}
|
||||
|
||||
handled = pkvm_memshare_call(val, vcpu, exit_code);
|
||||
break;
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
|
||||
if (smccc_get_arg2(vcpu) ||
|
||||
smccc_get_arg3(vcpu)) {
|
||||
break;
|
||||
}
|
||||
|
||||
pkvm_memunshare_call(val, vcpu);
|
||||
break;
|
||||
default:
|
||||
/* Punt everything else back to the host, for now. */
|
||||
handled = false;
|
||||
}
|
||||
|
||||
if (handled)
|
||||
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
|
||||
return handled;
|
||||
}
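kvm_handle_pvm_hvc64() follows three rules:

- Every response defaults to SMCCC_RET_INVALID_PARAMETER.
- Reserved arguments must be zero.
- It returns false only for calls that must be forwarded to the host.

The model below captures that dispatch shape in plain C. The service bit numbers (FEATURES = 0, HYP_MEMINFO = 2, MEM_SHARE = 3, MEM_UNSHARE = 4) are our assumption about the ARM_SMCCC_KVM_FUNC_* values. A 4 KiB page size is assumed. The -ENOENT path, in which pkvm_memshare_call() exits to the host, is deliberately left out, and share/unshare are assumed to succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RET_SUCCESS		0
#define RET_INVALID_PARAMETER	(-3)
#define MODEL_PAGE_SIZE		4096ULL

/* Assumed bit numbers of the vendor-hyp KVM services; FUNC_OTHER is made up. */
enum { FUNC_FEATURES = 0, FUNC_PTP = 1, FUNC_HYP_MEMINFO = 2,
       FUNC_MEM_SHARE = 3, FUNC_MEM_UNSHARE = 4, FUNC_OTHER = 99 };

struct model_call {
	int func;
	uint64_t arg[4];	/* arg[1..3] model smccc_get_arg1..3() */
	int64_t ret0;
};

/* Returns true if handled "at EL2"; false means punt to the host. */
static bool model_pvm_hvc(struct model_call *c)
{
	int64_t val0 = RET_INVALID_PARAMETER;	/* default response */

	switch (c->func) {
	case FUNC_FEATURES:
		val0 = (1 << FUNC_FEATURES) | (1 << FUNC_HYP_MEMINFO) |
		       (1 << FUNC_MEM_SHARE) | (1 << FUNC_MEM_UNSHARE);
		break;
	case FUNC_HYP_MEMINFO:
		if (c->arg[1] || c->arg[2] || c->arg[3])
			break;		/* reserved arguments must be zero */
		val0 = (int64_t)MODEL_PAGE_SIZE;
		break;
	case FUNC_MEM_SHARE:
	case FUNC_MEM_UNSHARE:
		if (c->arg[2] || c->arg[3])
			break;
		if (c->arg[1] % MODEL_PAGE_SIZE)
			break;		/* IPA must be page-aligned */
		val0 = RET_SUCCESS;	/* assume the share/unshare succeeded */
		break;
	default:
		return false;		/* not handled here */
	}
	c->ret0 = val0;
	return true;
}
```

Note that a malformed MEM_SHARE is still "handled". The guest receives INVALID_PARAMETER instead of the call being bounced to the host.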
|
||||
|
|
|
|||
|
|
@ -205,6 +205,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
|
|||
|
||||
static const exit_handler_fn pvm_exit_handlers[] = {
|
||||
[0 ... ESR_ELx_EC_MAX] = NULL,
|
||||
[ESR_ELx_EC_HVC64] = kvm_handle_pvm_hvc64,
|
||||
[ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
|
||||
[ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
|
||||
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
|
||||
|
|
|
|||
|
|
@ -400,6 +400,14 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
|
|||
/* Cache maintenance by set/way operations are restricted. */
|
||||
|
||||
/* Debug and Trace Registers are restricted. */
|
||||
RAZ_WI(SYS_DBGBVRn_EL1(0)),
|
||||
RAZ_WI(SYS_DBGBCRn_EL1(0)),
|
||||
RAZ_WI(SYS_DBGWVRn_EL1(0)),
|
||||
RAZ_WI(SYS_DBGWCRn_EL1(0)),
|
||||
RAZ_WI(SYS_MDSCR_EL1),
|
||||
RAZ_WI(SYS_OSLAR_EL1),
|
||||
RAZ_WI(SYS_OSLSR_EL1),
|
||||
RAZ_WI(SYS_OSDLR_EL1),
|
||||
|
||||
/* Group 1 ID registers */
|
||||
HOST_HANDLED(SYS_REVIDR_EL1),
|
||||
|
|
|
|||
|
|
@ -114,11 +114,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
|
|||
return pte;
|
||||
}
|
||||
|
||||
static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
|
||||
{
|
||||
return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
|
||||
}
|
||||
|
||||
static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
|
||||
const struct kvm_pgtable_visit_ctx *ctx,
|
||||
enum kvm_pgtable_walk_flags visit)
|
||||
|
|
@ -581,7 +576,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
|
|||
struct stage2_map_data {
|
||||
const u64 phys;
|
||||
kvm_pte_t attr;
|
||||
u8 owner_id;
|
||||
kvm_pte_t pte_annot;
|
||||
|
||||
kvm_pte_t *anchor;
|
||||
kvm_pte_t *childp;
|
||||
|
|
@ -798,7 +793,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
|
|||
|
||||
static bool stage2_pte_is_locked(kvm_pte_t pte)
|
||||
{
|
||||
return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
|
||||
if (kvm_pte_valid(pte))
|
||||
return false;
|
||||
|
||||
return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
|
||||
KVM_INVALID_PTE_TYPE_LOCKED;
|
||||
}
|
||||
|
||||
static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
|
||||
|
|
@ -829,6 +828,7 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
|
|||
struct kvm_s2_mmu *mmu)
|
||||
{
|
||||
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
|
||||
kvm_pte_t locked_pte;
|
||||
|
||||
if (stage2_pte_is_locked(ctx->old)) {
|
||||
/*
|
||||
|
|
@ -839,7 +839,9 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
|
|||
return false;
|
||||
}
|
||||
|
||||
if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
|
||||
locked_pte = FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK,
|
||||
KVM_INVALID_PTE_TYPE_LOCKED);
|
||||
if (!stage2_try_set_pte(ctx, locked_pte))
|
||||
return false;
|
||||
|
||||
if (!kvm_pgtable_walk_skip_bbm_tlbi(ctx)) {
|
||||
|
|
@ -964,7 +966,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
|
|||
if (!data->annotation)
|
||||
new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
|
||||
else
|
||||
new = kvm_init_invalid_leaf_owner(data->owner_id);
|
||||
new = data->pte_annot;
|
||||
|
||||
/*
|
||||
* Skip updating the PTE if we are trying to recreate the exact
|
||||
|
|
@ -1118,16 +1120,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
|||
return ret;
|
||||
}
|
||||
|
||||
int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
||||
void *mc, u8 owner_id)
|
||||
int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
||||
void *mc, enum kvm_invalid_pte_type type,
|
||||
kvm_pte_t pte_annot)
|
||||
{
|
||||
int ret;
|
||||
struct stage2_map_data map_data = {
|
||||
.mmu = pgt->mmu,
|
||||
.memcache = mc,
|
||||
.owner_id = owner_id,
|
||||
.force_pte = true,
|
||||
.annotation = true,
|
||||
.pte_annot = pte_annot |
|
||||
FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK, type),
|
||||
};
|
||||
struct kvm_pgtable_walker walker = {
|
||||
.cb = stage2_map_walker,
|
||||
|
|
@ -1136,7 +1140,10 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
|||
.arg = &map_data,
|
||||
};
|
||||
|
||||
if (owner_id > KVM_MAX_OWNER_ID)
|
||||
if (pte_annot & ~KVM_INVALID_PTE_ANNOT_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
if (!type || type == KVM_INVALID_PTE_TYPE_LOCKED)
|
||||
return -EINVAL;
|
||||
|
||||
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
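The stage-2 code now tells invalid PTEs apart by a type field rather than a single LOCKED bit. stage2_pte_is_locked() compares FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) against KVM_INVALID_PTE_TYPE_LOCKED. kvm_pgtable_stage2_annotate() rejects annotation bits outside KVM_INVALID_PTE_ANNOT_MASK, and it also rejects the zero and LOCKED types.

The sketch below models those rules with an invented layout: valid bit 0, type in bits [3:1], payload in [63:4]. The real mask values are not shown in this diff, so treat every constant here as hypothetical. field_prep() and field_get() are small userspace stand-ins for the kernel's FIELD_PREP()/FIELD_GET(), and they rely on the GCC/Clang __builtin_ctzll() builtin.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 = valid, type in [3:1], annotation payload in [63:4]. */
#define PTE_VALID		(1ULL << 0)
#define INVALID_TYPE_MASK	(0x7ULL << 1)
#define INVALID_ANNOT_MASK	(~0ULL << 4)

enum invalid_pte_type { TYPE_NONE = 0, TYPE_LOCKED = 1, TYPE_OWNER = 2 };

/* Userspace equivalents of FIELD_PREP()/FIELD_GET() for a contiguous, non-zero mask. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

/* Counterpart of stage2_pte_is_locked(): compare the whole type field. */
static bool pte_is_locked(uint64_t pte)
{
	if (pte & PTE_VALID)
		return false;
	return field_get(INVALID_TYPE_MASK, pte) == TYPE_LOCKED;
}

/* Mirrors the argument checks in kvm_pgtable_stage2_annotate(). */
static int build_annotation(enum invalid_pte_type type, uint64_t annot, uint64_t *pte)
{
	if (annot & ~INVALID_ANNOT_MASK)
		return -EINVAL;		/* payload would clobber valid/type bits */
	if (type == TYPE_NONE || type == TYPE_LOCKED)
		return -EINVAL;		/* LOCKED is reserved for break-before-make */
	*pte = annot | field_prep(INVALID_TYPE_MASK, type);
	return 0;
}
```

Rejecting type zero presumably keeps an annotated PTE distinguishable from an empty one.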
|
||||
|
|
|
|||
|
|
@ -340,6 +340,9 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
|
|||
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
|
||||
u64 size, bool may_block)
|
||||
{
|
||||
if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
|
||||
return;
|
||||
|
||||
__unmap_stage2_range(mmu, start, size, may_block);
|
||||
}
|
||||
|
||||
|
|
@ -878,9 +881,6 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
|
|||
u64 mmfr0, mmfr1;
|
||||
u32 phys_shift;
|
||||
|
||||
if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
|
||||
if (is_protected_kvm_enabled()) {
|
||||
phys_shift = kvm_ipa_limit;
|
||||
|
|
@ -1659,6 +1659,75 @@ struct kvm_s2_fault_vma_info {
|
|||
bool map_non_cacheable;
|
||||
};
|
||||
|
||||
static int pkvm_mem_abort(const struct kvm_s2_fault_desc *s2fd)
|
||||
{
|
||||
unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
|
||||
struct kvm_vcpu *vcpu = s2fd->vcpu;
|
||||
struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
|
||||
struct mm_struct *mm = current->mm;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
void *hyp_memcache;
|
||||
struct page *page;
|
||||
int ret;
|
||||
|
||||
hyp_memcache = get_mmu_memcache(vcpu);
|
||||
ret = topup_mmu_memcache(vcpu, hyp_memcache);
|
||||
if (ret)
|
||||
return -ENOMEM;
|
||||
|
||||
ret = account_locked_vm(mm, 1, true);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
mmap_read_lock(mm);
|
||||
ret = pin_user_pages(s2fd->hva, 1, flags, &page);
|
||||
mmap_read_unlock(mm);
|
||||
|
||||
if (ret == -EHWPOISON) {
|
||||
kvm_send_hwpoison_signal(s2fd->hva, PAGE_SHIFT);
|
||||
ret = 0;
|
||||
goto dec_account;
|
||||
} else if (ret != 1) {
|
||||
ret = -EFAULT;
|
||||
goto dec_account;
|
||||
} else if (!folio_test_swapbacked(page_folio(page))) {
|
||||
/*
|
||||
* We really can't deal with page-cache pages returned by GUP
|
||||
* because (a) we may trigger writeback of a page for which we
|
||||
* no longer have access and (b) page_mkclean() won't find the
|
||||
* stage-2 mapping in the rmap so we can get out-of-whack with
|
||||
* the filesystem when marking the page dirty during unpinning
|
||||
* (see cc5095747edf ("ext4: don't BUG if someone dirty pages
|
||||
* without asking ext4 first")).
|
||||
*
|
||||
* Ideally we'd just restrict ourselves to anonymous pages, but
|
||||
* we also want to allow memfd (i.e. shmem) pages, so check for
|
||||
* pages backed by swap in the knowledge that the GUP pin will
|
||||
* prevent try_to_unmap() from succeeding.
|
||||
*/
|
||||
ret = -EIO;
|
||||
goto unpin;
|
||||
}
|
||||
|
||||
write_lock(&kvm->mmu_lock);
|
||||
ret = pkvm_pgtable_stage2_map(pgt, s2fd->fault_ipa, PAGE_SIZE,
|
||||
page_to_phys(page), KVM_PGTABLE_PROT_RWX,
|
||||
hyp_memcache, 0);
|
||||
write_unlock(&kvm->mmu_lock);
|
||||
if (ret) {
|
||||
if (ret == -EAGAIN)
|
||||
ret = 0;
|
||||
goto unpin;
|
||||
}
|
||||
|
||||
return 0;
|
||||
unpin:
|
||||
unpin_user_pages(&page, 1);
|
||||
dec_account:
|
||||
account_locked_vm(mm, 1, false);
|
||||
return ret;
|
||||
}
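pkvm_mem_abort() acquires resources in a fixed order and unwinds them in reverse through the `unpin` and `dec_account` labels:

1. Locked-memory accounting.
2. A long-term GUP pin.
3. The stage-2 mapping.

Two failure cases are turned into success. -EHWPOISON becomes 0 after the task has been signalled. -EAGAIN from pkvm_pgtable_stage2_map() becomes 0 when another vCPU won the race.

The model below replays that control flow with counters instead of real pages, so that the balance of each path can be checked. `struct abort_model` and its fields are hypothetical. The memcache top-up and mmap locking are omitted.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; only used if libc does not define it */
#endif

/* Hypothetical knobs describing how each step behaves, plus observed side effects. */
struct abort_model {
	int pin_result;		/* 1 on success, or a negative errno */
	int swap_backed;	/* 0 models a page-cache page */
	int map_result;		/* 0, -EAGAIN, or another negative errno */
	int locked_pages;	/* models account_locked_vm() */
	int pinned_pages;	/* models pin_user_pages()/unpin_user_pages() */
	int poison_signals;	/* models kvm_send_hwpoison_signal() */
};

static int model_mem_abort(struct abort_model *m)
{
	int ret;

	m->locked_pages++;			/* account_locked_vm(mm, 1, true) */

	ret = m->pin_result;
	if (ret == -EHWPOISON) {
		m->poison_signals++;
		ret = 0;			/* reported via the signal, not an error */
		goto dec_account;
	} else if (ret != 1) {
		ret = -EFAULT;
		goto dec_account;
	}
	m->pinned_pages++;

	if (!m->swap_backed) {
		ret = -EIO;			/* refuse page-cache pages */
		goto unpin;
	}

	ret = m->map_result;
	if (ret) {
		if (ret == -EAGAIN)
			ret = 0;		/* lost a race; the vCPU simply re-runs */
		goto unpin;
	}
	return 0;				/* pin and accounting now owned by the mapping */

unpin:
	m->pinned_pages--;
dec_account:
	m->locked_pages--;
	return ret;
}
```

Only the fully successful path leaves a pin and a locked-page charge behind. In the source, both are later released in __pkvm_pgtable_stage2_reclaim() via unpin_user_pages_dirty_lock() and account_locked_vm(..., false).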
|
||||
|
||||
static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
|
||||
struct kvm_s2_fault_vma_info *s2vi,
|
||||
struct vm_area_struct *vma)
|
||||
|
|
@ -2285,9 +2354,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
|
|||
goto out_unlock;
|
||||
}
|
||||
|
||||
VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
|
||||
!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
|
||||
|
||||
const struct kvm_s2_fault_desc s2fd = {
|
||||
.vcpu = vcpu,
|
||||
.fault_ipa = fault_ipa,
|
||||
|
|
@ -2296,10 +2362,18 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
|
|||
.hva = hva,
|
||||
};
|
||||
|
||||
if (kvm_slot_has_gmem(memslot))
|
||||
ret = gmem_abort(&s2fd);
|
||||
else
|
||||
ret = user_mem_abort(&s2fd);
|
||||
if (kvm_vm_is_protected(vcpu->kvm)) {
|
||||
ret = pkvm_mem_abort(&s2fd);
|
||||
} else {
|
||||
VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
|
||||
!write_fault &&
|
||||
!kvm_vcpu_trap_is_exec_fault(vcpu));
|
||||
|
||||
if (kvm_slot_has_gmem(memslot))
|
||||
ret = gmem_abort(&s2fd);
|
||||
else
|
||||
ret = user_mem_abort(&s2fd);
|
||||
}
|
||||
|
||||
if (ret == 0)
|
||||
ret = 1;
|
||||
|
|
@ -2313,7 +2387,7 @@ out_unlock:
|
|||
|
||||
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
|
||||
{
|
||||
if (!kvm->arch.mmu.pgt)
|
||||
if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
|
||||
return false;
|
||||
|
||||
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
|
||||
|
|
@ -2328,7 +2402,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
|
|||
{
|
||||
u64 size = (range->end - range->start) << PAGE_SHIFT;
|
||||
|
||||
if (!kvm->arch.mmu.pgt)
|
||||
if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
|
||||
return false;
|
||||
|
||||
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
|
||||
|
|
@ -2344,7 +2418,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
|
|||
{
|
||||
u64 size = (range->end - range->start) << PAGE_SHIFT;
|
||||
|
||||
if (!kvm->arch.mmu.pgt)
|
||||
if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
|
||||
return false;
|
||||
|
||||
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
|
||||
|
|
@ -2501,6 +2575,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
|
|||
hva_t hva, reg_end;
|
||||
int ret = 0;
|
||||
|
||||
if (kvm_vm_is_protected(kvm)) {
|
||||
/* Cannot modify memslots once a pVM has run. */
|
||||
if (pkvm_hyp_vm_is_created(kvm) &&
|
||||
(change == KVM_MR_DELETE || change == KVM_MR_MOVE)) {
|
||||
return -EPERM;
|
||||
}
|
||||
|
||||
if (new &&
|
||||
new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
|
||||
return -EPERM;
|
||||
}
|
||||
}
|
||||
|
||||
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
|
||||
change != KVM_MR_FLAGS_ONLY)
|
||||
return 0;
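For protected VMs, kvm_arch_prepare_memory_region() applies two restrictions. It refuses to delete or move a memslot once the hypervisor VM has been created. It refuses dirty-logging and read-only slots at any time. The second rule is consistent with pVM stage-2 mappings always being RWX and never write-protected (see the WARNs in pkvm_pgtable_stage2_map() and pkvm_pgtable_stage2_wrprotect()).

The sketch below expresses that policy as a pure predicate. The flag values are the UAPI KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY bits. `enum mr_change` is a hypothetical mirror of the kernel-internal enum kvm_mr_change, and `pvm_check_memslot_change()` is not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* UAPI memslot flag values from <linux/kvm.h>. */
#define MEM_LOG_DIRTY_PAGES	(1U << 0)
#define MEM_READONLY		(1U << 1)

/* Hypothetical mirror of the kernel-internal enum kvm_mr_change. */
enum mr_change { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };

static int pvm_check_memslot_change(bool hyp_vm_created, enum mr_change change,
				    bool has_new, uint32_t new_flags)
{
	/* Once the pVM has run, existing slots can no longer be removed or moved. */
	if (hyp_vm_created && (change == MR_DELETE || change == MR_MOVE))
		return -EPERM;

	/* pVM mappings are RWX-only, so dirty logging and read-only slots cannot work. */
	if (has_new && (new_flags & (MEM_LOG_DIRTY_PAGES | MEM_READONLY)))
		return -EPERM;

	return 0;
}
```

In the source, the "has run" test relies on pkvm_create_hyp_vm() taking kvm->slots_lock around the creation. That lock serialises it against memslot updates.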
|
||||
|
|
|
|||
|
|
@ -88,7 +88,7 @@ void __init kvm_hyp_reserve(void)
|
|||
static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
|
||||
{
|
||||
if (pkvm_hyp_vm_is_created(kvm)) {
|
||||
WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
|
||||
WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm,
|
||||
kvm->arch.pkvm.handle));
|
||||
} else if (kvm->arch.pkvm.handle) {
|
||||
/*
|
||||
|
|
@ -192,10 +192,16 @@ int pkvm_create_hyp_vm(struct kvm *kvm)
|
|||
{
|
||||
int ret = 0;
|
||||
|
||||
/*
|
||||
* Synchronise with kvm_arch_prepare_memory_region(), as we
|
||||
* prevent memslot modifications on a pVM that has been run.
|
||||
*/
|
||||
mutex_lock(&kvm->slots_lock);
|
||||
mutex_lock(&kvm->arch.config_lock);
|
||||
if (!pkvm_hyp_vm_is_created(kvm))
|
||||
ret = __pkvm_create_hyp_vm(kvm);
|
||||
mutex_unlock(&kvm->arch.config_lock);
|
||||
mutex_unlock(&kvm->slots_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
|
@ -219,9 +225,10 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm)
|
|||
mutex_unlock(&kvm->arch.config_lock);
|
||||
}
|
||||
|
||||
int pkvm_init_host_vm(struct kvm *kvm)
|
||||
int pkvm_init_host_vm(struct kvm *kvm, unsigned long type)
|
||||
{
|
||||
int ret;
|
||||
bool protected = type & KVM_VM_TYPE_ARM_PROTECTED;
|
||||
|
||||
if (pkvm_hyp_vm_is_created(kvm))
|
||||
return -EINVAL;
|
||||
|
|
@ -236,6 +243,11 @@ int pkvm_init_host_vm(struct kvm *kvm)
|
|||
return ret;
|
||||
|
||||
kvm->arch.pkvm.handle = ret;
|
||||
kvm->arch.pkvm.is_protected = protected;
|
||||
if (protected) {
|
||||
pr_warn_once("kvm: protected VMs are experimental and for development only, tainting kernel\n");
|
||||
add_taint(TAINT_USER, LOCKDEP_STILL_OK);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
|
@ -322,15 +334,38 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
|
|||
return 0;
|
||||
}
|
||||
|
||||
static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
|
||||
static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
|
||||
{
|
||||
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
|
||||
pkvm_handle_t handle = kvm->arch.pkvm.handle;
|
||||
struct pkvm_mapping *mapping;
|
||||
int ret;
|
||||
|
||||
if (!handle)
|
||||
return 0;
|
||||
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
|
||||
struct page *page;
|
||||
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
|
||||
handle, mapping->gfn);
|
||||
if (WARN_ON(ret))
|
||||
continue;
|
||||
|
||||
page = pfn_to_page(mapping->pfn);
|
||||
WARN_ON_ONCE(mapping->nr_pages != 1);
|
||||
unpin_user_pages_dirty_lock(&page, 1, true);
|
||||
account_locked_vm(current->mm, 1, false);
|
||||
pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
|
||||
kfree(mapping);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
|
||||
{
|
||||
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
|
||||
pkvm_handle_t handle = kvm->arch.pkvm.handle;
|
||||
struct pkvm_mapping *mapping;
|
||||
int ret;
|
||||
|
||||
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
|
||||
|
|
@ -347,7 +382,21 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
|
|||
void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
|
||||
u64 addr, u64 size)
|
||||
{
|
||||
__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
|
||||
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
|
||||
pkvm_handle_t handle = kvm->arch.pkvm.handle;
|
||||
|
||||
if (!handle)
|
||||
return;
|
||||
|
||||
if (pkvm_hyp_vm_is_created(kvm) && !kvm->arch.pkvm.is_dying) {
|
||||
WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle));
|
||||
kvm->arch.pkvm.is_dying = true;
|
||||
}
|
||||
|
||||
if (kvm_vm_is_protected(kvm))
|
||||
__pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
|
||||
else
|
||||
__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
|
||||
}
|
||||
|
||||
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
|
||||
|
|
@ -365,31 +414,58 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
|||
struct kvm_hyp_memcache *cache = mc;
|
||||
u64 gfn = addr >> PAGE_SHIFT;
|
||||
u64 pfn = phys >> PAGE_SHIFT;
|
||||
u64 end = addr + size;
|
||||
int ret;
|
||||
|
||||
if (size != PAGE_SIZE && size != PMD_SIZE)
|
||||
return -EINVAL;
|
||||
|
||||
lockdep_assert_held_write(&kvm->mmu_lock);
|
||||
mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, end - 1);
|
||||
|
||||
/*
|
||||
* Calling stage2_map() on top of existing mappings is either happening because of a race
|
||||
* with another vCPU, or because we're changing between page and block mappings. As per
|
||||
* user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
|
||||
*/
|
||||
mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
|
||||
if (mapping) {
|
||||
if (size == (mapping->nr_pages * PAGE_SIZE))
|
||||
return -EAGAIN;
|
||||
if (kvm_vm_is_protected(kvm)) {
|
||||
/* Protected VMs are mapped using RWX page-granular mappings */
|
||||
if (WARN_ON_ONCE(size != PAGE_SIZE))
|
||||
return -EINVAL;
|
||||
|
||||
/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
|
||||
ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
|
||||
if (ret)
|
||||
return ret;
|
||||
mapping = NULL;
|
||||
if (WARN_ON_ONCE(prot != KVM_PGTABLE_PROT_RWX))
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* We either raced with another vCPU or the guest PTE
|
||||
* has been poisoned by an erroneous host access.
|
||||
*/
|
||||
if (mapping) {
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
|
||||
return ret ? -EFAULT : -EAGAIN;
|
||||
}
|
||||
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
|
||||
} else {
|
||||
if (WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE))
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* We either raced with another vCPU or we're changing between
|
||||
* page and block mappings. As per user_mem_abort(), same-size
|
||||
* permission faults are handled in the relax_perms() path.
|
||||
*/
|
||||
if (mapping) {
|
||||
if (size == (mapping->nr_pages * PAGE_SIZE))
|
||||
return -EAGAIN;
|
||||
|
||||
/*
|
||||
* Remove _any_ pkvm_mapping overlapping with the range,
|
||||
* bigger or smaller.
|
||||
*/
|
||||
ret = __pkvm_pgtable_stage2_unshare(pgt, addr, end);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
mapping = NULL;
|
||||
}
|
||||
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn,
|
||||
size / PAGE_SIZE, prot);
|
||||
}
|
||||
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
|
||||
if (WARN_ON(ret))
|
||||
return ret;
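pkvm_pgtable_stage2_map() now takes one of two paths.

For a protected VM:

- Only page-sized RWX mappings are accepted.
- An existing overlapping mapping is either a lost race (-EAGAIN) or a poisoned PTE (-EFAULT, via __pkvm_vcpu_in_poison_fault).
- Otherwise, the page is donated to the guest.

For a non-protected VM, the old behaviour is kept:

- Same-size overlaps are retried.
- Differently-sized overlaps are unshared first.
- The page is then shared with the guest.

The sketch below condenses that decision into one function that returns the action rather than performing it. The enum, the struct and the 4 KiB/2 MiB sizes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE	4096UL
#define MODEL_PMD_SIZE	(512 * MODEL_PAGE_SIZE)

/* Hypothetical summary of what pkvm_pgtable_stage2_map() goes on to do. */
enum map_action {
	ACT_DONATE,		/* __pkvm_host_donate_guest */
	ACT_SHARE,		/* __pkvm_host_share_guest */
	ACT_UNSHARE_THEN_SHARE,	/* replace a differently-sized mapping */
	ACT_RETRY,		/* -EAGAIN */
	ACT_FAULT,		/* -EFAULT: poisoned guest PTE */
	ACT_REJECT,		/* -EINVAL */
};

struct map_req {
	bool protected_vm;
	size_t size;
	bool rwx;
	size_t existing_size;	/* 0 when no overlapping pkvm_mapping exists */
	bool in_poison_fault;	/* models __pkvm_vcpu_in_poison_fault */
};

static enum map_action classify_map(const struct map_req *r)
{
	if (r->protected_vm) {
		if (r->size != MODEL_PAGE_SIZE || !r->rwx)
			return ACT_REJECT;
		if (r->existing_size)
			return r->in_poison_fault ? ACT_FAULT : ACT_RETRY;
		return ACT_DONATE;
	}

	if (r->size != MODEL_PAGE_SIZE && r->size != MODEL_PMD_SIZE)
		return ACT_REJECT;
	if (r->existing_size == r->size)
		return ACT_RETRY;	/* same-size: permission faults go via relax_perms */
	return r->existing_size ? ACT_UNSHARE_THEN_SHARE : ACT_SHARE;
}
```

Protected VMs never take the unshare path. A donated page cannot simply be unshared back, which is why the overlap case splits into "retry" and "poisoned" instead.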
|
||||
|
||||
|
|
@ -404,9 +480,14 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
|||
|
||||
int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
|
||||
{
|
||||
lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
|
||||
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
|
||||
|
||||
return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
|
||||
if (WARN_ON(kvm_vm_is_protected(kvm)))
|
||||
return -EPERM;
|
||||
|
||||
lockdep_assert_held_write(&kvm->mmu_lock);
|
||||
|
||||
return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
|
||||
}
|
||||
|
||||
int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
|
||||
|
|
@ -416,6 +497,9 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
|
|||
struct pkvm_mapping *mapping;
|
||||
int ret = 0;
|
||||
|
||||
if (WARN_ON(kvm_vm_is_protected(kvm)))
|
||||
return -EPERM;
|
||||
|
||||
lockdep_assert_held(&kvm->mmu_lock);
|
||||
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
|
||||
ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
|
||||
|
|
@ -447,6 +531,9 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
|
|||
struct pkvm_mapping *mapping;
|
||||
bool young = false;
|
||||
|
||||
if (WARN_ON(kvm_vm_is_protected(kvm)))
|
||||
return false;
|
||||
|
||||
lockdep_assert_held(&kvm->mmu_lock);
|
||||
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
|
||||
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
|
||||
|
|
@ -458,12 +545,18 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
|
|||
int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
|
||||
enum kvm_pgtable_walk_flags flags)
|
||||
{
|
||||
if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
|
||||
return -EPERM;
|
||||
|
||||
return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
|
||||
}
|
||||
|
||||
void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
|
||||
enum kvm_pgtable_walk_flags flags)
|
||||
{
|
||||
if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
|
||||
return;
|
||||
|
||||
WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
|
||||
}
|
||||
|
||||
|
|
@ -485,3 +578,15 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
|
|||
WARN_ON_ONCE(1);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Forcefully reclaim a page from the guest, zeroing its contents and
|
||||
* poisoning the stage-2 pte so that pages can no longer be mapped at
|
||||
* the same IPA. The page remains pinned until the guest is destroyed.
|
||||
*/
|
||||
bool pkvm_force_reclaim_guest_page(phys_addr_t phys)
|
||||
{
|
||||
int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);
|
||||
|
||||
return !ret || ret == -EAGAIN;
|
||||
}
|
||||
|
|
|
|||
|
|
@ -43,6 +43,7 @@
|
|||
#include <asm/system_misc.h>
|
||||
#include <asm/tlbflush.h>
|
||||
#include <asm/traps.h>
|
||||
#include <asm/virt.h>
|
||||
|
||||
struct fault_info {
|
||||
int (*fn)(unsigned long far, unsigned long esr,
|
||||
|
|
@ -269,6 +270,15 @@ static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr
|
|||
return false;
|
||||
}
|
||||
|
||||
static bool is_pkvm_stage2_abort(unsigned int esr)
|
||||
{
|
||||
/*
|
||||
* S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
|
||||
* injected a stage-2 abort -- see host_inject_mem_abort().
|
||||
*/
|
||||
return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
|
||||
}
|
||||
|
||||
static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
|
||||
unsigned long esr,
|
||||
struct pt_regs *regs)
|
||||
|
|
@ -289,8 +299,14 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
|
|||
* If we now have a valid translation, treat the translation fault as
|
||||
* spurious.
|
||||
*/
|
||||
if (!(par & SYS_PAR_EL1_F))
|
||||
if (!(par & SYS_PAR_EL1_F)) {
|
||||
if (is_pkvm_stage2_abort(esr)) {
|
||||
par &= SYS_PAR_EL1_PA;
|
||||
return pkvm_force_reclaim_guest_page(par);
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we got a different type of fault from the AT instruction,
|
||||
|
|
@ -376,9 +392,11 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
|
|||
if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
|
||||
return;
|
||||
|
||||
if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
|
||||
"Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
|
||||
if (is_spurious_el1_translation_fault(addr, esr, regs)) {
|
||||
WARN_RATELIMIT(!is_pkvm_stage2_abort(esr),
|
||||
"Ignoring spurious kernel translation fault at virtual address %016lx\n", addr);
|
||||
return;
|
||||
}
|
||||
|
||||
if (is_el1_mte_sync_tag_check_fault(esr)) {
|
||||
do_tag_recovery(addr, esr, regs);
|
||||
|
|
@ -395,6 +413,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
|
|||
msg = "read from unreadable memory";
|
||||
} else if (addr < PAGE_SIZE) {
|
||||
msg = "NULL pointer dereference";
|
||||
} else if (is_pkvm_stage2_abort(esr)) {
|
||||
msg = "access to hypervisor-protected memory";
|
||||
} else {
|
||||
if (esr_fsc_is_translation_fault(esr) &&
|
||||
kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
|
||||
|
|
@ -621,6 +641,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
|
|||
addr, esr, regs);
|
||||
}
|
||||
|
||||
if (is_pkvm_stage2_abort(esr)) {
|
||||
if (!user_mode(regs))
|
||||
goto no_context;
|
||||
arm64_force_sig_fault(SIGSEGV, SEGV_ACCERR, far, "stage-2 fault");
|
||||
return 0;
|
||||
}
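On the host side, an abort whose ESR has S1PTW set is treated as a stage-2 abort injected by pKVM, provided pKVM is initialized. The source comment points at host_inject_mem_abort(). do_page_fault() then handles the abort according to the mode of the access:

- For user-mode accesses, it sends SIGSEGV with SEGV_ACCERR.
- For kernel accesses, it falls through to no_context, where __do_kernel_fault() reports "access to hypervisor-protected memory".

The sketch below models just that routing. The enum names are hypothetical, and ESR_S1PTW assumes S1PTW sits at bit 7 of ESR_ELx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_S1PTW	(1ULL << 7)	/* assumed position of ESR_ELx.S1PTW */

enum host_fault_outcome { HANDLE_NORMALLY, SIGSEGV_ACCERR, KERNEL_NO_CONTEXT };

/* Mirrors is_pkvm_stage2_abort(). */
static bool model_is_pkvm_stage2_abort(bool pkvm_initialized, uint64_t esr)
{
	return pkvm_initialized && (esr & ESR_S1PTW);
}

/* Routing of the new early-exit block in do_page_fault(). */
static enum host_fault_outcome route_host_fault(bool pkvm_initialized,
						uint64_t esr, bool user_mode)
{
	if (!model_is_pkvm_stage2_abort(pkvm_initialized, esr))
		return HANDLE_NORMALLY;
	return user_mode ? SIGSEGV_ACCERR : KERNEL_NO_CONTEXT;
}
```

The check runs before the normal VMA lookup, so a stage-2 abort never reaches handle_mm_fault(). Retrying the fault could not help, because the page belongs to the guest rather than to the host.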
|
||||
|
||||
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
|
||||
|
||||
if (!(mm_flags & FAULT_FLAG_USER))
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
config ARM_PKVM_GUEST
|
||||
bool "Arm pKVM protected guest driver"
|
||||
depends on ARM64
|
||||
depends on ARM64 && DMA_RESTRICTED_POOL
|
||||
help
|
||||
Protected guests running under the pKVM hypervisor on arm64
|
||||
are isolated from the host and must issue hypercalls to enable
|
||||
|
|
|
|||
|
|
@ -703,6 +703,11 @@ struct kvm_enable_cap {
|
|||
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
|
||||
#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \
|
||||
((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
|
||||
|
||||
#define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
|
||||
#define KVM_VM_TYPE_ARM_MASK (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
|
||||
KVM_VM_TYPE_ARM_PROTECTED)
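With this UAPI change, userspace requests a protected VM by setting KVM_VM_TYPE_ARM_PROTECTED (bit 31) in the machine type passed to KVM_CREATE_VM, alongside the existing IPA-size field. Per the merge description, this also requires booting with kvm-arm.mode=protected and enabling a developer Kconfig option, and it taints the kernel.

The sketch below composes the type and issues the ioctl. The fallback #defines are only there in case the installed headers predate this change. With pKVM enabled, kvm_init_ipa_range() uses kvm_ipa_limit rather than the requested IPA size.

```c
#include <assert.h>
#include <sys/ioctl.h>

/* Fallbacks for headers that predate this change; values match the UAPI above. */
#ifndef KVMIO
#define KVMIO 0xAE
#endif
#ifndef KVM_CREATE_VM
#define KVM_CREATE_VM _IO(KVMIO, 0x01)
#endif
#ifndef KVM_VM_TYPE_ARM_IPA_SIZE_MASK
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
#endif
#ifndef KVM_VM_TYPE_ARM_PROTECTED
#define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
#endif

/* Compose a KVM_CREATE_VM machine type that requests a protected VM. */
static unsigned long pvm_machine_type(unsigned int ipa_bits)
{
	return KVM_VM_TYPE_ARM_PROTECTED |
	       (ipa_bits & KVM_VM_TYPE_ARM_IPA_SIZE_MASK);
}

/* Returns a VM fd, or -1 with errno set. */
static int create_protected_vm(int kvm_fd, unsigned int ipa_bits)
{
	return ioctl(kvm_fd, KVM_CREATE_VM, pvm_machine_type(ipa_bits));
}
```

A caller would pass the fd from `open("/dev/kvm", O_RDWR)`. On a host where pKVM is not enabled, the request presumably fails rather than silently creating an unprotected VM.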
|
||||
|
||||
/*
|
||||
* ioctls for /dev/kvm fds:
|
||||
*/
|
||||
|
|
|
|||