mirror-linux/virt/kvm
Aaron Sacks 577a8d3bae KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
kvm_reset_dirty_gfn() guards the gfn range with

	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
		return;

but offset is u64 and the addition is unchecked.  The check can be
silently bypassed by a u64 wrap.

The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them.  On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check.  Only the flags handshake is
treated as the handover; the slot/offset payload is taken on trust.

Crafting two entries

	entry[i].offset   = 0xffffffffffffffc1
	entry[i+1].offset = 0

makes the coalescing loop in kvm_dirty_ring_reset() compute

	delta = (s64)(0 - 0xffffffffffffffc1) = 63

which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63.  The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.
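
The arithmetic above can be reproduced in userspace.  What follows is
a simplified model, not the kernel code: fold_forward(),
unchecked_in_range() and fls_index() are hypothetical stand-ins for
the forward branch of the coalescing loop, the pre-patch bounds check
and __fls(), and the 512-page slot size in the usage note is an
arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_LONG 64

/* Stand-in for __fls(): index of the highest set bit (mask != 0). */
static unsigned int fls_index(uint64_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while (mask >>= 1)
		bit++;
	return bit;
}

/*
 * Model of the forward coalescing step: fold next_offset into the
 * mask anchored at cur_offset when the signed distance between them
 * is in [0, BITS_PER_LONG).  Returns true if the entry was folded.
 */
static bool fold_forward(uint64_t cur_offset, uint64_t next_offset,
			 uint64_t *mask)
{
	/* Unsigned subtraction wraps before the cast, as in the text. */
	int64_t delta = (int64_t)(next_offset - cur_offset);

	if (delta >= 0 && delta < BITS_PER_LONG) {
		*mask |= 1ull << delta;
		return true;
	}
	return false;
}

/* Model of the pre-patch check: the u64 addition is free to wrap. */
static bool unchecked_in_range(uint64_t offset, uint64_t mask,
			       uint64_t npages)
{
	return offset + fls_index(mask) < npages;
}
```

Starting from mask = 1 and folding offset 0 into an entry at
0xffffffffffffffc1 sets bit 63, after which
unchecked_in_range(0xffffffffffffffc1, mask, 512) reports the range
as valid even though the base offset is nowhere near the slot.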

That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged.  On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot->arch.rmap[0][] with a near-U64_MAX gfn.  That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.

Range-check offset on its own first, so the addition cannot wrap.
memslot->npages is bounded well below U64_MAX, so once offset <
npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
stays in range.
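
A minimal sketch of the ordering described above, again as a
userspace model rather than the actual patch: range_ok() and
highest_set_bit() are hypothetical names, and the memslot NULL check
from the kernel function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for __fls(): index of the highest set bit (mask != 0). */
static unsigned int highest_set_bit(uint64_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while (mask >>= 1)
		bit++;
	return bit;
}

/*
 * Returns true when [offset, offset + highest_set_bit(mask)] lies
 * inside a slot of npages pages.  Assumes npages is far enough below
 * UINT64_MAX that npages + 63 cannot wrap.
 */
static bool range_ok(uint64_t offset, uint64_t mask, uint64_t npages)
{
	/* Check the base on its own first, so a huge offset is rejected. */
	if (offset >= npages)
		return false;

	/* offset < npages here, so this addition cannot wrap. */
	return offset + highest_set_bit(mask) < npages;
}
```

With a 512-page slot, the crafted pair (offset 0xffffffffffffffc1,
bit 63 set) is now rejected by the first comparison, while a
legitimate entry such as offset 448 with bit 63 set (last gfn 511)
is still accepted.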

Fixes: fb04a1eddb ("KVM: X86: Implement ring-based dirty memory tracking")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Sacks <contact@xchglabs.com>
Link: https://patch.msgid.link/20260512060742.1628959-1-contact@xchglabs.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12 22:16:16 +02:00
Kconfig KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER 2026-02-28 15:31:35 +01:00
Makefile.kvm KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD 2025-08-27 04:34:59 -04:00
async_pf.c KVM: remove redundant __GFP_NOWARN 2025-08-19 11:51:13 -07:00
async_pf.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 2019-06-19 17:09:56 +02:00
binary_stats.c KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlay 2026-01-08 10:40:48 -08:00
coalesced_mmio.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
coalesced_mmio.h
dirty_ring.c KVM: Reject wrapped offset in kvm_reset_dirty_gfn() 2026-05-12 22:16:16 +02:00
eventfd.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
guest_memfd.c Arm: 2026-04-17 07:18:03 -07:00
irqchip.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
kvm_main.c Arm: 2026-04-17 07:18:03 -07:00
kvm_mm.h KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes 2025-10-20 06:30:40 -07:00
pfncache.c KVM: pfncache: Precisely track refcounted pages 2024-10-25 12:57:59 -04:00
vfio.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
vfio.h