mirror-linux/arch
Roman Kagan d3466ce4f4 KVM: x86/pmu: Truncate counter value to allowed width on write
[ Upstream commit b29a2acd36 ]

Performance counters are defined to have width less than 64 bits.  The
vPMU code maintains the counters in u64 variables but assumes the value
to fit within the defined width.  However, for Intel non-full-width
counters (MSR_IA32_PERFCTRx) the value received from the guest is
truncated to 32 bits and then sign-extended to full 64 bits.  If a
negative value is written, the stored counter therefore has all of its
upper bits set; kvm_pmu_incr_counter() then increments it, truncates
the result, and compares it to the previous value for overflow
detection.

That previous value is not truncated, so it always compares greater than
the truncated new one, and a PMI is injected.  If the PMI handler writes
a negative counter value itself, the vCPU never leaves the PMI loop.
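
The spurious overflow described above can be reproduced in user space.
The following is a minimal sketch and not the kernel code: the 48-bit
width and all sketch_* names are assumptions made for illustration; only
the increment, truncate, and compare sequence follows the description in
this changelog.

```c
#include <stdint.h>

/* Assumed counter width for illustration; real hardware reports its own. */
#define SKETCH_COUNTER_WIDTH 48
#define SKETCH_COUNTER_MASK  ((1ULL << SKETCH_COUNTER_WIDTH) - 1)

/*
 * Model of a non-full-width counter write: the guest value is cut to
 * 32 bits and then sign-extended to 64 bits.
 */
static uint64_t sketch_sign_extend32(uint32_t data)
{
	return (uint64_t)(int64_t)(int32_t)data;
}

/*
 * Model of the increment path: bump the counter, truncate it to the
 * counter width, and report an overflow if the new value is smaller
 * than the previous one.  Returns 1 when an overflow is detected.
 */
static int sketch_incr_overflows(uint64_t *counter)
{
	uint64_t prev = *counter;

	*counter = (prev + 1) & SKETCH_COUNTER_MASK;
	return *counter < prev;
}
```

With a stored value of sketch_sign_extend32((uint32_t)-1000), bits 48-63
are set in the previous value but cleared in the incremented one, so
sketch_incr_overflows() reports an overflow even though the counter is
still 1000 events away from wrapping.  Masking the value with
SKETCH_COUNTER_MASK before storing it makes the same increment return 0.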

It turns out the Linux PMI handler actually does write the counter with
the value just read with RDPMC, so when no full-width support is exposed
via MSR_IA32_PERF_CAPABILITIES, and the guest initializes the counter to
a negative value, it locks up.

This has been observed in the field, for example, when the guest configures
atop to use perfevents and runs two instances of it simultaneously.

To address the problem, maintain the invariant that the counter value
always fits in the defined bit width, by truncating the received value
in the respective set_msr methods.  For better readability, factor the
truncation out into a helper function, pmc_write_counter(), shared by
the vmx and svm parts.
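
The invariant can be illustrated with a small stand-alone sketch.  It is
not the helper added by this patch: struct sketch_pmc, its fields, and
the sketch_* functions are hypothetical, and the width is assumed to be
below 64 bits.  It only shows the idea of truncating on write so that
the later overflow comparison sees two values of the same width.

```c
#include <stdint.h>

/* Hypothetical stand-in for a virtual counter; not the kernel struct. */
struct sketch_pmc {
	uint64_t counter;	/* current value, kept within 'width' bits */
	unsigned int width;	/* counter width in bits, assumed < 64 */
};

static uint64_t sketch_pmc_mask(const struct sketch_pmc *pmc)
{
	return (1ULL << pmc->width) - 1;
}

/* Truncate on write so the stored value always fits the width. */
static void sketch_pmc_write(struct sketch_pmc *pmc, uint64_t val)
{
	pmc->counter = val & sketch_pmc_mask(pmc);
}

/* Increment and report an overflow only when the counter really wraps. */
static int sketch_pmc_incr(struct sketch_pmc *pmc)
{
	uint64_t prev = pmc->counter;

	pmc->counter = (prev + 1) & sketch_pmc_mask(pmc);
	return pmc->counter < prev;
}
```

Under these assumptions, writing a sign-extended -2 stores the mask
minus one; the first increment does not overflow, and the second wraps
to zero and reports exactly one overflow.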

Fixes: 9cd803d496 ("KVM: x86: Update vPMCs when retiring instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@amazon.de>
Link: https://lore.kernel.org/all/20230504120042.785651-1-rkagan@amazon.de
Tested-by: Like Xu <likexu@tencent.com>
[sean: tweak changelog, s/set/write in the helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-02 09:35:21 +01:00
..
alpha alpha: remove __init annotation from exported page_is_ram() 2023-08-16 18:27:31 +02:00
arc ARC: atomics: Add compiler barrier to atomic operations... 2023-09-19 12:28:04 +02:00
arm ARM: dts: ti: omap: Fix noisy serial with overrun-throttle-ms for mapphone 2023-10-25 12:03:09 +02:00
arm64 arm64: dts: mediatek: mt8195-demo: update and reorder reserved memory regions 2023-10-19 23:08:55 +02:00
csky
hexagon
ia64
loongarch LoongArch: numa: Fix high_memory calculation 2023-10-06 14:57:01 +02:00
m68k m68k: Fix invalid .section syntax 2023-09-13 09:42:21 +02:00
microblaze
mips MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled 2023-10-06 14:56:45 +02:00
nios2
openrisc
parisc parisc: Restore __ldcw_align for PA-RISC 2.0 processors 2023-10-10 22:00:45 +02:00
powerpc powerpc/64e: Fix wrong test in __ptep_test_and_clear_young() 2023-10-19 23:08:58 +02:00
riscv riscv, bpf: Sign-extend return values 2023-10-19 23:08:53 +02:00
s390 s390/pci: fix iommu bitmap allocation 2023-10-25 12:03:15 +02:00
sh sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory() 2023-09-19 12:28:04 +02:00
sparc
um um: Fix hostaudio build errors 2023-09-13 09:42:58 +02:00
x86 KVM: x86/pmu: Truncate counter value to allowed width on write 2023-11-02 09:35:21 +01:00
xtensa xtensa: boot/lib: fix function prototypes 2023-10-06 14:56:49 +02:00
.gitignore
Kconfig