kvm_riscv_vcpu_pmu_event_info() returned -ENOMEM from the
SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall()
to abort KVM_RUN and surface the error to userspace instead of
completing the ECALL with a negative SBI error in a0.
Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU
handlers and kvm_sbi_ext_pmu_handler comment.
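The difference between the two return paths can be modelled in user space. This is only a sketch under assumptions: struct sbi_return is a simplified stand-in for the kernel's return-data structure, alloc_or_fail() and both handler names are hypothetical, and the allocation failure is simulated with a flag. SBI_ERR_FAILURE uses the value from the SBI specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SBI_SUCCESS       0L
#define SBI_ERR_FAILURE (-1L)   /* value defined by the SBI specification */

/* Simplified, hypothetical stand-in for the handler's return data. */
struct sbi_return {
    unsigned long out_val;
    long err_val;               /* what the guest would see in a0 */
};

/* Hypothetical helper: simulate an allocation that may fail. */
void *alloc_or_fail(size_t n, int fail)
{
    return fail ? NULL : calloc(1, n);
}

/* Old behaviour: the Linux errno escapes and aborts the run loop. */
int handler_old(int fail, struct sbi_return *ret)
{
    void *buf = alloc_or_fail(64, fail);

    assert(ret != NULL);
    if (!buf)
        return -ENOMEM;
    free(buf);
    ret->err_val = SBI_SUCCESS;
    return 0;
}

/* Fixed behaviour: report the failure to the guest and keep running. */
int handler_fixed(int fail, struct sbi_return *ret)
{
    void *buf = alloc_or_fail(64, fail);

    assert(ret != NULL);
    if (!buf) {
        ret->err_val = SBI_ERR_FAILURE;
        return 0;
    }
    free(buf);
    ret->err_val = SBI_SUCCESS;
    return 0;
}
```

In this model a non-zero return value stands for "abort KVM_RUN", while a zero return with err_val set stands for "complete the ECALL with an SBI error".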
Fixes: e309fd113b ("RISC-V: KVM: Implement get event info function")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260514173642.41448-2-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
kvm_riscv_vcpu_pmu_snapshot_set_shmem() returned -ENOMEM from the
SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to
abort KVM_RUN and surface the error to userspace instead of
completing the ECALL with a negative SBI error in a0.
Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU
handlers and kvm_sbi_ext_pmu_handler comment.
Fixes: c2f41ddbcd ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260514173642.41448-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Merge tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"There is one significant change outside arch/riscv in this pull
request: the addition of a set of KUnit tests for strlen(), strnlen(),
and strrchr().
Otherwise, the most notable changes are to add some RISC-V-specific
string function implementations, to remove XIP kernel support, to add
hardware error exception handling, and to optimize our runtime
unaligned access speed testing.
A few comments on the motivation for removing XIP support. It's been
broken in the RISC-V kernel for months. The code is not easy to
maintain. Furthermore, for XIP support to truly be useful for RISC-V,
we think that compile-time feature switches would need to be added for
many of the RISC-V ISA features and microarchitectural properties that
are currently implemented with runtime patching. No one has stepped
forward to take responsibility for that work, so many of us think it's
best to remove it until clear use cases and champions emerge.
Summary:
- Add KUnit correctness testing and microbenchmarks for strlen(),
strnlen(), and strrchr()
- Add RISC-V-specific strnlen(), strchr(), strrchr() implementations
- Add hardware error exception handling
- Clean up and optimize our unaligned access probe code
- Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()
- Remove XIP kernel support
- Warn when addresses outside the vmemmap range are passed to
vmemmap_populate()
- Update the ACPI FADT revision check to warn if it's not at least
ACPI v6.6, which is when key RISC-V-specific tables were added to
the specification
- Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC,
etc.
- Make kaslr_offset() a static inline function, since there's no need
for it to show up in the symbol table
- Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve
kdump support
- Add Makefile cleanup rule for vdso_cfi copied source files, and add
a .gitignore for the build artifacts in that directory
- Remove some redundant ifdefs that check Kconfig macros
- Add missing SPDX license tag to the CFI selftest
- Simplify UTS_MACHINE assignment in the RISC-V Makefile
- Clarify some unclear comments and remove some superfluous comments
- Fix various English typos across the RISC-V codebase"
* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
riscv: Remove support for XIP kernel
riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
riscv: Split out compare_unaligned_access()
riscv: Reuse measure_cycles() in check_vector_unaligned_access()
riscv: Split out measure_cycles() for reuse
riscv: Clean up & optimize unaligned scalar access probe
riscv: lib: add strrchr() implementation
riscv: lib: add strchr() implementation
riscv: lib: add strnlen() implementation
lib/string_kunit: extend benchmarks to strnlen() and chr searches
lib/string_kunit: add performance benchmark for strlen()
lib/string_kunit: add correctness test for strrchr()
lib/string_kunit: add correctness test for strnlen()
lib/string_kunit: add correctness test for strlen()
riscv: vdso_cfi: Add .gitignore for build artifacts
riscv: vdso_cfi: Add clean rule for copied sources
riscv: enable HAVE_IOREMAP_PROT
riscv: mm: WARN_ON() for bad addresses in vmemmap_populate()
riscv: acpi: update FADT revision check to 6.6
riscv: add hardware error trap handler support
...
The KVM ISA extension related checks are not VCPU specific and
should be factored out of vcpu_onereg.c into separate sources.
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-6-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Rename kvm_riscv_vcpu_isa_check_host() to kvm_riscv_isa_check_host()
and use it as a common function within KVM RISC-V to check the ISA
extensions supported by the host.
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-5-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
When a guest initiates an SBI_EXT_PMU_COUNTER_CFG_MATCH call with
ctr_base=0xfffffffffffffffe, ctr_mask=0xeb5f and flags=0x1
(SBI_PMU_CFG_FLAG_SKIP_MATCH), kvm_riscv_vcpu_pmu_ctr_cfg_match()
first invokes kvm_pmu_validate_counter_mask() to verify whether
ctr_base and ctr_mask are valid, by evaluating:
!ctr_mask || (ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu))
With the above inputs, __fls(0xeb5f) equals 15, and adding 15 to
0xfffffffffffffffe causes an integer overflow, wrapping around to 13.
Since 13 is less than kvm_pmu_num_counters(), the validation wrongly
succeeds.
Thereafter, since flags & SBI_PMU_CFG_FLAG_SKIP_MATCH is satisfied,
the code evaluates:
!test_bit(ctr_base + __ffs(ctr_mask), kvpmu->pmc_in_use)
Here __ffs(0xeb5f) equals 0, so test_bit() receives 0xfffffffffffffffe
as the bit index and attempts to access the corresponding element of
the kvpmu->pmc_in_use, which results in an invalid memory access. This
triggers the following Oops:
Unable to handle kernel paging request at virtual address e3ebffff12abba89
generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128
kvm_riscv_vcpu_pmu_ctr_cfg_match arch/riscv/kvm/vcpu_pmu.c:758
kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:49
kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
The root cause is that kvm_pmu_validate_counter_mask() does not account
for the case where ctr_base itself is out of range, allowing the
subsequent addition to silently overflow and bypass the check.
Fix this by explicitly validating ctr_base against kvm_pmu_num_counters()
before performing the addition.
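The wrap-around can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: NUM_COUNTERS is a hypothetical stand-in for kvm_pmu_num_counters(), highest_bit() mimics __fls(), and 64-bit unsigned arithmetic is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter count, standing in for kvm_pmu_num_counters(). */
#define NUM_COUNTERS 32u

/* Index of the highest set bit, like __fls(); mask must be non-zero. */
unsigned int highest_bit(uint64_t mask)
{
    unsigned int idx = 0;

    assert(mask != 0);
    while ((mask >>= 1) != 0)
        idx++;
    return idx;
}

/* Pre-fix check: the addition can wrap and land below NUM_COUNTERS. */
bool mask_invalid_old(uint64_t ctr_base, uint64_t ctr_mask)
{
    return !ctr_mask ||
           (ctr_base + highest_bit(ctr_mask) >= NUM_COUNTERS);
}

/* Post-fix check: reject an out-of-range base before adding to it. */
bool mask_invalid_fixed(uint64_t ctr_base, uint64_t ctr_mask)
{
    return !ctr_mask || ctr_base >= NUM_COUNTERS ||
           (ctr_base + highest_bit(ctr_mask) >= NUM_COUNTERS);
}
```

With ctr_base=0xfffffffffffffffe and ctr_mask=0xeb5f, the old check accepts the input because the sum wraps to 13, while the fixed check rejects it on the base alone.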
This bug was found by fuzzing the KVM RISC-V PMU interface.
Fixes: 0cb74b65d2 ("RISC-V: KVM: Implement perf support without sampling")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Atish Patra <atish.patra@linux.dev>
Link: https://lore.kernel.org/r/20260319035902.924661-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
In kvm_riscv_vcpu_pmu_snapshot_set_shmem(), when kvm_vcpu_write_guest()
fails, kvpmu->sdata is freed but not set to NULL. This leaves a dangling
pointer that will be freed again when kvm_pmu_clear_snapshot_area() is
called during vcpu teardown, triggering a KASAN double-free report.
First free occurs in kvm_riscv_vcpu_pmu_snapshot_set_shmem():
kvm_riscv_vcpu_pmu_snapshot_set_shmem arch/riscv/kvm/vcpu_pmu.c:443
kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:74
kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:1008
kvm_vcpu_ioctl virt/kvm/kvm_main.c:4476
Second free (double-free) occurs in kvm_pmu_clear_snapshot_area():
kvm_pmu_clear_snapshot_area arch/riscv/kvm/vcpu_pmu.c:403 [inline]
kvm_riscv_vcpu_pmu_deinit.part arch/riscv/kvm/vcpu_pmu.c:905
kvm_riscv_vcpu_pmu_deinit arch/riscv/kvm/vcpu_pmu.c:893
kvm_arch_vcpu_destroy arch/riscv/kvm/vcpu.c:199
kvm_vcpu_destroy virt/kvm/kvm_main.c:469 [inline]
kvm_destroy_vcpus virt/kvm/kvm_main.c:489
kvm_arch_destroy_vm arch/riscv/kvm/vm.c:54
kvm_destroy_vm virt/kvm/kvm_main.c:1301 [inline]
kvm_put_kvm virt/kvm/kvm_main.c:1338
kvm_vm_release virt/kvm/kvm_main.c:1361
Fix it by setting kvpmu->sdata to NULL after kfree() in
kvm_riscv_vcpu_pmu_snapshot_set_shmem(), so that the subsequent
kfree(NULL) in kvm_pmu_clear_snapshot_area() becomes a safe no-op.
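The pattern is the usual "free and clear" idiom. A user-space sketch, assuming a simplified struct pmu_state and hypothetical function names, with free() standing in for kfree():

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for the per-vCPU PMU state. */
struct pmu_state {
    void *sdata;    /* snapshot buffer, NULL when not allocated */
};

/* Models the failure path: free the buffer and clear the pointer. */
void snapshot_setup_failed(struct pmu_state *pmu)
{
    assert(pmu != NULL);
    free(pmu->sdata);
    pmu->sdata = NULL;  /* without this, teardown would free it again */
}

/* Models teardown: free(NULL) is defined to do nothing. */
void snapshot_teardown(struct pmu_state *pmu)
{
    assert(pmu != NULL);
    free(pmu->sdata);
    pmu->sdata = NULL;
}
```

Calling snapshot_setup_failed() followed by snapshot_teardown() frees the buffer exactly once in this model.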
This bug was found by fuzzing the KVM RISC-V PMU interface.
Fixes: c2f41ddbcd ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260318092956.708246-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
When a guest invokes SBI_EXT_PMU_COUNTER_FW_READ or
SBI_EXT_PMU_COUNTER_FW_READ_HI on a firmware counter that has not been
configured via SBI_EXT_PMU_COUNTER_CFG_MATCH, the pmc->event_idx remains
SBI_PMU_EVENT_IDX_INVALID (0xFFFFFFFF). get_event_code() extracts the
lower 16 bits, yielding 0xFFFF (65535), which is then used to index into
kvpmu->fw_event[]. Since fw_event is only RISCV_KVM_MAX_FW_CTRS (32)
entries, this triggers an array-index-out-of-bounds:
UBSAN: array-index-out-of-bounds in arch/riscv/kvm/vcpu_pmu.c:255:37
index 65535 is out of range for type 'kvm_fw_event [32]'
Add a check for the known unconfigured case (SBI_PMU_EVENT_IDX_INVALID)
and a WARN_ONCE guard for any unexpected out-of-bounds event codes,
returning -EINVAL in both cases.
Fixes: badc386869 ("RISC-V: KVM: Support firmware events")
Fixes: 08fb07d6dc ("RISC-V: KVM: Support 64 bit firmware counters on RV32")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260316014533.2312254-2-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
When saddr_high != 0 on RV32, the goto out was unconditional, causing
valid 64-bit addresses to be rejected. Only goto out when the address
is invalid (64-bit host with saddr_high != 0).
Fixes: c2f41ddbcd ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260311231833.13189-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Guest-controlled counter indices received via SBI ecalls are used to
index into the PMC array. Sanitize them with array_index_nospec()
to prevent speculative out-of-bounds access.
Similar to x86 commit 13c5183a4e ("KVM: x86: Protect MSR-based
index computations in pmu.h from Spectre-v1/L1TF attacks").
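The idea behind the sanitization is a branch-free mask that forces an out-of-range index to zero. The sketch below is a user-space illustration of that idea only: index_mask() and clamp_index() are hypothetical names, and unlike the kernel helper this version has no compiler or architecture barriers, so it should not be read as a working Spectre mitigation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * All ones when index < size, zero otherwise, computed without a branch.
 * Assumes 1 <= size <= 2^63 - 1.
 */
uint64_t index_mask(uint64_t index, uint64_t size)
{
    assert(size >= 1 && size <= (uint64_t)INT64_MAX);
    /* Top bit of (index | (size - 1 - index)) is set iff index >= size. */
    return (uint64_t)0 - ((~(index | (size - 1 - index))) >> 63);
}

/* In-range indices pass through unchanged; anything else becomes 0. */
uint64_t clamp_index(uint64_t index, uint64_t size)
{
    return index & index_mask(index, size);
}
```

The clamp is applied after the normal bounds check, so that a mispredicted check cannot be followed by an access at an attacker-chosen offset.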
Fixes: 8f0153ecd3 ("RISC-V: KVM: Add skeleton support for perf")
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://lore.kernel.org/r/20260303-kvm-riscv-spectre-v1-v2-4-192caab8e0dc@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
The indexed array only has RISCV_KVM_MAX_COUNTERS elements.
The out-of-bounds access could have been performed by a guest,
but it could only reach other guest-accessible data.
Fixes: 8f0153ecd3 ("RISC-V: KVM: Add skeleton support for perf")
Signed-off-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260227134617.23378-1-radim.krcmar@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
If execution reaches "ret = 0" assignment in
kvm_riscv_vcpu_pmu_event_info() then it means
kvm_vcpu_write_guest() returned 0 hence ret is
already zero and does not need to be assigned 0.
Fixes: e309fd113b ("RISC-V: KVM: Implement get event info function")
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251229072530.3075496-1-maqianga@uniontech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The new get_event_info function allows the guest to query the presence
of multiple events with a single SBI call. Currently, the perf driver
in a Linux guest invokes it for all the standard SBI PMU events. Support
the SBI function implementation in KVM as well.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-7-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
There is not much value in checking if a memslot is writable explicitly
before a write as it may change underneath after the check. Rather, return
invalid address error when write_guest fails as it checks if the slot
is writable anyways.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-6-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
SBI v3.0 introduced a new raw event type v2 for wider mhpmeventX
programming. Add support for it in KVM.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-3-d8f80cacb884@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The perf event should be marked disabled during creation, as it is
not ready to be scheduled until there is an SBI PMU start call or
config matching is called with auto start. Otherwise, event add/start
gets called during the perf_event_create_kernel_counter function.
It will be enabled and scheduled to run via perf_event_enable in
either of the above-mentioned scenarios.
Fixes: 0cb74b65d2 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-1-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
With the latest Linux-6.11-rc3, the below NULL pointer crash is observed
when SBI PMU snapshot is enabled for the guest and the guest is forcefully
powered-off.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000508
Oops [#1]
Modules linked in: kvm
CPU: 0 UID: 0 PID: 61 Comm: term-poll Not tainted 6.11.0-rc3-00018-g44d7178dd77a #3
Hardware name: riscv-virtio,qemu (DT)
epc : __kvm_write_guest_page+0x94/0xa6 [kvm]
ra : __kvm_write_guest_page+0x54/0xa6 [kvm]
epc : ffffffff01590e98 ra : ffffffff01590e58 sp : ffff8f80001f39b0
gp : ffffffff81512a60 tp : ffffaf80024872c0 t0 : ffffaf800247e000
t1 : 00000000000007e0 t2 : 0000000000000000 s0 : ffff8f80001f39f0
s1 : 00007fff89ac4000 a0 : ffffffff015dd7e8 a1 : 0000000000000086
a2 : 0000000000000000 a3 : ffffaf8000000000 a4 : ffffaf80024882c0
a5 : 0000000000000000 a6 : ffffaf800328d780 a7 : 00000000000001cc
s2 : ffffaf800197bd00 s3 : 00000000000828c4 s4 : ffffaf800248c000
s5 : ffffaf800247d000 s6 : 0000000000001000 s7 : 0000000000001000
s8 : 0000000000000000 s9 : 00007fff861fd500 s10: 0000000000000001
s11: 0000000000800000 t3 : 00000000000004d3 t4 : 00000000000004d3
t5 : ffffffff814126e0 t6 : ffffffff81412700
status: 0000000200000120 badaddr: 0000000000000508 cause: 000000000000000d
[<ffffffff01590e98>] __kvm_write_guest_page+0x94/0xa6 [kvm]
[<ffffffff015943a6>] kvm_vcpu_write_guest+0x56/0x90 [kvm]
[<ffffffff015a175c>] kvm_pmu_clear_snapshot_area+0x42/0x7e [kvm]
[<ffffffff015a1972>] kvm_riscv_vcpu_pmu_deinit.part.0+0xe0/0x14e [kvm]
[<ffffffff015a2ad0>] kvm_riscv_vcpu_pmu_deinit+0x1a/0x24 [kvm]
[<ffffffff0159b344>] kvm_arch_vcpu_destroy+0x28/0x4c [kvm]
[<ffffffff0158e420>] kvm_destroy_vcpus+0x5a/0xda [kvm]
[<ffffffff0159930c>] kvm_arch_destroy_vm+0x14/0x28 [kvm]
[<ffffffff01593260>] kvm_destroy_vm+0x168/0x2a0 [kvm]
[<ffffffff015933d4>] kvm_put_kvm+0x3c/0x58 [kvm]
[<ffffffff01593412>] kvm_vm_release+0x22/0x2e [kvm]
Clearly, the kvm_vcpu_write_guest() function is crashing because it is
being called from kvm_pmu_clear_snapshot_area() upon guest tear down.
To address the above issue, simplify kvm_pmu_clear_snapshot_area() so
that it no longer zeroes out the PMU snapshot area, because the guest
is being torn down anyway.
kvm_pmu_clear_snapshot_area() is also called when the guest changes the
PMU snapshot area of a VCPU, but even in this case the previous PMU
snapshot area must not be zeroed out, because the guest might have
reclaimed the previous PMU snapshot area for some other purpose.
Fixes: c2f41ddbcd ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20240815170907.2792229-1-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events are exposed to userspace,
even when not actually implemented. They appear in the `perf list`
output, and commands like `perf stat` try to use them.
This is more than just a cosmetic issue, because the PMU driver's .add
function fails for these events, which causes pmu_groups_sched_in() to
prematurely stop scheduling in other (possibly valid) hardware events.
Add logic to check which events are supported by the hardware (i.e. can
be mapped to some counter), so only usable events are reported to
userspace. Since the kernel does not know the mapping between events and
possible counters, this check must happen during boot, when no counters
are in use. Make the check asynchronous to minimize impact on boot time.
Fixes: e999143459 ("RISC-V: Add perf platform driver based on SBI PMU extension")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Rename the function to indicate that it is meant for firmware
counter read. While at it, add a range sanity check for it as
well.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240420151741.962500-17-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware
counters for RV32 based systems.
Add infrastructure to support that.
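The split between the two reads can be modelled as follows. This is a sketch with hypothetical function names and an explicit xlen parameter; the zero result for RV64 is an assumption about how fw_read_hi behaves on wider systems.

```c
#include <assert.h>
#include <stdint.h>

/* Low XLEN bits, as returned by the regular firmware counter read. */
uint64_t fw_read(uint64_t counter, unsigned int xlen)
{
    assert(xlen == 32 || xlen == 64);
    return xlen == 32 ? (uint32_t)counter : counter;
}

/* Upper 32 bits on RV32; assumed to read as zero on RV64. */
uint64_t fw_read_hi(uint64_t counter, unsigned int xlen)
{
    assert(xlen == 32 || xlen == 64);
    return xlen == 32 ? counter >> 32 : 0;
}
```

An RV32 guest would recombine the value as (fw_read_hi << 32) | fw_read.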
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240420151741.962500-16-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
KVM enables perf for the guest via counter virtualization. However,
sampling cannot be supported as there is no mechanism in the ISA yet
to enable trap/emulate of scountovf. Rely on the SBI PMU snapshot
to provide the counter overflow data via the shared memory.
For a sampling event, the host first sets the guest's LCOFI
interrupt and injects it into the guest via the irq filtering mechanism
defined in the AIA specification. Thus, ssaia must be enabled in the
host in order to use perf sampling in the guest. There is no other AIA
dependency on the kernel side.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240420151741.962500-15-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The PMU snapshot function minimizes the number of traps taken when the
guest configures or accesses the hpmcounters. If the snapshot feature
is enabled, the hypervisor updates the shared memory with counter
data and the state of overflowed counters. The guest can just read the
shared memory instead of relying on trap & emulate by the hypervisor.
This patch doesn't implement the counter overflow yet.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240420151741.962500-14-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently, we return a Linux error code if creating a perf event fails
in KVM. That shouldn't be necessary, as the guest can continue to operate
without perf profiling or with profiling based on firmware counters.
Return an appropriate SBI error code to indicate that PMU configuration
failed. An error message in KVM already describes the reason for failure.
Fixes: 0cb74b65d2 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240420151741.962500-13-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The virtual counter value is updated during pmu_ctr_read. There is no need
to update it in reset case. Otherwise, it will be counted twice which is
incorrect.
Fixes: 0cb74b65d2 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240420151741.962500-12-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
The initial sample period value when counter value is not assigned
should be set to maximum value supported by the counter width.
Otherwise, it may result in spurious interrupts.
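A small sketch of the period computation, under stated assumptions: the function names are illustrative, the width argument is taken to be one less than the number of implemented counter bits, and the non-zero branch (events remaining until the counter wraps) is an assumption about the surrounding logic rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering a counter with (width + 1) implemented bits. */
uint64_t counter_mask(unsigned int width)
{
    assert(width <= 63);
    return width == 63 ? UINT64_MAX : (UINT64_C(1) << (width + 1)) - 1;
}

/* Sample period for a counter that starts at counter_val. */
uint64_t sample_period(uint64_t counter_val, unsigned int width)
{
    uint64_t mask = counter_mask(width);

    if (counter_val == 0)
        return mask;                    /* no initial value: use the maximum */
    return (0 - counter_val) & mask;    /* events left until the counter wraps */
}
```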
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240420151741.962500-11-atishp@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
SBI PMU extension defines a set of firmware events which can provide
useful information to guests about the number of SBI calls. As
hypervisor implements the SBI PMU extension, these firmware events
correspond to ecall invocations between VS->HS mode. All other firmware
events will always report zero if monitored as KVM doesn't implement them.
This patch adds all the infrastructure required to support firmware
events.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
RISC-V SBI PMU & Sscofpmf ISA extension allows supporting perf in
the virtualization environment as well. The KVM implementation
relies on the SBI PMU extension for the most part while trapping
& emulating the CSR reads for counter access.
This patch doesn't have the event sampling support yet.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
As the KVM guests only see the virtual PMU counters, all hpmcounter
access should trap and KVM emulates the read access on behalf of guests.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The privilege mode filtering feature must be available in the host so
that the host can inhibit the counters while the execution is in HS mode.
Otherwise, the guests may have access to critical guest information.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
This patch only adds the barebones structure of the perf implementation.
Most of the functions return zero at this point and will be implemented
fully in the future.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>