Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
Hung)
- Fix a potential use-after-free of BTF object (Anton Protopopov)
- Add feature detection to libbpf and avoid moving arena global
variables on older kernels (Emil Tsalapatis)
- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
(Ihor Solodrai)
- Fix truncated netlink dumps in bpftool (Jakub Kicinski)
- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
Dwivedi)
- Remove hexdump dependency while building bpf selftests (Matthieu
Baerts)
- Complete fsession support in BPF trampolines on riscv (Menglong Dong)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Remove hexdump dependency
libbpf: Remove extern declaration of bpf_stream_vprintk()
selftests/bpf: Use vmlinux.h in test_xdp_meta
bpftool: Fix truncated netlink dumps
libbpf: Delay feature gate check until object prepare time
libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
bpf: Add a map/btf from a fd array more consistently
selftests/bpf: Fix map_kptr grace period wait
selftests/bpf: enable fsession_test on riscv64
selftests/bpf: Adjust selftest due to function rename
bpf, riscv: add fsession support for trampolines
bpf: Fix a potential use-after-free of BTF object
bpf, riscv: introduce emit_store_stack_imm64() for trampoline
libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
libbpf: Add gating for arena globals relocation feature
Total patches: 7
Reviews/patch: 0.57
Reviewed rate: 42%
- The 2 patch series "two fixes in kho_populate()" from Ran Xiaokai
fixes a couple of minor issues in the kexec handover code.
Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:
- "two fixes in kho_populate()" fixes a couple of minor issues in
the kexec handover code (Ran Xiaokai)
- misc singletons
* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/group_cpus: handle const qualifier from clusters allocation type
kho: remove unnecessary WARN_ON(err) in kho_populate()
kho: fix missing early_memunmap() call in kho_populate()
scripts/gdb: implement x86_page_ops in mm.py
objpool: fix the overestimation of object pooling metadata size
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
delayacct: fix build regression on accounting tool
Total patches: 36
Reviews/patch: 1.77
Reviewed rate: 83%
- The 2 patch series "mm/vmscan: fix demotion targets checks in
reclaim/demotion" from Bing Jiao fixes a couple of issues in the
demotion code - pages were failing demotion and were finding themselves
demoted into disallowed nodes.
- The 11 patch series "Remove XA_ZERO from error recovery of dup_mmap()"
from Liam Howlett fixes a rare maple tree race and performs a number of
cleanups.
- The 13 patch series "mm: add bitmap VMA flag helpers and convert all
mmap_prepare to use them" from Lorenzo Stoakes implements a lot of
cleanups following on from the conversion of the VMA flags into a
bitmap.
- The 5 patch series "support batch checking of references and unmapping
for large folios" from Baolin Wang implements batching to greatly
improve the performance of reclaiming clean file-backed large folios.
- The 3 patch series "selftests/mm: add memory failure selftests" from
Miaohe Lin does as claimed.
Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failing demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
maple tree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").
hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but test_progs is unusable: it exits with code 255 and
without any error message. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that is time consuming. When digging through the BPF
selftests build logs, this line can be seen amongst thousands of others,
but it is easily overlooked:
/bin/sh: 2: hexdump: not found
Here, od is used instead of hexdump. od comes from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces results similar to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespace,
but the C code is not impacted.
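For illustration only, the transformation described above (binary
certificate in, comma-separated C array initializer out, with a comma
after the last byte) can be sketched in Python. This is not the od
pipeline used by the selftests build; the function name and the
12-bytes-per-line layout are assumptions made for this sketch.

```python
def bytes_to_c_initializer(data: bytes, per_line: int = 12) -> str:
    """Render bytes as the body of a C array initializer.

    Every byte, including the last one, is followed by a comma. C accepts
    a trailing comma in an initializer list, so the result can be placed
    between the braces of an array definition as-is.
    """
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(" ".join(f"0x{byte:02x}," for byte in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample bytes standing in for a binary certificate.
    print(bytes_to_c_initializer(bytes([0x30, 0x82, 0x01, 0x0A])))
```

The trailing comma is the only visible difference such output would
have from a variant that ends each line in whitespace, which is why the
generated header still compiles.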
Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'
- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
made fatal by -Werror
- Drop explicit LZMA parallel compression in scripts/make_fit.py
- Several fixes for commit 62089b8048 ("kbuild: rpm-pkg: Generate
debuginfo package manually")
Merge tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Ensure tools/objtool is cleaned by 'make clean' and 'make mrproper'
- Fix test program for CONFIG_CC_CAN_LINK to avoid a warning, which is
made fatal by -Werror
- Drop explicit LZMA parallel compression in scripts/make_fit.py
- Several fixes for commit 62089b8048 ("kbuild: rpm-pkg: Generate
debuginfo package manually")
* tag 'kbuild-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: rpm-pkg: Disable automatic requires for manual debuginfo package
kbuild: rpm-pkg: Fix manual debuginfo generation when using .src.rpm
kernel: rpm-pkg: Restore find-debuginfo.sh approach to -debuginfo package
kbuild: rpm-pkg: Restrict manual debug package creation
scripts/make_fit.py: Drop explicit LZMA parallel compression
kbuild: Fix CC_CAN_LINK detection
kbuild: Add objtool to top-level clean target
An issue was reported that building a BPF program which includes both
vmlinux.h and bpf_helpers.h from libbpf fails due to conflicting
declarations of bpf_stream_vprintk().
Remove the extern declaration from bpf_helpers.h to address this.
In order to use the bpf_stream_printk() macro, BPF programs are expected
to either include vmlinux.h of the kernel they are targeting, or add
their own extern declaration.
Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Closes: https://github.com/libbpf/libbpf/issues/947
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
turbostat.c:8688: rapl_perf_init: Assertion `next_domain < num_domains' failed.
Two recent cleanup patches that were not supposed to change anything
broke the core_id code needed for AMD RAPL initialization:
commit 070e92361e ("tools/power turbostat: Enhance HT enumeration")
commit ddf60e38ca ("tools/power turbostat: Simplify global core_id calculation")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.
I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:
dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
^^^^^^^^^
the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).
Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.
Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Netlink requires that the recv buffer used during dumps is at least
min(PAGE_SIZE, 8k) (see the man page). Otherwise the messages will
get truncated. Make sure bpftool follows this requirement, to avoid
missing information on systems with large pages.
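As a rough sketch of the sizing rule stated above, the minimum buffer
size can be computed from the page size. This is Python for
illustration, not the bpftool change itself; the helper names and the
4096-byte example buffer are assumptions of this sketch.

```python
import mmap

# The "8k" cap from the requirement quoted above.
NETLINK_DUMP_BUF_CAP = 8192


def min_dump_recv_buf(page_size: int = mmap.PAGESIZE) -> int:
    """Smallest recv buffer that satisfies min(PAGE_SIZE, 8k)."""
    return min(page_size, NETLINK_DUMP_BUF_CAP)


def is_dump_buf_large_enough(buf_len: int, page_size: int = mmap.PAGESIZE) -> bool:
    """True if a buffer of buf_len bytes meets the requirement."""
    return buf_len >= min_dump_recv_buf(page_size)


if __name__ == "__main__":
    # A hypothetical 4096-byte buffer on 4 KiB and 64 KiB page systems.
    for page_size in (4096, 65536):
        print(page_size, is_dump_buf_large_enough(4096, page_size))
```

Under this rule a buffer sized for 4 KiB pages stops being sufficient
once the page size exceeds it, which matches the large-page case the
message calls out.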
Acked-by: Quentin Monnet <qmo@kernel.org>
Fixes: 7084566a23 ("tools/bpftool: Remove libbpf_internal.h usage in bpftool")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20260217194150.734701-1-kuba@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since release 2025.12.02:
Add L2 statistics columns for recent Intel processors:
L2MRPS = L2 Cache M-References Per Second
L2%hit = L2 Cache Hit %
Sort work and output by cpu# rather than core#
Minor features and fixes.
Merge tag 'turbostat-2026.02.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Add L2 statistics columns for recent Intel processors:
L2MRPS = L2 Cache M-References Per Second
L2%hit = L2 Cache Hit %
- Sort work and output by cpu# rather than core#
- Minor features and fixes
* tag 'turbostat-2026.02.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (23 commits)
tools/power turbostat: version 2026.02.14
tools/power turbostat: Fix and document --header_iterations
tools/power turbostat: Use strtoul() for iteration parsing
tools/power turbostat: Favor cpu# over core#
tools/power turbostat: Expunge logical_cpu_id
tools/power turbostat: Enhance HT enumeration
tools/power turbostat: Simplify global core_id calculation
tools/power turbostat: Unify even/odd/average counter referencing
tools/power turbostat: Allocate average counters dynamically
tools/power turbostat: Delete core_data.core_id
tools/power turbostat: Rename physical_core_id to core_id
tools/power turbostat: Cleanup package_id
tools/power turbostat: Cleanup internal use of "base_cpu"
tools/power turbostat: Add L2 cache statistics
tools/power turbostat: Remove redundant newlines from err(3) strings
tools/power turbostat: Allow more use of is_hybrid flag
tools/power turbostat: Rename "LLCkRPS" column to "LLCMRPS"
tools/power turbostat.8: Document the "--force" option
tools/power turbostat: Harden against unexpected values
tools/power turbostat: Dump hypervisor name
...
Commit 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
adds a feature gate check that loads a map and BPF program to
test whether the running kernel supports large direct offsets for LDIMM64
instructions. This check is currently used to calculate arena symbol
offsets during bpf_object__collect_relos, itself called by
bpf_object_open.
However, the program calling bpf_object_open may not have the permissions to
load maps and programs. This is the case with the BPF selftests, where
bpftool is invoked at compilation time during skeleton generation. This
causes errors as the feature gate unexpectedly fails with -EPERM.
Avoid this by moving all uses of the FEAT_LDIMM64_FULL_RANGE_OFF feature gate
to BPF object preparation time instead.
Fixes: 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-3-emil@etsalapatis.com
Commit 728ff16791 uses a PROG_TYPE_TRACEPOINT BPF test program to
check whether the running kernel supports large LDIMM64 offsets. The
feature gate incorrectly assumes that the program will fail at
verification time with one of two messages, depending on whether the
feature is supported by the running kernel. However,
PROG_TYPE_TRACEPOINT programs may fail to load before verification even
starts, e.g., if the shell does not have the appropriate capabilities.
Use a BPF_PROG_TYPE_SOCKET_FILTER program for the feature gate instead.
Also fix two minor issues. First, ensure the log buffer for the test is
initialized: failing program load before verification led to libbpf dumping
uninitialized data to stdout. Also, ensure that close() is only called
for program_fd in the probe if the program load actually succeeded. The
call was previously failing silently with -EBADF most of the time.
Fixes: 728ff16791 ("libbpf: Add gating for arena globals relocation feature")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260217204345.548648-2-emil@etsalapatis.com
Here is the "big" set of USB and Thunderbolt driver updates for 7.0-rc1.
Overall more lines were removed than added, thanks to dropping the
obsolete isp1362 USB host controller driver, always a nice change.
Other than that, nothing major happening here, highlights are:
- lots of dwc3 driver updates and new hardware support added
- usb gadget function driver updates
- usb phy driver updates
- typec driver updates and additions
- USB rust binding updates for syntax and formatting changes
- more usb serial device ids added
- other smaller USB core and driver updates and additions
All of these have been in linux-next for a long time, with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'usb-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the "big" set of USB and Thunderbolt driver updates for
7.0-rc1. Overall more lines were removed than added, thanks to
dropping the obsolete isp1362 USB host controller driver, always a
nice change.
Other than that, nothing major happening here, highlights are:
- lots of dwc3 driver updates and new hardware support added
- usb gadget function driver updates
- usb phy driver updates
- typec driver updates and additions
- USB rust binding updates for syntax and formatting changes
- more usb serial device ids added
- other smaller USB core and driver updates and additions
All of these have been in linux-next for a long time, with no reported
problems"
* tag 'usb-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (77 commits)
usb: typec: ucsi: Add Thunderbolt alternate mode support
usb: typec: hd3ss3220: Check if regulator needs to be switched
usb: phy: tegra: parametrize PORTSC1 register offset
usb: phy: tegra: parametrize HSIC PTS value
usb: phy: tegra: return error value from utmi_wait_register
usb: phy: tegra: cosmetic fixes
dt-bindings: usb: renesas,usbhs: Add RZ/G3E SoC support
usb: dwc2: fix resume failure if dr_mode is host
usb: cdns3: fix role switching during resume
usb: dwc3: gadget: Move vbus draw to workqueue context
USB: serial: option: add Telit FN920C04 RNDIS compositions
usb: dwc3: Log dwc3 address in traces
usb: gadget: tegra-xudc: Add handling for BLCG_COREPLL_PWRDN
usb: phy: tegra: add HSIC support
usb: phy: tegra: use phy type directly
usb: typec: ucsi: Enforce mode selection for cros_ec_ucsi
usb: typec: ucsi: Support mode selection to activate altmodes
usb: typec: Introduce mode_selection bit
usb: typec: Implement mode selection
usb: typec: Expose alternate mode priority via sysfs
...
The tests use the tc pedit action to modify the IPv4 source address
("pedit ex munge ip src set"), but the IP header checksum is not
recalculated after the modification. As a result, the modified packet
fails sanity checks in br_netfilter after bridging and is dropped,
which causes the test to fail.
Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0
during the test execution. This prevents the bridge from passing
L2 traffic to netfilter, bypassing the checksum validation that
causes the test failure.
Fixes: 92ad382894 ("selftests: forwarding: Add a test for pedit munge SIP and DIP")
Fixes: 226657ba23 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv6 packet has an incorrect payload length set in the IPv6 header.
After VXLAN decapsulation, such packets do not pass sanity checks in
br_netfilter and are dropped, which causes the test to fail.
Fix this by setting the correct IPv6 payload length for the encapsulated
packet generated by mausezahn, so that the packet is accepted
by br_netfilter.
tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh
lines 698-706
)"00:03:"$( : Payload length
)"3a:"$( : Next header
)"04:"$( : Hop limit
)"$saddr:"$( : IP saddr
)"$daddr:"$( : IP daddr
)"80:"$( : ICMPv6.type
)"00:"$( : ICMPv6.code
)"00:"$( : ICMPv6.checksum
)
Data after IPv6 header:
• 80: — 1 byte (ICMPv6 type)
• 00: — 1 byte (ICMPv6 code)
• 00: — 1 byte (ICMPv6 checksum, truncated)
Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match
the actual payload size.
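The byte count above can be double-checked with a few lines of Python.
This is only a sketch of the arithmetic; the helper name is made up
here, and the colon-separated hex format simply mirrors the way the
test script writes the packet.

```python
def ipv6_payload_length_field(payload_hex: str) -> str:
    """Return the 2-byte IPv6 Payload Length field for the given payload.

    payload_hex holds the bytes that follow the IPv6 header, written as
    colon-separated hex (for example "80:00:00").
    """
    payload = bytes.fromhex(payload_hex.replace(":", ""))
    # Payload Length is a 16-bit big-endian count of the bytes after the
    # fixed IPv6 header.
    return len(payload).to_bytes(2, "big").hex(":")


if __name__ == "__main__":
    # ICMPv6 type, code and the truncated checksum byte from the listing.
    print(ipv6_payload_length_field("80:00:00"))
```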
Fixes: b07e9957f2 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv4 packet contains a zero IP header checksum. After VXLAN
decapsulation, such packets do not pass sanity checks in br_netfilter
and are dropped, which causes the test to fail.
Fix this by calculating and setting a valid IPv4 header checksum for the
encapsulated packet generated by mausezahn, so that the packet is accepted
by br_netfilter. This is done using the payload_template_calc_checksum() /
payload_template_expand_checksum() helpers, which are only available
in v6.3 and newer kernels.
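For reference, the IPv4 header checksum itself is the standard 16-bit
ones' complement sum over the header. The Python below is a minimal
sketch of that calculation, not the shell helpers named above; the
function name and the sample header bytes are assumptions chosen for
illustration.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum (16-bit ones' complement sum).

    Call it with the checksum field zeroed to get the value to store.
    Called on a header that already carries a valid checksum, it
    returns 0.
    """
    if len(header) % 2:
        raise ValueError("IPv4 header length must be a multiple of 2 bytes")
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample 20-byte header with the checksum field (bytes 10-11) zeroed.
    hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(f"{ipv4_header_checksum(hdr):04x}")  # b861
```

A header whose checksum field is left at zero, as in the old test
packet, does not sum to zero on verification, which is the kind of
sanity check failure described above.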
Fixes: a0b61f3d8e ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Recently we were able to trigger a warning in the mdb_n_entries counting
code. Add tests that exercise the different ways that used to trigger that
warning.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://patch.msgid.link/20260213070031.1400003-3-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a check_rx_hds test that verifies header/data split works across
payload sizes. The test sweeps payload sizes from 1 byte to 8KB; if any
data propagates up to userspace as SCM_DEVMEM_LINEAR, the test
fails. This shows that regardless of payload size, ncdevmem's
configuration of hds-thresh to 0 is respected.
Add a -L (--fail-on-linear) flag to ncdevmem that causes the receiver to
fail if any SCM_DEVMEM_LINEAR cmsg is received.
Use socat option for fixed block sizing and tcp nodelay to disable
Nagle's algorithm to avoid buffering.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'docs-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of small, late-arriving documentation fixes"
* tag 'docs-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux:
docs: toshiba_haps: fix grammar error in SSD warning
Docs/mm: fix typos and grammar in page_tables.rst
Docs/core-api: fix typos in rbtree.rst
docs: clarify wording in programming-language.rst
docs: process: maintainer-pgp-guide: update kernel.org docs link
docs: kdoc_parser: allow __exit in function prototypes
* update tools/include/linux/mm.h to fix memblock tests compilation
* drop redundant struct page* parameter from memblock_free_pages() and get
struct page from the pfn
* add underflow detection for size calculation in memtest and warn about
underflow when VM_DEBUG is enabled
Merge tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- update tools/include/linux/mm.h to fix memblock tests compilation
- drop redundant struct page* parameter from memblock_free_pages() and
get struct page from the pfn
- add underflow detection for size calculation in memtest and warn
about underflow when VM_DEBUG is enabled
* tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm/memtest: add underflow detection for size calculation
memblock: drop redundant 'struct page *' argument from memblock_free_pages()
memblock test: include <linux/sizes.h> from tools mm.h stub
- Update the bootconfig parser to stop searching for a value when it
encounters a newline character.
Note that this changes the bootconfig formatting, but this should not
fall under the "don't break user space" rule, as bootconfig is used for
booting the kernel and not by applications running in the
kernel's user space.
- Update the tests for bootconfig parser to ensure the good examples
are parsed correctly by comparing against the expected results.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmmPs+sbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bG60H/1GuYXlEbJrfM2G1moC5
c9HLkNON7xspEDWfT8lHk+T+4l7xj/Oriwl9Kkv/0F/P7NzIYZaSiHoht0TqrL6c
/BR3pgHVayp4H/woaDZdo9KDCMFuxW0ukrbUriz/taMJn4+b7krScGoIM4sl1e02
cQVYbJxP9x5oEhgrQEOBUHnYSaEcB7qBclIUCQOl3UV9krTZpKOsLe3tNjcj0JrZ
ACmSvP4oGKNIz+IarNmlV0m5enJk/pZLIpovbbrfm1vV4lxDWTCbSF2Q/dmpY43k
ZIhjgjSES5bS5PSl7aQJW7j2itCUHDu7kYz+i8ifjqMnEcJkE9PqqsdJAhq1J63i
M6w=
=t5PQ
-----END PGP SIGNATURE-----
Merge tag 'bootconfig-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- Update the bootconfig parser to stop searching for a value when it
encounters a newline character
- Update the tests for bootconfig parser to ensure the good examples
are parsed correctly by comparing against the expected results
* tag 'bootconfig-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Check the parsed output of the good examples
bootconfig: Terminate value search if it hits a newline
Highlights:
- amd/pmf:
- Avoid overwriting BIOS input values when events occur rapidly
- Fix PMF driver issues related to S4 (in part on crypto/ccp side)
- Add NPU metrics API (for accel side consumers)
- Allow disabling Smart PC function through a module parameter
- asus-wmi & HID/asus:
- Unification of backlight control (replaces quirks)
- Support multiple interfaces for controlling keyboard/RGB brightness
- Simplify init sequence
- hp-wmi:
- Add manual fan control for Victus S models
- Add fan mode keep-alive
- Fix platform profile values for Omen 16-wf1xxx
- Add EC offset to get the thermal profile
- intel/pmc: Show substate residencies also for non-primary PMCs
- intel/ISST:
- Store and restore data for all domains
- Write interface improvements
- lenovo-wmi:
- Support multiple Capability Data
- Add HWMON reporting and tuning support
- mellanox/mlx-platform: Add HI173 & HI174 support
- surface/aggregator_registry: Add Surface Pro 11 (QCOM)
- thinkpad_acpi: Add support for HW damage detection capability
- uniwill: Implement cTGP setting
- wmi:
- Introduce marshalling support
- Convert a few drivers to use the new buffer-based WMI API
- tools/power/x86/intel-speed-select: Allow read operations for non-root
- Miscellaneous cleanups / refactoring / improvements
The following is an automated shortlog grouped by driver:
amd/pmf:
- Added a module parameter to disable the Smart PC function
- Introduce new interface to export NPU metrics
- Prevent TEE errors after hibernate
- Use ring buffer to store custom BIOS input values
amd:
- Use scope-based cleanup for wbrf_record()
asus-wmi:
- add keyboard brightness event handler
- Add support for multiple kbd led handlers
- remove unused keyboard backlight quirk
crypto:
- ccp - Add an S4 restore flow
- ccp - Declare PSP dead if PSP_CMD_TEE_RING_INIT fails
- ccp - Factor out ring destroy handling to a helper
- ccp - Send PSP_CMD_TEE_RING_DESTROY when PSP_CMD_TEE_RING_INIT fails
HID: asus:
- add support for the asus-wmi brightness handler
- early return for ROG devices
- fortify keyboard handshake
- initialize additional endpoints only for certain devices
- listen to the asus-wmi brightness device instead of creating one
- move vendor initialization to probe
- simplify RGB init sequence
- use same report_id in response
hp-wmi:
- Add EC offsets to read Victus S thermal profile
- add manual fan control for Victus S models
- fix platform profile values for Omen 16-wf1xxx
- implement fan keep-alive
- order include headers
ideapad-laptop:
- Clean up style warnings and checks
intel/pmc:
- Change LPM mode fields to u8
- Enable substate residencies for multiple PMCs
- Move LPM mode attributes to PMC
- Remove double empty line
intel/pmt:
- Replace sprintf() with sysfs_emit()
intel/uncore-freq:
- Replace sprintf() with scnprintf()
- Replace sprintf() with sysfs_emit()
intel-wmi-sbl-fw-update:
- Use new buffer-based WMI API
intel/wmi: thunderbolt:
- Use new buffer-based WMI API
ISST:
- Add missing write block check
- Check for admin capability for write commands
- Optimize suspend/resume callbacks
- Store and restore all domains data
lenovo-wmi-capdata:
- Add support for Capability Data 00
- Add support for Fan Test Data
lenovo-wmi-{capdata,other}:
- Fix HWMON channel visibility
- Support multiple Capability Data
lenovo-wmi-capdata:
- Wire up Fan Test Data
lenovo-wmi-helpers:
- Convert returned buffer into u32
lenovo-wmi-other:
- Add HWMON for fan reporting/tuning
mlx-platform:
- Add support DGX flavor of next-generation 800GB/s ethernet switch.
- Add support for new Nvidia DGX system based on class VMOD0010
Rename lenovo-wmi-capdata01 to lenovo-wmi-capdata:
- Rename lenovo-wmi-capdata01 to lenovo-wmi-capdata
surface: aggregator_registry:
- Add Surface Pro 11 (QCOM)
surface:
- Replace deprecated strcpy() in surface_button_add()
thinkpad_acpi:
- Add support to detect hardware damage detection capability.
- Add sysfs to display details of damaged device.
tools/power/x86/intel-speed-select:
- Allow non root users
- Fix file descriptor leak in isolate_cpus()
- Use pkg-config for libnl-3.0 detection
- v1.25 release
uniwill:
- Implement cTGP setting
uniwill-laptop:
- Introduce device descriptor system
wmi:
- Add helper functions for WMI string conversions
- Add kunit test for the marshalling code
- Add kunit test for the string conversion code
wmi-bmof:
- Use new buffer-based WMI API
wmi:
- Introduce marshalling support
wmi: string-kunit:
- Add missing oversized string test case
wmi:
- Update driver development guide
xiaomi-wmi:
- Use new buffer-based WMI API
yogabook:
- Clean up code style
Merges:
- Merge branch 'fixes' of into for-next
- Merge branch 'intel-sst' of https://github.com/spandruvada/linux-kernel into for-next
- Merge branch 'platform-drivers-x86-asus-kbd' into for-next
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaY9soAAKCRBZrE9hU+XO
MT5EAP9aK1wHlVGDfuC2k07X4gk8ZX5Ks9anXJlBcZFrpC9okwD5Aeqj3XLK338x
g5k/x+r87GwXjcBLnFi2TnNA2c8SWQY=
=eGAm
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
"Highlights:
- amd/pmf:
- Avoid overwriting BIOS input values when events occur rapidly
- Fix PMF driver issues related to S4 (in part on crypto/ccp side)
- Add NPU metrics API (for accel side consumers)
- Allow disabling Smart PC function through a module parameter
- asus-wmi & HID/asus:
- Unification of backlight control (replaces quirks)
- Support multiple interfaces for controlling keyboard/RGB brightness
- Simplify init sequence
- hp-wmi:
- Add manual fan control for Victus S models
- Add fan mode keep-alive
- Fix platform profile values for Omen 16-wf1xxx
- Add EC offset to get the thermal profile
- intel/pmc: Show substate residencies also for non-primary PMCs
- intel/ISST:
- Store and restore data for all domains
- Write interface improvements
- lenovo-wmi:
- Support multiple Capability Data
- Add HWMON reporting and tuning support
- mellanox/mlx-platform: Add HI173 & HI174 support
- surface/aggregator_registry: Add Surface Pro 11 (QCOM)
- thinkpad_acpi: Add support for HW damage detection capability
- uniwill: Implement cTGP setting
- wmi:
- Introduce marshalling support
- Convert a few drivers to use the new buffer-based WMI API
- tools/power/x86/intel-speed-select: Allow read operations for non-root
- Miscellaneous cleanups / refactoring / improvements"
* tag 'platform-drivers-x86-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (68 commits)
platform/x86: lenovo-wmi-{capdata,other}: Fix HWMON channel visibility
platform/x86: hp-wmi: Add EC offsets to read Victus S thermal profile
platform: mellanox: mlx-platform: Add support DGX flavor of next-generation 800GB/s ethernet switch.
platform: mellanox: mlx-platform: Add support for new Nvidia DGX system based on class VMOD0010
HID: asus: add support for the asus-wmi brightness handler
platform/x86: asus-wmi: add keyboard brightness event handler
platform/x86: asus-wmi: remove unused keyboard backlight quirk
HID: asus: listen to the asus-wmi brightness device instead of creating one
platform/x86: asus-wmi: Add support for multiple kbd led handlers
HID: asus: early return for ROG devices
HID: asus: move vendor initialization to probe
HID: asus: fortify keyboard handshake
HID: asus: use same report_id in response
HID: asus: initialize additional endpoints only for certain devices
HID: asus: simplify RGB init sequence
platform/wmi: string-kunit: Add missing oversized string test case
platform/x86/amd/pmf: Added a module parameter to disable the Smart PC function
platform/x86/uniwill: Implement cTGP setting
platform/x86: uniwill-laptop: Introduce device descriptor system
platform/x86/amd: Use scope-based cleanup for wbrf_record()
...
Now that the RISC-V trampoline JIT supports BPF_TRACE_FSESSION, run
the fsession selftest on riscv64 as well as x86_64.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20260208053311.698352-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix bpf_linker__add_buf()'s logic of copying data from memory buffer into
memfd. If a short write does not write the entire buf_sz bytes into the memfd
file, we append bytes from the beginning of buf *again* (corrupting the ELF
file contents) instead of correctly appending the remaining, not-yet-written
buf contents.
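The resume-after-short-write pattern described above can be sketched as
follows. This is a minimal Python illustration, not the libbpf C code:
write_all() and its write callable are hypothetical names, and the only
assumption is that the callable returns how many bytes it accepted.

```python
from typing import Callable


def write_all(write: Callable[[memoryview], int], buf: bytes) -> int:
    """Write every byte of buf, resuming where a short write stopped."""
    view = memoryview(buf)
    written = 0
    while written < len(buf):
        # Pass only the unwritten tail. Passing buf itself here would
        # re-append bytes from the start of the buffer after a short write.
        n = write(view[written:])
        if n <= 0:
            raise OSError("write made no progress")
        written += n
    return written
```

With a real file descriptor the callable could be
lambda chunk: os.write(fd, chunk), since os.write() also reports the
number of bytes actually written.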
Closes: https://github.com/libbpf/libbpf/issues/945
Fixes: 6d5e5e5d7c ("libbpf: Extend linker API to support in-memory ELF files")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260209230134.3530521-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add feature gating for the arena globals relocation introduced in
commit c1f61171d4. The commit depends on a previous commit in the
same patchset that is absent from older kernels
(12a1fe6e12 "bpf/verifier: Do not limit maximum direct offset into arena map").
Without this commit, arena globals relocation with arenas >= 512MiB
fails to load and breaks libbpf's backwards compatibility.
Introduce a libbpf feature to check whether the running kernel allows for
full range ldimm64 offset, and only relocate arena globals if it does.
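The gating logic can be sketched roughly as below. This is a Python model
under stated assumptions, not libbpf code: FeatureProbe,
arena_globals_offset() and the probe callable are hypothetical names, and
the fallback of placing globals at offset 0 assumes that was the layout
before c1f61171d4.

```python
from typing import Callable, Optional


class FeatureProbe:
    """Run a (hypothetical) kernel feature probe once and cache the answer."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self._probe = probe
        self._result: Optional[bool] = None

    def supported(self) -> bool:
        if self._result is None:
            self._result = bool(self._probe())
        return self._result


def arena_globals_offset(arena_size: int, globals_size: int,
                         page_size: int, full_offset: FeatureProbe) -> int:
    """Return the byte offset at which arena globals would be placed."""
    if not full_offset.supported():
        # Kernel without full-range ldimm64 offsets: do not relocate
        # (assumed here to mean the start of the arena).
        return 0
    pages = -(-globals_size // page_size)  # round up to whole pages
    return arena_size - pages * page_size
```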
Fixes: c1f61171d4 ("libbpf: Move arena globals to the end of the arena")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260210184532.255475-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
wait_for_port() can wait up to 2 seconds with the sleep and the polling
in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat
process could die before the test writes to the netconsole.
Increase the timeout to 3 seconds to make netcons_basic.sh pass
consistently.
Fixes: 3dc6c76391 ("selftests: net: Add IPv6 support to netconsole basic tests")
Signed-off-by: Pin-yen Lin <treapking@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since release 2025.12.02:
Add L2 statistics columns for recent Intel processors:
L2MRPS = L2 Cache M-References Per Second
L2%hit = L2 Cache Hit %
Sort work and output by cpu# rather than core#
This commit:
Version number and white space (indent -l160)
No functional change.
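As a rough illustration of the two L2 columns named above, the values could
be derived from counter deltas over one measurement interval. The function
and parameter names are hypothetical, and how turbostat actually obtains the
hit count (for example from a miss counter) is not stated here, so treat
this as a sketch of the arithmetic only.

```python
def l2_columns(references: int, hits: int, interval_sec: float):
    """Return (L2MRPS, L2%hit) for one interval of counter deltas."""
    # L2MRPS: millions of L2 references per second.
    mrps = references / interval_sec / 1e6
    # L2%hit: share of references that hit, reported as 0 when idle.
    hit_pct = 100.0 * hits / references if references else 0.0
    return mrps, hit_pct
```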
Signed-off-by: Len Brown <len.brown@intel.com>
The "header_iterations" option is commonly used to de-clutter
the screen of redundant header label rows in an interactive session:
Eg. every 10 rows:
$ sudo turbostat --header_iterations 10 -S -q -i 1
But --header_iterations was missing from turbostat.8
Also turbostat help advertised the "-N" short option
that did not actually work:
$ turbostat --help
-N, --header_iterations num
print header every num iterations
Repair "-N"
Document "--header_iterations" on turbostat.8
Signed-off-by: Len Brown <len.brown@intel.com>
Replace strtod() with strtoul() and check errno for -n/-N options, since
num_iterations and header_iterations are unsigned long counters. Reject
zero and conversion errors; negative inputs wrap to large positive values
per standard unsigned semantics.
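The validation rules above can be modelled in Python as follows. This is a
sketch, not the turbostat code: parse_iterations() is a hypothetical name,
a 64-bit unsigned long and base 10 are assumed, and rejecting trailing
characters is an extra assumption beyond what the message states.

```python
import re

ULONG_MAX = 2**64 - 1  # assumption: 64-bit unsigned long


def parse_iterations(text: str) -> int:
    """Parse a -n/-N argument the way an errno-checked strtoul() might."""
    if not re.fullmatch(r"\s*[+-]?[0-9]+", text):
        raise ValueError(f"not a number: {text!r}")
    value = int(text)
    if abs(value) > ULONG_MAX:
        # strtoul() would report ERANGE for this input.
        raise ValueError(f"out of range: {text!r}")
    value %= ULONG_MAX + 1  # negative inputs wrap to large positive values
    if value == 0:
        raise ValueError("iteration count must be non-zero")
    return value
```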
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Turbostat collects statistics and outputs results in "topology order",
which means it prioritizes the core# over the cpu#.
The strategy is to minimize wakeups to a core -- which is
important when measuring an idle system.
But core order is problematic, because Linux core#'s are physical
(within each package), and thus subject to APIC-id scrambling
that may be done by the hardware or the BIOS.
As a result users may be faced with rows in a confusing order:
sudo turbostat -q --show topology,Busy%,CPU%c6,UncMHz sleep 1
Core CPU Busy% CPU%c6 UncMHz
- - 1.25 72.18 3400
0 4 7.74 0.00
1 5 1.77 88.59
2 6 0.48 96.73
3 7 0.21 98.34
4 8 0.14 96.85
5 9 0.26 97.55
6 10 0.44 97.24
7 11 0.12 96.18
8 0 5.41 0.31 3400
8 1 0.19
12 2 0.41 0.22
12 3 0.08
32 12 0.04 99.21
33 13 0.25 94.92
Abandon the legacy "core# topology order" in favor of simply
ordering by cpu#, with a special case to handle HT siblings
that may not have adjacent cpu#'s.
sudo ./turbostat -q --show topology,Busy%,CPU%c6,UncMHz sleep 1
1.003001 sec
Core CPU Busy% CPU%c6 UncMHz
- - 1.38 80.55 1600
8 0 10.94 0.00 1600
8 1 0.53
12 2 2.90 0.45
12 3 0.11
0 4 1.96 91.20
1 5 0.97 96.40
2 6 0.24 94.72
3 7 0.31 98.01
4 8 0.20 98.20
5 9 0.62 96.00
6 10 0.06 98.15
7 11 0.12 99.31
32 12 0.04 99.07
33 13 0.27 95.09
The result is that cpu#'s now take precedence over core#'s.
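One way to picture the new ordering is the sketch below. It is a Python
model, not the turbostat implementation: cpu_order() is a hypothetical
name, a single package is assumed, and the HT special case is assumed to
mean that siblings follow directly after the lowest-numbered cpu of their
core.

```python
def cpu_order(rows):
    """Order (core, cpu) pairs by cpu#, keeping HT siblings together."""
    rows = list(rows)
    first_cpu = {}
    for core, cpu in rows:
        # Remember the lowest cpu# seen on each core.
        first_cpu[core] = min(cpu, first_cpu.get(core, cpu))
    # Cores are ranked by their lowest cpu#; siblings sort by cpu# within.
    return sorted(rows, key=lambda r: (first_cpu[r[0]], r[1]))
```

Fed the (Core, CPU) pairs from the first table, this yields the row order
of the second table. With non-adjacent siblings such as cpus 0 and 2 on one
core, cpu 2 is listed right after cpu 0.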
Signed-off-by: Len Brown <len.brown@intel.com>
- Add more CPUCFG mask bits.
- Improve feature detection.
- Add lazy load support for FPU and binary translation (LBT) register state.
- Fix return value for memory reads from and writes to in-kernel devices.
- Add support for detecting preemption from within a guest.
- Add KVM steal time test case to tools/selftests.
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through"
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
RISC-V:
- Fixes for issues discovered by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(),
and kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that enlisted
help from the architecture's page table management to build hypervisor
page tables, in particular enabling sharing the last level of page
tables. This however was a lot of code (~3K lines) in order to support
KVM, and also blocked several features. The biggest advantage is
that the page size of userspace is completely independent of the
page size used by the guest: userspace can mix normal pages, THPs and
hugetlbfs as it sees fit, and in fact transparent hugepages were not
possible before. It's also now possible to have nested guests and
guests with huge pages running on the same host.
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware (context
switched around the fastpath run loop) and allowing direct access to data
MSRs and PMCs (restricted by the vPMU model). KVM still intercepts
access to control registers, e.g. to enforce event filtering and to
prevent the guest from profiling sensitive host state. This is more
accurate, since it has no risk of contention and thus dropped events, and
also has significantly less overhead.
For more information, see the commit message for merge commit bf2c3138ae
("Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD").
- Disallow changing the virtual CPU model if L2 is active, for all the same
reasons KVM disallows changing the model after the first KVM_RUN.
- Fix a bug where KVM would incorrectly reject host accesses to PV MSRs
when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those
were advertised as supported to userspace.
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM
would attempt to read CR3 when configuring an async #PF entry.
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86
only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports
are intended for external usage, and those are allowed explicitly.
- When checking nested events after a vCPU is unblocked, ignore -EBUSY instead
of WARNing. Userspace can sometimes put the vCPU into what should be an
impossible state, and spurious exit to userspace on -EBUSY does not really
do anything to solve the issue.
- Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU
is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller
stuffing architecturally impossible states into KVM.
- Add support for new Intel instructions that don't require anything beyond
enumerating feature flags to userspace.
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2.
- Add WARNs to guard against modifying KVM's CPU caps outside of the intended
setup flow, as nested VMX in particular is sensitive to unexpected changes
in KVM's golden configuration.
- Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts
when the suppression feature is enabled by the guest (currently limited to
split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor
Suppress EOI Broadcasts isn't an option as some userspaces have come to rely
on KVM's buggy behavior (KVM advertises Suppress EOI Broadcast irrespective
of whether or not userspace I/O APIC supports Directed EOIs).
- Clean up KVM's handling of marking mapped vCPU pages dirty.
- Drop a pile of *ancient* sanity checks hidden behind KVM's unused
ASSERT() macro, most of which could be trivially triggered by the guest
and/or user, and all of which were useless.
- Fold "struct dest_map" into its sole user, "struct rtc_status", to make it
more obvious what the weird parameter is used for, and to allow dropping
these RTC shenanigans if CONFIG_KVM_IOAPIC=n.
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y.
- Add a regression test for recent APICv update fixes.
- Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv()
to consolidate the updates, and to co-locate SVI updates with the updates
for KVM's own cache of ISR information.
- Drop a dead function declaration.
- Minor cleanups.
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to temporarily
switch to vmcs01 instead of deferring the update until the next nested
VM-Exit. The deferred updates approach directly contributed to several
bugs, was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans.
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults,
and instead always reflect them into the guest. Since KVM doesn't shadow
EPCM entries, EPCM violations cannot be due to KVM interference and
can't be resolved by KVM.
- Fix a bug where KVM would register its posted interrupt wakeup handler even
if loading kvm-intel.ko ultimately failed.
- Disallow access to vmcs12 fields that aren't fully supported, mostly to
avoid weirdness and complexity for FRED and other features, where KVM wants
to enable VMCS shadowing for fields that conditionally exist.
- Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or
refuses to online a CPU) due to a VMCS config mismatch.
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage
handling. SEV/SNP was the only user of the tracking and it can do it via
the RMP.
- Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE
and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to
avoid non-trivial complexity for something that no known VMM seems to be
doing and to avoid an API special case for in-place conversion, which
simply can't support unaligned sources.
- When populating guest_memfd memory, GUP the source page in common code and
pass the refcounted page to the vendor callback, instead of letting vendor
code do the heavy lifting. Doing so avoids a looming deadlock bug with
in-place conversion due to an AB-BA conflict between mmap_lock and guest_memfd's filemap
invalidate lock.
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space when
creating a vCPU-specific mapping of guest memory. Actually this bug
could not be hit even on x86, the only architecture with multiple
address spaces, but it's a bug nevertheless.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmNqwwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPaZAf/cJx5B67lnST272esz0j29MIuT/Ti
jnf6PI9b7XubKYOtNvlu5ZW4Jsa5dqRG0qeO/JmcXDlwBf5/UkWOyvqIXyiuTl0l
KcSUlKPtTgKZSoZpJpTppuuDE8FSYqEdcCmjNvoYzcJoPjmaeJbK6aqO0AkBbb6e
L5InrLV7nV9iua6rFvA0s/G8/Eq2DG8M9hTRHe6NcI/z4hvslOudvpUXtC8Jygoo
cV8vFavUwc+atrmvhAOLvSitnrjfNa4zcG6XMOlwXPfIdvi3zqTlQTgUpwGKiAGQ
RIDUVZ/9bcWgJqbPRsdEWwaYRkNQWc5nmrAHRpEEaYV/NeBBNf4v6qfKSw==
=SkJ1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register
state
- Fix return value for memory reads from and writes to in-kernel
devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to
guests
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through"
encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators
stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page
allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discovered by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that
enlisted help from the architecture's page table management to
build hypervisor page tables, in particular enabling sharing the
last level of page tables. This however was a lot of code (~3K
lines) in order to support KVM, and also blocked several features.
The biggest advantage is that the page size of userspace is
completely independent of the page size used by the guest:
userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
and in fact transparent hugepages were not possible before. It's
also now possible to have nested guests and guests with huge pages
running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware
(context switched around the fastpath run loop) and allowing
direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce
event filtering and to prevent the guest from profiling sensitive
host state. This is more accurate, since it has no risk of
contention and thus dropped events, and also has significantly less
overhead.
For more information, see the commit message for merge commit
bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all
the same reasons KVM disallows changing the model after the first
KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV
MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
even if those were advertised as supported to userspace
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
where KVM would attempt to read CR3 when configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
(for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
Only a few exports are intended for external usage, and those
are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore
-EBUSY instead of WARNing. Userspace can sometimes put the vCPU
into what should be an impossible state, and spurious exit to
userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being
blocked when vCPU is in Wait-For-SIPI, which also resulted in
playing whack-a-mole with syzkaller stuffing architecturally
impossible states into KVM
- Add support for new Intel instructions that don't require anything
beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the
intended setup flow, as nested VMX in particular is sensitive to
unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI
broadcasts when the suppression feature is enabled by the guest
(currently limited to split IRQCHIP, i.e. userspace I/O APIC).
Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
option as some userspaces have come to rely on KVM's buggy behavior
(KVM advertises Suppress EOI Broadcast irrespective of whether or
not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind KVM's
unused ASSERT() macro, most of which could be trivially triggered
by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to
make it more obvious what the weird parameter is used for, and to
allow dropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in
kvm_apic_update_apicv() to consolidate the updates, and to
co-locate SVI updates with the updates for KVM's own cache of ISR
information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to
temporarily switch to vmcs01 instead of deferring the update until
the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs,
was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page
faults, and instead always reflect them into the guest. Since KVM
doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup
handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcs12 fields that aren't fully supported,
mostly to avoid weirdness and complexity for FRED and other
features, where KVM wants to enable VMCS shadowing for fields that
conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to
load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of
ERAPS relies on an upcoming, publicly announced change in the APM
to reduce the set of conditions where hardware (i.e. KVM) *must*
flush the RAP
- Ignore nSVM intercepts for instructions that are not supported
according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's
fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently
of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when
emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
support, and extend x86's infrastructure to support EPT and NPT
(for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression
test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
improve test coverage and to hopefully make the test easier to
understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked
hugepage handling. SEV/SNP was the only user of the tracking and it
can do it via the RMP
- Retroactively document and enforce (for SNP) that
KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
source page to be 4KiB aligned, to avoid non-trivial complexity for
something that no known VMM seems to be doing and to avoid an API
special case for in-place conversion, which simply can't support
unaligned sources
- When populating guest_memfd memory, GUP the source page in common
code and pass the refcounted page to the vendor callback, instead
of letting vendor code do the heavy lifting. Doing so avoids a
looming deadlock bug with in-place conversion due to an AB-BA
conflict between mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space
when creating a vCPU-specific mapping of guest memory. Actually
this bug could not be hit even on x86, the only architecture with
multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
KVM: s390: Increase permitted SE header size to 1 MiB
MAINTAINERS: Replace backup for s390 vfio-pci
KVM: s390: vsie: Fix race in acquire_gmap_shadow()
KVM: s390: vsie: Fix race in walk_guest_tables()
KVM: s390: Use guest address to mark guest page dirty
irqchip/riscv-imsic: Adjust the number of available guest irq files
RISC-V: KVM: Transparent huge page support
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add riscv vm satp modes
KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
RISC-V: KVM: Remove unnecessary 'ret' assignment
KVM: s390: Add explicit padding to struct kvm_s390_keyop
KVM: LoongArch: selftests: Add steal time test case
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
...
Record the cpu_id of each CPU HT sibling -- will need this later.
Rename "thread_id" to "ht_id" to disambiguate that the scope
of this id is within a Core -- it is not a global cpu_id.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Standardize the generation of globally unique core_id's
in a macro, and simplify the related code.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Update the syntax of accesses to the even and odd counters
to match the average counters.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
The current static definition of average{} is inconsistent with
the dynamically allocated even{} and odd{} counters.
Allocate average{} counters dynamically.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Delete redundant core_data.core_id.
Use cpus[].core_id as the single copy of the truth.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
The Linux Kernel topology sysfs is flawed.
core_id is not globally unique, but is per-package.
Turbostat works around this when it needs to, with
rapl_core_id = cpus[cpu].core_id;
rapl_core_id += cpus[cpu].package_id * nr_cores_per_package
Otherwise, turbostat handles core_id as subservient to each package.
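The workaround above can be sketched as a small helper. This is only an
illustration, not turbostat's actual code: struct cpu_topology and
global_core_id() are hypothetical names, and the sketch assumes that
every per-package core_id is smaller than nr_cores_per_package.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for turbostat's per-CPU topology record. */
struct cpu_topology {
	int package_id;	/* package id as read from sysfs */
	int core_id;	/* per-package core id as read from sysfs */
};

/*
 * Fold the package number into the per-package core_id so that the
 * result is unique across the whole system.
 * Assumption: core_id < nr_cores_per_package for every CPU.
 */
static int global_core_id(const struct cpu_topology *cpus, int cpu,
			  int nr_cores_per_package)
{
	assert(cpus[cpu].core_id >= 0);
	assert(cpus[cpu].core_id < nr_cores_per_package);

	return cpus[cpu].core_id +
	       cpus[cpu].package_id * nr_cores_per_package;
}
```

With two packages of two cores each, the CPU on package 1, core 0 maps
to id 2, so it no longer collides with package 0, core 0.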
As there is only one core_id namespace, rename
physical_core_id to simply be core_id.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
The kernel topology sysfs uses the name "physical_package_id"
because it is allowed to be sparse.
Inside Turbostat, that physical package_id namespace is the only
package_id namespace, so re-name it to simply be "package_id"
in cpus[].
Delete the redundant copy of package_id in pkg_data.
Rely instead on the single copy of the truth in cpus[].
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Disambiguate the uses of "base_cpu":
master_cpu: lowest permitted cpu#, read global MSRs here
package_data.first_cpu: lowest permitted cpu# in that package
core_data.first_cpu: lowest permitted cpu# in the core
current_cpu: where I'm running now
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
version 2026.02.04
Add support for L2 cache statistics: L2MRPS and L2%hit
L2 statistics join the LLC in the "cache" counter group.
While the underlying LLC perf kernel support was architectural,
L2 perf counters are model-specific:
Support Intel Xeon -- Sapphire Rapids and newer.
Support Intel Atom -- Gracemont and newer.
Support Intel Hybrid -- Alder Lake and newer.
Example:
alder-lake-n$ sudo turbostat --quiet --show CPU,Busy%,cache my_workload
CPU Busy% LLCMRPS LLC%hit L2MRPS L2%hit
- 49.82 1210 85.02 2909 31.63
0 99.14 322 88.89 767 32.38
1 0.91 1 32.47 1 18.86
2 0.20 0 40.78 0 23.34
3 99.17 295 81.79 706 31.89
4 0.68 1 58.71 1 15.61
5 99.16 299 85.65 726 31.32
6 0.08 0 45.35 0 31.71
7 99.21 293 83.63 707 30.92
where "my_workload" is a wrapper for a yogini workload
that has 4 fully-busy threads with 2MB working set each.
Note that analogous to the system summary for multiple LLC systems,
the system summary row for the L2 is the aggregate of all CPUs in the
system -- there is no per-cache roll-up.
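A minimal sketch of what "aggregate of all CPUs" means for the summary
row: raw counts are summed across CPUs and the hit percentage is derived
from the totals, rather than by averaging the per-CPU percentages. The
struct, its field names and the hit formula (references minus misses,
over references) are assumptions made for illustration; they are not
taken from turbostat's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU cache counters for one measurement interval. */
struct cache_counts {
	unsigned long long references;
	unsigned long long misses;
};

/* Assumed formula: 100 * (references - misses) / references; 0 if idle. */
static double cache_pct_hit(struct cache_counts c)
{
	assert(c.misses <= c.references);
	if (c.references == 0)
		return 0.0;
	return 100.0 * (double)(c.references - c.misses) /
	       (double)c.references;
}

/* Summary row: sum the raw counts over all CPUs. */
static struct cache_counts cache_sum(const struct cache_counts *per_cpu,
				     size_t n)
{
	struct cache_counts total = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		total.references += per_cpu[i].references;
		total.misses += per_cpu[i].misses;
	}
	return total;
}
```

Summing first weights each CPU by its traffic, which is why the mostly
idle CPUs in the example output barely move the summary L2%hit.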
Signed-off-by: Len Brown <len.brown@intel.com>
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI
volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmmOYpYACgkQx4+xDQu9
KkvzOQ/9Fq8ZxWgYofhTPtw9/vps3avheOHlEoRrBWYfn1VkTRPAcbUULL4PGXwg
dnVFEl3AcrpOFikIthbukklLeLoOnUshZJBU25zY5h0My1jb63V1//gEwJR6I0dg
+V+GJmfzc4+YVaHK6UFdn7j3GgKUbTC7xXRMuGEriAzKPnm3AXAjh94wMNx6depv
Li3IXRoZT/HvqIAyfeAoM9STwOzJtE3Sc6fXABkzsIbNTjjdgIqoRSsQsKY10178
z6ox/sVStnLmVaMbOd/ZVN0J70JRDsvK0TC0/13K1ESUbnVia9a3bPIxLRmSapKC
wXnwAuSeevtFshGGyd5LZO0QQGxzG1H63Gky2GRoh8bTQbd2tQcfQzANdnPkBAQS
j2aOiSsiUQeNZqfZAfEBwRd27GXRYlKb/MpgCZKUH+ZO9VG6QaD3VGvg17/Caghy
nVdbBQ81ZV9tkz9EMN0vt2VJHmEqARh88w619laHjg+ioPTG4/UIDPzskt1I+Fgm
Y6NQLeFyfaO3RKKDYWGPcY7fmWQI9V8MECHOvyVI4xJcgqAbqnfsgytjuiFbrfRo
fTvpuB7kvltBZ180QSB79xj0sWGFTWR02MeWy3uOaLZz2eIm2ZTZbMUSgNYR0ldG
L3y7CEkTkoVF1ijYgAfuMgptk3Yf0dpa66D9HUo947wWkNrW5ds=
=4fTk
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some
selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
EFI volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: lib: optimize strlen loop efficiency
selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
selftests: riscv: verify ptrace accepts valid vector csr values
selftests: riscv: verify ptrace rejects invalid vector csr inputs
selftests: riscv: verify syscalls discard vector context
selftests: riscv: verify initial vector state with ptrace
selftests: riscv: test ptrace vector interface
riscv: ptrace: validate input vector csr registers
riscv: csr: define vtype register elements
riscv: vector: init vector context with proper vlenb
riscv: ptrace: return ENODATA for inactive vector extension
kselftest/riscv: add kselftest for user mode CFI
riscv: add documentation for shadow stack
riscv: add documentation for landing pad / indirect branch tracking
riscv: create a Kconfig fragment for shadow stack and landing pad support
arch/riscv: add dual vdso creation logic and select vdso based on hw
arch/riscv: compile vdso with landing pad and shadow stack note
riscv: enable kernel access to shadow stack memory via the FWFT SBI call
riscv: add kernel command line option to opt out of user CFI
riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe
...
The _get_unused_cpus() function can return CPU numbers >= 16, which
exceeds RPS_MAX_CPUS in toeplitz.c. When this happens, the test fails
with a cryptic message:
# Exception| Traceback (most recent call last):
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
# Exception| func(*args)
# Exception| File "/tmp/cur/linux/tools/testing/selftests/drivers/net/hw/toeplitz.py", line 189, in test
# Exception| with bkg(" ".join(rx_cmd), ksft_ready=True, exit_wait=True) as rx_proc:
# Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 124, in __init__
# Exception| super().__init__(comm, background=True,
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 77, in __init__
# Exception| raise Exception("Did not receive ready message")
# Exception| Exception: Did not receive ready message
Rename _get_unused_cpus() to _get_unused_rps_cpus() and cap the CPU
search range to RPS_MAX_CPUS.
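The capping logic can be sketched as follows. The real helper lives in
the Python test script; this C version only illustrates the idea, and
its signature, the "used" array and the value 16 for RPS_MAX_CPUS
(inferred from the ">= 16" above) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, inferred from the commit message; defined in toeplitz.c. */
#define RPS_MAX_CPUS 16

/*
 * Store up to 'count' CPU numbers in 'out' that are not marked in 'used'
 * and that are below RPS_MAX_CPUS. Returns how many were found, which
 * may be fewer than requested.
 */
static int get_unused_rps_cpus(const bool *used, int nr_cpus, int count,
			       int *out)
{
	int limit = nr_cpus < RPS_MAX_CPUS ? nr_cpus : RPS_MAX_CPUS;
	int found = 0;

	assert(count >= 0);
	for (int cpu = 0; cpu < limit && found < count; cpu++) {
		if (!used[cpu])
			out[found++] = cpu;
	}
	return found;
}
```

A caller that gets back fewer CPUs than it asked for can skip or fail
the test with a clear message instead of passing an out-of-range CPU.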
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210093110.1935149-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>