Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core & protocols:
- Improve drop accounting scalability on NUMA hosts for RAW and UDP
sockets and the backlog, almost doubling the PPS capacity under DoS
- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance
by an additional 50%, even more under extreme conditions
- Add support for PSP encryption of TCP connections; this mechanism
has some similarities with IPsec and TLS, but offers superior HW
offloads capabilities
- Ongoing work to support Accurate ECN for TCP. AccECN allows more
than one congestion notification signal per RTT and is a building
block for Low Latency, Low Loss, and Scalable Throughput (L4S)
- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath
- Refactor skb deferral free to better scale on large multi-NUMA
hosts; this improves TCP and UDP RX performance significantly on
such HW
- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds
- Improve handling of setups with a large number of nexthops, making
dump operations scale linearly and avoiding unneeded
synchronize_rcu() on delete
- Improve bridge handling of VLAN FDB, storing a single entry per
bridge instead of one entry per port; this makes the dump an order
of magnitude faster on large switches
- Restore IP ID correctly for encapsulated packets at GSO
segmentation time, allowing GRO to merge packets in more scenarios
- Improve netfilter matching performance on large sets
- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting
recent TCP autotuning changes
- Allow bridges to redirect to a backup port when the bridge port is
administratively down
- Introduce the MPTCP 'laminar' endpoint type that can be used only
once per connection, simplifying common MPTCP setups
- Add RCU safety to dst->dev, closing a lot of possible races
- Significant adoption of the crypto library API for SCTP, MPTCP and
IPv6 SR, reducing code duplication
- Support pulling data from an skb frag into the linear area of an
XDP buffer
Things we sprinkled into general kernel code:
- Generate netlink documentation from YAML using an integrated YAML
parser
Driver API:
- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection
- Introduce API for fetching the DMA device for a given queue,
allowing TCP zerocopy RX on more H/W setups
- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath
- Add a new dedicated ethtool callback enabling drivers to provide
the number of RX rings directly, improving efficiency and clarity
in RX ring queries and RSS configuration
- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause
- Support for DPLL phase offset exponential moving average,
controlling the average smoothing factor
Device drivers:
- Add a new Huawei driver for 3rd gen NIC (hinic3)
- Add a new SpacemiT driver for K1 ethernet MAC
- Add a generic abstraction for shared memory communication
devices (dibps)
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbells to avoid MMIO contention
issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link
aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts
- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level
checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages
to improve memory efficiency
- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4):
- support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs:
- Support 10g-qxgmii phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115
- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory
usage
- rcar_canfd: update the CAN-FD handling
- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support
- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89):
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k):
- GTK rekey fixes
- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"
* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
net: stmmac: Add support for Allwinner A523 GMAC200
dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
Revert "Documentation: net: add flow control guide and document ethtool API"
octeontx2-pf: fix bitmap leak
octeontx2-vf: fix bitmap leak
net/mlx5e: Use extack in set rxfh callback
net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
net/mlx5e: Introduce mlx5e_rss_init_params
net/mlx5e: Remove unused mdev param from RSS indir init
net/mlx5: Improve QoS error messages with actual depth values
net/mlx5e: Prevent entering switchdev mode with inconsistent netns
net/mlx5: HWS, Generalize complex matchers
net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
selftests/net: add tcp_port_share to .gitignore
Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
net: add NUMA awareness to skb_attempt_defer_free()
net: use llist for sd->defer_list
net: make softnet_data.defer_count an atomic
selftests: drv-net: psp: add tests for destroying devices
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
...
Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
(Amery Hung)
Applied as a stable branch in bpf-next and net-next trees.
- Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)
Also a stable branch in bpf-next and net-next trees.
- Enforce expected_attach_type for tailcall compatibility (Daniel
Borkmann)
- Replace path-sensitive with path-insensitive live stack analysis in
the verifier (Eduard Zingerman)
This is a significant change in the verification logic. More details,
motivation, and long-term plans are in the cover letter/merge commit.
- Support signed BPF programs (KP Singh)
This is another major feature that took years to materialize.
Algorithm details are in the cover letter/merge commit
- Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)
- Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)
- Fix USDT SIB argument handling in libbpf (Jiawei Zhao)
- Allow uprobe-bpf program to change context registers (Jiri Olsa)
- Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
Puranjay Mohan)
- Allow access to union arguments in tracing programs (Leon Hwang)
- Optimize rcu_read_lock() + migrate_disable() combination where it's
used in the BPF subsystem (Menglong Dong)
- Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
execution of BPF callback in the context of a specific task using the
kernel’s task_work infrastructure (Mykyta Yatsenko)
- Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
Dwivedi)
- Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)
- Improve the precision of tnum multiplier verifier operation
(Nandakumar Edamana)
- Use tnums to improve is_branch_taken() logic (Paul Chaignon)
- Add support for atomic operations in arena in riscv JIT (Pu Lehui)
- Report arena faults to BPF error stream (Puranjay Mohan)
- Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
Monnet)
- Add bpf_strcasecmp() kfunc (Rong Tao)
- Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
Chen)
* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
libbpf: Replace AF_ALG with open coded SHA-256
selftests/bpf: Add stress test for rqspinlock in NMI
selftests/bpf: Add test case for different expected_attach_type
bpf: Enforce expected_attach_type for tailcall compatibility
bpftool: Remove duplicate string.h header
bpf: Remove duplicate crypto/sha2.h header
libbpf: Fix error when st-prefix_ops and ops from differ btf
selftests/bpf: Test changing packet data from kfunc
selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
selftests/bpf: Refactor stacktrace_map case with skeleton
bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
selftests/bpf: Fix flaky bpf_cookie selftest
selftests/bpf: Test changing packet data from global functions with a kfunc
bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
selftests/bpf: Task_work selftest cleanup fixes
MAINTAINERS: Delete inactive maintainers from AF_XDP
bpf: Mark kfuncs as __noclone
selftests/bpf: Add kprobe multi write ctx attach test
selftests/bpf: Add kprobe write ctx attach test
selftests/bpf: Add uprobe context ip register change test
...
Merge tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Core perf code updates:
- Convert mmap() related reference counts to refcount_t. This is in
reaction to the recently fixed refcount bugs, which could have been
detected earlier and could have mitigated the bug somewhat (Thomas
Gleixner, Peter Zijlstra)
- Clean up and simplify the callchain code, in preparation for
sframes (Steven Rostedt, Josh Poimboeuf)
Uprobes updates:
- Add support to optimize usdt probes on x86-64, which gives a
substantial speedup (Jiri Olsa)
- Cleanups and fixes on x86 (Peter Zijlstra)
PMU driver updates:
- Various optimizations and fixes to the Intel PMU driver (Dapeng Mi)
Misc cleanups and fixes:
- Remove redundant __GFP_NOWARN (Qianfeng Rong)"
* tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value
uprobes/x86: Return error from uprobe syscall when not called from trampoline
perf: Skip user unwind if the task is a kernel thread
perf: Simplify get_perf_callchain() user logic
perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
perf: Have get_perf_callchain() return NULL if crosstask and user are set
perf: Remove get_perf_callchain() init_nr argument
perf/x86: Print PMU counters bitmap in x86_pmu_show_pmu_cap()
perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48)
perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
perf/x86/intel: Use early_initcall() to hook bts_init()
uprobes: Remove redundant __GFP_NOWARN
selftests/seccomp: validate uprobe syscall passes through seccomp
seccomp: passthrough uprobe systemcall without filtering
selftests/bpf: Fix uprobe syscall shadow stack test
selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
selftests/bpf: Add uprobe_regs_equal test
selftests/bpf: Add optimized usdt variant for basic usdt test
...
Add a small test case which adds two programs - one calling the other
through a tailcall - and checks that BPF rejects them in case of
different expected_attach_type values:
# ./vmtest.sh -- ./test_progs -t xdp_devmap
[...]
#641/1 xdp_devmap_attach/DEVMAP with programs in entries:OK
#641/2 xdp_devmap_attach/DEVMAP with frags programs in entries:OK
#641/3 xdp_devmap_attach/Verifier check of DEVMAP programs:OK
#641/4 xdp_devmap_attach/DEVMAP with programs in entries on veth:OK
#641 xdp_devmap_attach:OK
Summary: 2/4 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250926171201.188490-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
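For illustration, a minimal BPF C sketch (not the actual selftest; the
names are made up) of the shape being exercised: two programs of the same
program type linked through a prog array. The new enforcement rejects the
pairing when the two programs' expected_attach_type values differ (e.g.
plain XDP vs. an XDP devmap program):
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u32);
  } jmp_table SEC(".maps");

  SEC("xdp")
  int callee(struct xdp_md *ctx)
  {
          return XDP_PASS;
  }

  SEC("xdp")
  int caller(struct xdp_md *ctx)
  {
          /* jumps to jmp_table[0] when the entry is compatible */
          bpf_tail_call(ctx, &jmp_table, 0);
          return XDP_DROP;
  }

  char _license[] SEC("license") = "GPL";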
A few variables linked to the Path-Managers are confusing, and it would
help current and future developers to clarify them.
One of them is 'subflows', which in fact represents the number of extra
subflows: all the additional subflows created after the initial one, and
not the total number of subflows.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_extra_subflows. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.
No functional changes intended.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-5-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
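As a rough illustration of the uAPI alias described above (a sketch; the
exact placement may differ, with mptcpi_subflows being the existing
struct mptcp_info field):
  /* in include/uapi/linux/mptcp.h, next to struct mptcp_info */
  #define mptcpi_extra_subflows mptcpi_subflows
Existing userspace keeps compiling unchanged, while new code can use the
clearer name for the same storage.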
bpf_xdp_pull_data() is the first kfunc that changes packet data. Make
sure the verifier clears all packet pointers after calling a
packet-data-changing kfunc.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250926164142.1850176-1-ameryhung@gmail.com
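A hedged BPF C sketch of what that means for program authors (the kfunc
name is from this series, its exact prototype is assumed here): packet
pointers loaded before the call must be re-read afterwards, because the
verifier now invalidates them:
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  extern int bpf_xdp_pull_data(struct xdp_md *xdp, __u32 len) __ksym;

  SEC("xdp")
  int pull_then_parse(struct xdp_md *ctx)
  {
          void *data, *data_end;

          if (bpf_xdp_pull_data(ctx, 128))
                  return XDP_DROP;

          /* the old data/data_end values are invalid now; reload them */
          data = (void *)(long)ctx->data;
          data_end = (void *)(long)ctx->data_end;
          if (data + 128 > data_end)
                  return XDP_DROP;
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";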
Add tests for stacktrace map lookup and delete:
1. use bpf_map_lookup_and_delete_elem to look up and delete the target
stack_id,
2. look up the deleted stack_id again to double-check.
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-3-chen.dylane@linux.dev
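A hedged userspace sketch of that sequence using the libbpf low-level
API (fd and buffer names are made up):
  #include <errno.h>
  #include <linux/types.h>
  #include <bpf/bpf.h>

  /* look up and delete one stack trace, then confirm it is really gone */
  static int lookup_delete_stack(int stackmap_fd, __u32 stack_id)
  {
          __u64 ips[127] = {};

          if (bpf_map_lookup_and_delete_elem(stackmap_fd, &stack_id, ips))
                  return -1;   /* first lookup+delete should succeed */
          if (!bpf_map_lookup_elem(stackmap_fd, &stack_id, ips) ||
              errno != ENOENT)
                  return -1;   /* second lookup must fail with ENOENT */
          return 0;
  }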
The loading method of the stacktrace_map test case looks outdated;
refactor it with a skeleton so we can use the global variable feature
in the next patch.
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-2-chen.dylane@linux.dev
The verifier should invalidate all packet pointers after a packet data
changing kfunc is called. So, similar to commit 3f23ee5590
("selftests/bpf: test for changing packet data from global functions"),
test changing packet data from global functions to make sure packet
pointers are indeed invalidated.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250925170013.1752561-2-ameryhung@gmail.com
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-09-23
We've added 9 non-merge commits during the last 33 day(s) which contain
a total of 10 files changed, 480 insertions(+), 53 deletions(-).
The main changes are:
1) A new bpf_xdp_pull_data kfunc that supports pulling data from
a frag into the linear area of an xdp_buff, from Amery Hung.
This includes changes in the xdp_native.bpf.c selftest, which
Nimrod's future work depends on.
It is a merge from a stable branch 'xdp_pull_data' which has
also been merged to bpf-next.
There is a conflict with recent changes in 'include/net/xdp.h'
in the net-next tree that will need to be resolved.
2) A compiler warning fix when CONFIG_NET=n in the recent dynptr
skb_meta support, from Jakub Sitnicki.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests: drv-net: Pull data before parsing headers
selftests/bpf: Test bpf_xdp_pull_data
bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
bpf: Make variables in bpf_prog_test_run_xdp less confusing
bpf: Clear packet pointers after changing packet data in kfuncs
bpf: Support pulling non-linear xdp data
bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail
bpf: Clear pfmemalloc flag when freeing all fragments
bpf: Return an error pointer for skb metadata when CONFIG_NET=n
====================
Link: https://patch.msgid.link/20250924050303.2466356-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adding a test to check we can't attach a kprobe multi program
that writes to the context.
It's an x86_64-specific test.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding a test to check we can't attach a standard kprobe program that
writes to the context.
It's an x86_64-specific test.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding a test to check we can change the application execution flow
by changing the instruction pointer from a uprobe program.
It's an x86_64-specific test.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
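A hedged sketch of what such a uprobe program could look like on x86_64
(binary path, function name and the redirect target are made up; the
point is only that writes to the register context are now allowed for
uprobe programs):
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  const volatile __u64 redirect_ip;   /* filled in by the test before load */

  SEC("uprobe//proc/self/exe:uprobe_trigger")
  int change_ip(struct pt_regs *ctx)
  {
          /* previously rejected: writing the traced task's register state */
          ctx->ip = redirect_ip;
          return 0;
  }

  char _license[] SEC("license") = "GPL";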
Adding a test to check we can change common register values through
a uprobe program.
It's an x86_64-specific test.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test bpf_xdp_pull_data() with xdp packets with different layouts. The
xdp bpf program first checks if the layout is as expected. Then, it
calls bpf_xdp_pull_data(). Finally, it checks the 0xbb marker at offset
1024 using direct packet access.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-8-ameryhung@gmail.com
Add test coverage for union argument support using fexit programs:
* 8B union argument - verify that the verifier accepts it and that fexit
programs can trace such functions.
* 16B union argument - verify that the verifier accepts it and that
fexit programs can access the argument, which is passed using two
registers.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250919044110.23729-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests for loading 8, 16, and 32 bits with sign extension from
arena, and also verify that exception handling works correctly and
correct assembly is generated by the x86 and arm64 JITs.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250923110157.18326-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
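A hedged BPF C sketch of the kind of access being tested (the selftests
header names and the bpf_arena_alloc_pages() prototype are assumptions
here; the real tests additionally check the exact assembly the JITs
emit):
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include "bpf_arena_common.h"     /* provides the __arena address space */
  #include "bpf_experimental.h"     /* declares bpf_arena_alloc_pages() */

  struct {
          __uint(type, BPF_MAP_TYPE_ARENA);
          __uint(map_flags, BPF_F_MMAPABLE);
          __uint(max_entries, 1);
  } arena SEC(".maps");

  long result;

  SEC("syscall")
  int ldsx_from_arena(void *ctx)
  {
          signed char __arena *p;

          p = bpf_arena_alloc_pages(&arena, NULL, 1, 0 /* node */, 0);
          if (!p)
                  return 1;
          *p = -5;
          result = *p;   /* with -mcpu=v4 this becomes a sign-extending load */
          return 0;
  }

  char _license[] SEC("license") = "GPL";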
Add stress tests for BPF task-work scheduling kfuncs. The tests spawn
multiple threads that concurrently schedule task_work callbacks against
the same and different map values to exercise the kfuncs under high
contention.
Verify callbacks are reliably enqueued and executed with no drops.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20250923112404.668720-10-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introducing selftests that check the BPF task work scheduling
mechanism. Validate that the verifier does not accept incorrect calls
to the bpf_task_work_schedule kfunc.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250923112404.668720-9-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The added tests cover:
- simple propagation of read/write marks;
- joining read/write marks from conditional branches;
- avoid must_write marks when the same instruction accesses different
stack offsets on different execution paths;
- avoid must_write marks in case the same instruction accesses stack
and non-stack pointers on different execution paths;
- read/write marks propagation to outer stack frame;
- independent read marks for different callchains ending with the same
function;
- bpf_calls_callback() dependent logic in
liveness.c:bpf_stack_slot_alive().
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-12-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds tags __not_msg(<msg>) and __not_msg_unpriv(<msg>).
The test fails if <msg> is found in the verifier log.
If __not_msg() is situated between __msg() tags, the framework matches
the __msg() tags first, and then checks that <msg> is not present in
the portion of the log between the bracketing __msg() tags.
__not_msg() tags bracketed by the same __msg() group are effectively
unordered.
The idea is borrowed from LLVM's FileCheck with its CHECK-NOT syntax.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-11-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
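A hedged sketch of how the new tag might sit next to the existing
bpf_misc.h annotations (the program body and messages are made up):
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>
  #include "bpf_misc.h"

  SEC("socket")
  __success
  __msg("r0 = 0")
  __not_msg("invalid access to map value")
  __naked void no_bad_access(void)
  {
          asm volatile ("r0 = 0; exit;" ::: __clobber_all);
  }

  char _license[] SEC("license") = "GPL";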
Remove register chain based liveness tracking:
- struct bpf_reg_state->{parent,live} fields are no longer needed;
- REG_LIVE_WRITTEN marks are superseded by bpf_mark_stack_write()
calls;
- mark_reg_read() calls are superseded by bpf_mark_stack_read();
- log.c:print_liveness() is superseded by logging in liveness.c;
- propagate_liveness() is superseded by bpf_update_live_stack();
- no need to establish register chains in is_state_visited() anymore;
- fix a bunch of tests expecting "_w" suffixes in verifier log
messages.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-9-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently only array maps are supported, but the implementation can be
extended for other maps and objects. The hash is memoized only for
exclusive and frozen maps as their content is stable until the exclusive
program modifies the map.
This is required for BPF signing, enabling a trusted loader program to
verify a map's integrity. The loader retrieves
the map's runtime hash from the kernel and compares it against an
expected hash computed at build time.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250914215141.15144-7-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a couple of test cases to ensure RCU protection kicks in
automatically, and the return type is as expected.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, KF_RCU_PROTECTED only applies to iterator APIs and that too
in a convoluted fashion: the presence of this flag on the kfunc is used
to set MEM_RCU in iterator type, and the lack of RCU protection results
in an error only later, once next() or destroy() methods are invoked on
the iterator. While there is no bug, this is certainly a bit
unintuitive, and makes the enforcement of the flag iterator-specific.
In the interest of making this flag useful for other upcoming kfuncs,
e.g. scx_bpf_cpu_curr() [0][1], add enforcement for invoking the kfunc
in an RCU critical section in general.
This would also mean that iterator APIs using KF_RCU_PROTECTED will
error out earlier, instead of throwing an error for lack of RCU CS
protection when next() or destroy() methods are invoked.
In addition to this, if the kfuncs tagged KF_RCU_PROTECTED return a
pointer value, ensure that this pointer value is only usable in an RCU
critical section. There might be edge cases where the return value is
special and doesn't need to imply MEM_RCU semantics, but in general, the
assumption should hold for the majority of kfuncs, and we can revisit
things if necessary later.
[0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
[1]: https://lore.kernel.org/all/20250909195709.92669-1-arighi@nvidia.com
Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
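In BPF C terms the rule now looks roughly like this (hedged sketch; the
pointer-returning kfunc is hypothetical and stands in for any
KF_RCU_PROTECTED kfunc, while bpf_rcu_read_lock()/bpf_rcu_read_unlock()
are the existing kfuncs delimiting the RCU critical section):
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  struct demo_obj { int val; };

  extern void bpf_rcu_read_lock(void) __ksym;
  extern void bpf_rcu_read_unlock(void) __ksym;
  extern struct demo_obj *hypothetical_rcu_protected_kfunc(void) __ksym;

  SEC("tc")
  int use_rcu_protected(struct __sk_buff *skb)
  {
          struct demo_obj *p;

          bpf_rcu_read_lock();
          p = hypothetical_rcu_protected_kfunc(); /* only legal inside RCU CS */
          if (p)
                  bpf_printk("val=%d", p->val);   /* p is MEM_RCU: usable only here */
          bpf_rcu_read_unlock();
          /* dereferencing p past this point is rejected by the verifier */
          return 0;
  }

  char _license[] SEC("license") = "GPL";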
This is a test case minimized from a syzbot reproducer from [1].
The test case triggers verifier.c:maybe_exit_scc() w/o
preceding call to verifier.c:maybe_enter_scc() on a speculative
symbolic execution path.
Here is verifier log for the test case:
Live regs before insn:
0: .......... (b7) r0 = 100
1 1: 0......... (7b) *(u64 *)(r10 -512) = r0
1 2: 0......... (b5) if r0 <= 0x0 goto pc-2
3: 0......... (95) exit
0: R1=ctx() R10=fp0
0: (b7) r0 = 100 ; R0_w=100
1: (7b) *(u64 *)(r10 -512) = r0 ; R0_w=100 R10=fp0 fp-512_w=100
2: (b5) if r0 <= 0x0 goto pc-2
mark_precise: ...
2: R0_w=100
3: (95) exit
from 2 to 1 (speculative execution): R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
1: R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
1: (7b) *(u64 *)(r10 -512) = r0
processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
- Non-speculative execution path 0-3 does not allocate any checkpoints
(and hence does not call maybe_enter_scc()), and schedules a
speculative jump from 2 to 1.
- Speculative execution path stops immediately because of an infinite
loop detection and triggers verifier.c:update_branch_counts() ->
maybe_exit_scc() calls.
[1] https://lore.kernel.org/bpf/68c85acd.050a0220.2ff435.03a4.GAE@google.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250916212251.3490455-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds tests covering the various paddings in ctx structures.
In case of sk_lookup BPF programs, the behavior is a bit different
because accesses to the padding are explicitly allowed. Other cases
result in a clear reject from the verifier.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/3dc5f025e350aeb2bb1c257b87c577518e574aeb.1758094761.git.paul.chaignon@gmail.com
Move the sizeof_field and offsetofend macros from individual test files
to the common bpf_misc.h to avoid duplication.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/97a3f3788bd3aec309100bc073a5c77130e371fd.1758094761.git.paul.chaignon@gmail.com
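For reference, the usual kernel-style definitions of the two macros
being consolidated (the bpf_misc.h spelling may differ slightly):
  #define sizeof_field(TYPE, MEMBER)  sizeof((((TYPE *)0)->MEMBER))
  #define offsetofend(TYPE, MEMBER) \
          (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
For example, offsetofend(struct bpf_sk_lookup, remote_port) is the
offset of the first byte after remote_port, the kind of boundary the
padding tests above need to compute.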
Make sure that we only switch the cgroup namespace and enter a new
cgroup in a child process separate from test_progs, to not mess up the
environment for subsequent tests.
To remove this cgroup, we need to wait for the child to exit, and then
rmdir its cgroup. If the read call fails, or waitpid succeeds, we know
the child exited (read call would fail when the last pipe end is closed,
otherwise waitpid waits until exit(2) is called). We then invoke a newly
introduced remove_cgroup_pid() helper that identifies the cgroup path
using the passed-in pid of the now-dead child, instead of using the
current process pid (getpid()).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250915032618.1551762-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
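A hedged C sketch of the parent-side sequence described above (error
handling trimmed; remove_cgroup_pid() is the newly introduced helper and
its exact signature is assumed):
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  extern int remove_cgroup_pid(pid_t pid);   /* helper from this patch */

  static void run_in_child_cgroup(void)
  {
          int pipefd[2];
          pid_t child;
          char c;

          pipe(pipefd);
          child = fork();
          if (child == 0) {
                  /* child: switch cgroup namespace, enter the new cgroup,
                   * run the test, then exit
                   */
                  exit(0);
          }
          close(pipefd[1]);          /* parent keeps only the read end */
          read(pipefd[0], &c, 1);    /* returns 0/-1 once the child is gone */
          waitpid(child, NULL, 0);   /* reap it */
          remove_cgroup_pid(child);  /* rmdir the dead child's cgroup */
  }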
For systems with CONFIG_NR_CPUS set to > 1024 in the kernel config,
the selftest fails as arena_spin_lock_irqsave() returns EOPNOTSUPP
(e.g., on powerpc the default value for CONFIG_NR_CPUS is 8192).
The selftest is now skipped in case the bpf program returns EOPNOTSUPP,
with a descriptive message logged.
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250913091337.1841916-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
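Roughly, on the userspace side (a sketch; the skeleton and field names
are made up):
  if (skel->bss->lock_ret == -EOPNOTSUPP) {
          printf("arena spin lock unsupported (NR_CPUS too large), skipping\n");
          test__skip();
          goto cleanup;
  }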
Cross-merge networking fixes after downstream PR (net-6.17-rc6).
Conflicts:
net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_pipapo_avx2.c
c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
84c1da7b38 ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too")
Only trivial adjacent changes (in a doc and a Makefile).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add selftests for testing the reporting of arena page faults through BPF
streams. Two new bpf programs are added that read and write to an
unmapped arena address and the fault reporting is verified in the
userspace through streams.
The added bpf programs need to access the user_vm_start in struct
bpf_arena. This is done by casting &arena to struct bpf_arena *, but
barrier_var() is used on this ptr before accessing ptr->user_vm_start
to stop GCC from issuing an out-of-bounds access warning due to the
cast from the smaller map struct to the larger "struct bpf_arena".
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
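The pattern in question, roughly (a fragment from inside a BPF program
built against vmlinux.h; barrier_var() is the bpf_helpers.h macro that
pushes the pointer through an asm constraint so GCC loses track of the
cast):
  struct bpf_arena *ar = (struct bpf_arena *)&arena;
  __u64 start;

  barrier_var(ar);                 /* hide the small->large cast from GCC */
  start = ar->user_vm_start;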
Start using __stderr directly in the bpf programs to test the reporting
of may_goto timeout detection and spin_lock deadlock detection.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add __stderr and __stdout to validate the output of BPF streams for bpf
selftests. Similar to __xlated, __jited, etc., __stderr/out can be used
in the BPF progs to compare a string (regex supported) to the output in
the bpf streams.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
icecc is a compiler wrapper that distributes compile jobs over a build
farm [1]. It works by sending toolchain binaries and preprocessed
source code to remote machines.
Unfortunately using it with BPF selftests causes build failures due to
a clang bug [2]. The problem is that clang suppresses the
-Wunused-value warning if the unused expression comes from a macro
expansion. Since icecc compiles preprocessed source code, this
information is not available. This leads to -Wunused-value false
positives.
obj_new_no_struct() and obj_new_acq() use the bpf_obj_new() macro and
discard the result. arena_spin_lock_slowpath() uses two macros that
produce values and ignores the results. Add (void) casts to explicitly
indicate that this is intentional and suppress the warning.
An alternative solution is to change the macros to not produce values.
This would work today for the arena_spin_lock_slowpath() issue, but in
the future there may appear users who need them. Another potential
solution is to replace these macros with functions. Unfortunately this
would not work, because these macros work with unknown types and
control flow.
[1] https://github.com/icecc/icecream
[2] https://github.com/llvm/llvm-project/issues/142614
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250829030017.102615-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
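The fix boils down to explicit discards, roughly like this (a sketch;
the struct name is made up, bpf_obj_new() is the bpf_experimental.h
macro the text refers to):
  /* the result is intentionally ignored; the cast keeps -Wunused-value
   * quiet even when clang only sees preprocessed sources (as under icecc)
   */
  (void)bpf_obj_new(struct node_data);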
Small cleanup and test extension to probe the bpf_crypto_{encrypt,decrypt}()
kfuncs when a bad dst buffer is passed in, to assert that an error is
returned. Also, encrypt_sanity() and skb_crypto_setup() explicitly set the
global status variable to zero before any test, so do the same for
decrypt_sanity().
Do not explicitly zero the on-stack err before bpf_crypto_ctx_create() given
the kfunc is expected to do it internally for the success case.
Before kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.531200] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.533388] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
test_crypto_sanity:PASS:skel open 0 nsec
test_crypto_sanity:PASS:ip netns add crypto_sanity_ns 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns -6 addr add face::1/128 dev lo nodad 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns link set dev lo up 0 nsec
test_crypto_sanity:PASS:open_netns 0 nsec
test_crypto_sanity:PASS:AF_ALG init fail 0 nsec
test_crypto_sanity:PASS:if_nametoindex lo 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup fd 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup retval 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup status 0 nsec
test_crypto_sanity:PASS:create qdisc hook 0 nsec
test_crypto_sanity:PASS:make_sockaddr 0 nsec
test_crypto_sanity:PASS:attach encrypt filter 0 nsec
test_crypto_sanity:PASS:encrypt socket 0 nsec
test_crypto_sanity:PASS:encrypt send 0 nsec
test_crypto_sanity:FAIL:encrypt status unexpected error: -5 (errno 95)
#88 crypto_sanity:FAIL
Summary: 1/2 PASSED, 0 SKIPPED, 1 FAILED
After kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.540963] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.542404] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
#88 crypto_sanity:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a timer test case to test 'bpf_in_interrupt()'.
cd tools/testing/selftests/bpf
./test_progs -t timer_interrupt
462 timer_interrupt:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For now, the benchmark for kprobe-multi is single, which means only
1 function is hooked during testing. Add the "kprobe-multi-all"
benchmark, which will hook all the kernel functions during the
benchmark, and add "kretprobe-multi-all" too.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a packet flood hits one or more UDP sockets, many cpus
have to update sk->sk_drops.
This slows down other cpus, because currently
sk_drops is in the sock_write_rx group.
Add a socket_drop_counters structure to UDP sockets.
Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.
This adds 128 bytes per UDP socket.
Tested with the following stress test, sending about 11 Mpps
to a dual socket AMD EPYC 7B13 64-Core.
super_netperf 20 -t UDP_STREAM -H DUT -l10 -- -n -P,1000 -m 120
Note: due to socket lookup, only one UDP socket is receiving
packets on DUT.
Then measure receiver (DUT) behavior. We can see both
consumer and BH handlers can process more packets per second.
Before:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 615091 0.0
Udp6InErrors 3904277 0.0
Udp6RcvbufErrors 3904277 0.0
After:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 816281 0.0
Udp6InErrors 7497093 0.0
Udp6RcvbufErrors 7497093 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
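Conceptually the change looks like the sketch below (not the actual
kernel layout; the point is only that the drop counters get dedicated
cache lines, so RX fast-path readers stop false-sharing with the cpus
bumping the counters):
  struct socket_drop_counters {
          atomic_t drops0 ____cacheline_aligned_in_smp;
          atomic_t drops1 ____cacheline_aligned_in_smp;
  };   /* two 64-byte cache lines, matching the extra 128 bytes per socket */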
Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.
Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.
LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.
Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.
For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.
By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.
All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da56 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.
Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.
$ ./bench lpm-trie-$op \
--prefix_len=32 \
--producers=1 \
--nr_entries=10000
noop: throughput 74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency 13.438 ns/op
baseline: throughput 70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency 14.264 ns/op
lookup: throughput 8.467 ± 0.047 M ops/s ( 8.467M ops/prod), latency 118.109 ns/op
insert: throughput 2.440 ± 0.015 M ops/s ( 2.440M ops/prod), latency 409.290 ns/op
update: throughput 2.806 ± 0.042 M ops/s ( 2.806M ops/prod), latency 356.322 ns/op
delete: throughput 4.625 ± 0.011 M ops/s ( 4.625M ops/prod), latency 215.613 ns/op
free: throughput 0.578 ± 0.006 K ops/s ( 0.578K ops/prod), latency 1.730 ms/op
And the same benchmarks using random keys:
$ ./bench lpm-trie-$op \
--prefix_len=32 \
--producers=1 \
--nr_entries=10000 \
--random
noop: throughput 74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency 13.466 ns/op
baseline: throughput 35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency 28.450 ns/op
lookup: throughput 7.119 ± 0.048 M ops/s ( 7.119M ops/prod), latency 140.469 ns/op
insert: N/A
update: throughput 2.736 ± 0.012 M ops/s ( 2.736M ops/prod), latency 365.523 ns/op
delete: N/A
free: N/A
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When using GCC on x86-64 to compile a usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array, e.g. "1@-96(%rbp,%rax,8)".
In this patch:
- enrich the subtest_basic_usdt test case to cover SIB-addressing usdt
argument spec handling logic
Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-3-phoenix500526@163.com
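A hedged userspace sketch of the kind of probe that runs into this
(sys/sdt.h macro; the names are made up): indexing an array in the probe
argument is what makes GCC at -O1+ record a SIB-style operand in the
USDT argument spec that libbpf now has to parse:
  #include <sys/sdt.h>

  volatile long vals[128];

  void trigger(long i)
  {
          /* the recorded arg spec can take the SIB form quoted above */
          STAP_PROBE1(usdt_sib, arr_elem, vals[i]);
  }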
Add a new selftest to test the abstract multiplication technique(s) used
by the verifier, following the recent improvement in tnum
multiplication (tnum_mul). One of the newly added programs,
verifier_mul/mul_precise, results in a false positive with the old
tnum_mul, while the program passes with the latest one.
Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-2-nandakumar@nandakumar.co.in
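For context, the data structure involved (from include/linux/tnum.h):
the improved tnum_mul() has to return a tnum that covers every possible
product of its two inputs.
  struct tnum {
          u64 value;   /* bits known to be 1 */
          u64 mask;    /* bits whose value is unknown */
  };
  /* e.g. { .value = 0x4, .mask = 0x3 } represents any of 4, 5, 6 or 7 */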