Core & protocols
----------------
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd passing
functionality. New method based on Tarjan's Strongly Connected Components
algorithm should be both faster and remove a lot of workarounds
we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP packets
and forwarding them together. Useful for small switches / routers which
lack basic checksum offload in some scenarios (e.g. PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't
use NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory accounting,
RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked,
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states.
State can be used either for input or output packet processing.
Things we sprinkled into general kernel code
--------------------------------------------
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter
---------
- Use GFP_KERNEL to clone elements, to deal better with OOM situations
and avoid failures in the .commit step.
BPF
---
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. "Session mode" is a common use-case for tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw tracepoint
programs in order to ease migration from classic to raw tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V JITs.
This allows inlining functions which need to access per-CPU state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction.
Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor process-context
bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API
----------
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line) config.
- Initial bits of a queue control API, for now allowing a single queue
to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling
-----------------
- Remove the need to create a config file to run the net forwarding tests
so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test machine).
Add a few such tests.
- Create a shared code library for writing Python tests. Expose the YAML
Netlink library from tools/ to the tests for easy Netlink access.
- Move netfilter tests under net/, extend them, separate performance tests
from correctness tests, and iron out issues found by running them
"on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF info,
TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers
-------
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application rather
than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it messes up
TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the MII
bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
=EsC2
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
Since BT_HS has been remove HCI_AMP controllers no longer has any use so
remove it along with the capability of creating AMP controllers.
Since we no longer need to differentiate between AMP and Primary
controllers, as only HCI_PRIMARY is left, this also remove
hdev->dev_type altogether.
Fixes: e7b02296fb ("Bluetooth: Remove BT_HS")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZkGcZAAKCRDbK58LschI
g6o6APwLsqhrM2w71VUN5ciCxu4H5VDtZp6wkdqtVbxxU4qNxQEApKgYgKt8ZLF3
Kily5c7m+S4ZXhMX21rb8JhSAz0dfQk=
=5Dk7
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-13
We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).
The main changes are:
1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.
2) Add BPF range computation improvements to the verifier in particular
around XOR and OR operators, refactoring of checks for range computation
and relaxing MUL range computation so that src_reg can also be an unknown
scalar, from Cupertino Miranda.
3) Add support to attach kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. Session mode is a common use-case for tetragon and bpftrace,
from Jiri Olsa.
4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.
5) Improvements to BPF selftests in context of BPF gcc backend,
from Jose E. Marchesi & David Faust.
6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
-style in order to retire the old test, run it in BPF CI and additionally
expand test coverage, from Jordan Rife.
7) Big batch for BPF selftest refactoring in order to remove duplicate code
around common network helpers, from Geliang Tang.
8) Another batch of improvements to BPF selftests to retire obsolete
bpf_tcp_helpers.h as everything is available vmlinux.h,
from Martin KaFai Lau.
9) Fix BPF map tear-down to not walk the map twice on free when both timer
and wq is used, from Benjamin Tissoires.
10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
from Alexei Starovoitov.
11) Change BTF build scripts to using --btf_features for pahole v1.26+,
from Alan Maguire.
12) Small improvements to BPF reusing struct_size() and krealloc_array(),
from Andy Shevchenko.
13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
from Ilya Leoshkevich.
14) Extend TCP ->cong_control() callback in order to feed in ack and
flag parameters and allow write-access to tp->snd_cwnd_stamp
from BPF program, from Miao Xu.
15) Add support for internal-only per-CPU instructions to inline
bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
from Puranjay Mohan.
16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
from Tushar Vyavahare.
17) Extend libbpf to support "module:<function>" syntax for tracing
programs, from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
bpf: make list_for_each_entry portable
bpf: ignore expected GCC warning in test_global_func10.c
bpf: disable strict aliasing in test_global_func9.c
selftests/bpf: Free strdup memory in xdp_hw_metadata
selftests/bpf: Fix a few tests for GCC related warnings.
bpf: avoid gcc overflow warning in test_xdp_vlan.c
tools: remove redundant ethtool.h from tooling infra
selftests/bpf: Expand ATTACH_REJECT tests
selftests/bpf: Expand getsockname and getpeername tests
sefltests/bpf: Expand sockaddr hook deny tests
selftests/bpf: Expand sockaddr program return value tests
selftests/bpf: Retire test_sock_addr.(c|sh)
selftests/bpf: Remove redundant sendmsg test cases
selftests/bpf: Migrate ATTACH_REJECT test cases
selftests/bpf: Migrate expected_attach_type tests
selftests/bpf: Migrate wildcard destination rewrite test
selftests/bpf: Migrate sendmsg6 v4 mapped address tests
selftests/bpf: Migrate sendmsg deny test cases
selftests/bpf: Migrate WILDCARD_IP test
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
...
====================
Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TX queue stop and wake are counted by some drivers.
Support reporting these via netdev-genl queue stats.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
API:
- Remove crypto stats interface.
Algorithms:
- Add faster AES-XTS on modern x86_64 CPUs.
- Forbid curves with order less than 224 bits in ecc (FIPS 186-5).
- Add ECDSA NIST P521.
Drivers:
- Expose otp zone in atmel.
- Add dh fallback for primes > 4K in qat.
- Add interface for live migration in qat.
- Use dma for aes requests in starfive.
- Add full DMA support for stm32mpx in stm32.
- Add Tegra Security Engine driver.
Others:
- Introduce scope-based x509_certificate allocation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmZBjXMACgkQxycdCkmx
i6cQ7g/+JPKnzQedhpJSK5AnkAkqO9kJ16JdeB7AtdSeZZA/EIFxuXZ3Fv1fH44y
1CCibowc5zdss8F/1iOqPc57u5vy2Mjyw8qlhs7JlmcYf/lo7CBGfT8Uxo7BK/S9
n+/+y47Xu5p3yt/c6ldrwqjOaWaYuaCKICZtS91XVvrxM80iVnmDSQCNkcch4KQ4
nsdcVJhS4lOStBNjKtkhWlgufqdp8RPzKYH2B6GbW9z6en8WeTbnoMhgqjqQ3UID
/DHtixyee0MDUDReQrixyCM3XMV5er/qBMoDrCxipBuVrr4GMd2GlCEaZbXfTUW0
3K8Nle4KMMqi81lBAQKiD/hRjrC68FHOvVRGHtZntR0+NZ/nlinXCVWv4iHwRzAB
7BOqRTC3mfv+uMhTvgwQAkXCHAhivMokSzTaDCIrzPLjKIx2BOfVZKmPBt98LxeW
8/JfgEK4gX6wxe4GRftueEApCfWQrwYK60j5bIkescaJ/mI7M5bEByvTTob1lAka
Fw5kGDy8dVnrG9HagLwnXoI1pIGmca8hV1t24Vf1OCdWLgOW+GTCIuyutL2c9AWv
0vEbytGZl69XJlIgQGVcv9RM6NlIXxHwfSHU59N/SHTXhlHjm1XWi3HCiJaZ1b6+
pcILMJ29FMs8LobiN7PT+rNu6fboaH0/o+R7OK9mKRut864xFTk=
=NDS0
-----END PGP SIGNATURE-----
Merge tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove crypto stats interface
Algorithms:
- Add faster AES-XTS on modern x86_64 CPUs
- Forbid curves with order less than 224 bits in ecc (FIPS 186-5)
- Add ECDSA NIST P521
Drivers:
- Expose otp zone in atmel
- Add dh fallback for primes > 4K in qat
- Add interface for live migration in qat
- Use dma for aes requests in starfive
- Add full DMA support for stm32mpx in stm32
- Add Tegra Security Engine driver
Others:
- Introduce scope-based x509_certificate allocation"
* tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (123 commits)
crypto: atmel-sha204a - provide the otp content
crypto: atmel-sha204a - add reading from otp zone
crypto: atmel-i2c - rename read function
crypto: atmel-i2c - add missing arg description
crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()
crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()
crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()
crypto: caam - i.MX8ULP donot have CAAM page0 access
crypto: caam - init-clk based on caam-page0-access
crypto: starfive - Use fallback for unaligned dma access
crypto: starfive - Do not free stack buffer
crypto: starfive - Skip unneeded fallback allocation
crypto: starfive - Skip dma setup for zeroed message
crypto: hisilicon/sec2 - fix for register offset
crypto: hisilicon/debugfs - mask the unnecessary info from the dump
crypto: qat - specify firmware files for 402xx
crypto: x86/aes-gcm - simplify GCM hash subkey derivation
crypto: x86/aes-gcm - delete unused GCM assembly code
crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
hwrng: stm32 - repair clock handling
...
A way for an application to know if an MPTCP connection fell back to TCP
is to use getsockopt(MPTCP_INFO) and look for errors. The issue with
this technique is that the same errors -- EOPNOTSUPP (IPv4) and
ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the
kernel doesn't support this socket option. The userspace then has to
look at the kernel version to understand what the errors mean.
It is not clean, and it doesn't take into account older kernels where
the socket option has been backported. A cleaner way would be to expose
this info to the TCP socket level. In case of MPTCP socket where no
fallback happened, the socket options for the TCP level will be handled
in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in
TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value
1, while TCP has to set 0.
If the socket option is not supported, one of these two errors will be
reported:
- EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets
- ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the
socket received after an 'accept()', when the client didn't request to
use MPTCP: this socket will be a TCP one, even if the listen socket
was an MPTCP one.
With this new option, the kernel can return a clear answer to both "Is
this kernel new enough to tell me the fallback status?" and "If it is
new enough, is it currently a TCP or MPTCP socket?" questions, while not
breaking the previous method.
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup" helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmY/yCUWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuf2D/9xlQA7UxUDlm1Z6DPYzTZfNm4M
D+RJ1QoLNbZEYSzULWvfRSWI+c82qINoSgvtv2DdhWqSKivcMoeNDN846gewfwMY
0q3iChbhPaNBAHaXat1pf0iA6q2n/wpg1jv1C1PmPVSaEpl0CeQ2MLXSOMz9Gb7G
FkkaN/v+YlShUzkw61KwKPg959/bh5vCBbeLjSd1XAhLGKU7nWw4yj0J3usTnRbV
icCnW4mk9SD+pIli/+n7t/QIvPMf6TrJZoSgH9P7YNm+wNme4UEAm1PJz8F+KVAH
D3CJhlH36l8TrndsHMsHgDjKtUUchh+ExOlWGw3ObUnbU7ST2JP6crAdjtnyT2eN
uF+ELBT97SskFBAlzOzBSIs8lEwBZzTdJCmWqEBr3ZxxR7lcClmqbJY+X/FhvXko
o7PvtCbHCatpDPJPZ0e25nVsfEJS29RUED5Gen6vWcUtuvdFEgws70s5BDAbSZTo
RoJsuDqlRAFLdNDYmEN3UTGcm+PBjPgKsBrXiiNr4Y0BilU67Bzdmd8jiZC9ARe6
+3cfQRs0uWdemANzvrN5FnrIUhjRHWTvfVTXcC9Jt53HntIuMhhRajJuMcTAX5uQ
iWACUR14RL8lfInS8phWB5T4AvNexTFc6kVRqNzsGB0ZutsnAsqELttCk57tYQVr
Hlv/MbePyyLSKF/nYA==
=CgsW
-----END PGP SIGNATURE-----
Merge tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"The bulk of the changes here are related to refactoring and expanding
the KUnit tests for string helper and fortify behavior.
Some trivial strncpy replacements in fs/ were carried in my tree. Also
some fixes to SCSI string handling were carried in my tree since the
helper for those was introduce here. Beyond that, just little fixes
all around: objtool getting confused about LKDTM+KCFI, preparing for
future refactors (constification of sysctl tables, additional
__counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
more options in the hardening.config Kconfig fragment.
Summary:
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to
version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup"
helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI"
* tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
stackleak: Use a copy of the ctl_table argument
string: Add additional __realloc_size() annotations for "dup" helpers
kunit/fortify: Fix replaced failure path to unbreak __alloc_size
hardening: Enable KCFI and some other options
lkdtm: Disable CFI checking for perms functions
kunit/fortify: Add memcpy() tests
kunit/fortify: Do not spam logs with fortify WARNs
kunit/fortify: Rename tests to use recommended conventions
init: replace deprecated strncpy with strscpy_pad
kunit/fortify: Fix mismatched kvalloc()/vfree() usage
scsi: qla2xxx: Avoid possible run-time warning with long model_num
scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
fs: ecryptfs: replace deprecated strncpy with strscpy
hfsplus: refactor copy_name to not use strncpy
reiserfs: replace deprecated strncpy with scnprintf
virt: acrn: replace deprecated strncpy with strscpy
ubsan: Avoid i386 UBSAN handler crashes with Clang
ubsan: Remove 1-element array usage in debug reporting
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZA578ACgkQ1V2XiooU
IOS18g//Zyuv+23GcUM7+FEXrlMN658xJyWiYvKjOaZZx5ZiV0QdZc4cbfPFD44p
qZBMmVC/WVoC89SLdwH1W47KoJU0xsK3/OdqGHHeNJ69111wIQMpOLfAetS2K0mb
F+Ue2vyWg1GQDICGsCdenHX7ihtVvnJJkomxc+3ObxtLCNsb2Dsr6JM5hMVP5Bil
4UZnPsrgfWy3A8O92burlPVE1sTWDFfFUGIf8geJc4QadwkgufkzxMhXNO7xHlpG
EZ99s8FPyD3R6tRPjf4gwdjr7JjinrdrYjZDuS4d3Uv8pKlUqcx8PgXG51/unr/y
qlynLXtEc1QU6SO2jENosHAG2/LQG2zsYEiiLFCP+a1JOtOxevZQKx8MyAFW8xDX
+RQhcBpTBocIyJ/tCDoM9lp69iYTR196Ct48v6pSGMNhZcddT4K4BkUL47GEs8T3
IA5x8h5gV2Q9ECMgqSaycdUsfLNgE/6fWx0ROs/wo3tMsgWrXCSJi8RFtN1sNbIO
rfuNnQiETIFBkQxBi7um8jadxdfIHm65cjgZBCyVbNNml3JwjYvXLxCXt2G7LxC4
Sg4nZvIbqWIifoMc1aQKypvFZjzzsWtFmYCuEUVLrnpj2SFTyh5CNzNo3MlHf7LG
sRb/XubdY6e0spLzd5VDjwH5qOT3poWAccatRr5BVUarxXCD5Vs=
=RD9p
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 skips transaction if object type provides no .update interface.
Patch #2 skips NETDEV_CHANGENAME which is unused.
Patch #3 enables conntrack to handle Multicast Router Advertisements and
Multicast Router Solicitations from the Multicast Router Discovery
protocol (RFC4286) as untracked opposed to invalid packets.
From Linus Luessing.
Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of
dropping them, from Jason Xing.
Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
also from Jason.
Patch #6 removes reference in netfilter's sysctl documentation on pickup
entries which were already removed by Florian Westphal.
Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which
allows to evict entries from the conntrack table,
also from Florian.
Patches #8 to #16 updates nf_tables pipapo set backend to allocate
the datastructure copy on-demand from preparation phase,
to better deal with OOM situations where .commit step is too late
to fail. Series from Florian Westphal.
Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
transitions, also from Florian.
Patch #18 use GFP_KERNEL to clone elements from control plane to avoid
quick atomic reserves exhaustion with large sets, reporter refers
to million entries magnitude.
* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow clone callbacks to sleep
selftests: netfilter: add packetdrill based conntrack tests
netfilter: nft_set_pipapo: remove dirty flag
netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
netfilter: nft_set_pipapo: merge deactivate helper into caller
netfilter: nft_set_pipapo: prepare walk function for on-demand clone
netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
netfilter: nft_set_pipapo: move prove_locking helper around
netfilter: conntrack: remove flowtable early-drop test
netfilter: conntrack: documentation: remove reference to non-existent sysctl
netfilter: use NF_DROP instead of -NF_DROP
netfilter: conntrack: dccp: try not to drop skb in conntrack
netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
netfilter: nf_tables: skip transaction if update object is not implemented
====================
Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmY/YdYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnmVEADBq8QT9Oa3HTIONHwxjmGMOalr7PSrBP89
S6Inv/l+3xDlyolyLh1HIXUC84iS9Ihi2pNC3dZct4fNcpA99H0CFaHDGwZ5rVri
MrFaubZAps1qSzeypqEq3zWGKVUoaYWaOKhuOjye5Ei2tKymbguhDKl1WiKibD21
E9qOYbhSUFdub/xtx9Rv4BS05QW5bHZ2Y/tTFqB8MY4JUsdb9g/deVZkyGUQYRSd
40mDallRldjQQTQ8iU4H6/ORdGIN/90aLPbmzMdFtQcymnmRyid3rOEwhwWYe4NO
ljnI8m1SJQilZz1d5oHBXBB5QubVptY1JWxbk8GQCSmOU5wrCq+ARCJXUtBXwniJ
K4VFsGm9MkZcc5vsIwIzvsrk8DODla6EVo/jyDy8iFceZcNWfVxdwa5NS67V/6QT
macbF785XDsmA5E4UjslbZqU047w+A5N1yazcZWzMk0coJDeB8AtsA1/C2WZOm8p
HVoiAzsqt81hvPItnjCyZluL/YW+BKeOTnq04QbpQKcJpZBzszO4ZLtuD+IXkE69
8ZZPGFPnPS4ZMQojKkwsBr+Yo65S18oBDkib36mr2lsdnoWTpGq47C7ScUDBbqGm
iI7U8tYMnVVkQQHVVmGI4KOr5/4lxxp8398kqCaxfW3D5BQhbtUOF/OBjBHj1ZSV
9aZx87CyhA==
=DwAV
-----END PGP SIGNATURE-----
Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
- Greatly improve send zerocopy performance, by enabling coalescing of
sent buffers.
MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
io_uring side did not. In local testing, the crossover point for send
zerocopy being faster is now around 3000 byte packets, and it
performs better than the sync syscall variants as well.
This feature relies on a shared branch with net-next, which was
pulled into both branches.
- Unification of how async preparation is done across opcodes.
Previously, opcodes that required extra memory for async retry would
allocate that as needed, using on-stack state until that was the
case. If async retry was needed, the on-stack state was adjusted
appropriately for a retry and then copied to the allocated memory.
This led to some fragile and ugly code, particularly for read/write
handling, and made storage retries more difficult than they needed to
be. Allocate the memory upfront, as it's cheap from our pools, and
use that state consistently both initially and also from the retry
side.
- Move away from using remap_pfn_range() for mapping the rings.
This is really not the right interface to use and can cause lifetime
issues or leaks. Additionally, it means the ring sq/cq arrays need to
be physically contigious, which can cause problems in production with
larger rings when services are restarted, as memory can be very
fragmented at that point.
Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
the same treatment to mapped ring provided buffers. This also helps
unify the code we have dealing with allocating and mapping memory.
Hard to see in the diffstat as we're adding a few features as well,
but this kills about ~400 lines of code from the codebase as well.
- Add support for bundles for send/recv.
When used with provided buffers, bundles support sending or receiving
more than one buffer at the time, improving the efficiency by only
needing to call into the networking stack once for multiple sends or
receives.
- Tweaks for our accept operations, supporting both a DONTWAIT flag for
skipping poll arm and retry if we can, and a POLLFIRST flag that the
application can use to skip the initial accept attempt and rely
purely on poll for triggering the operation. Both of these have
identical flags on the receive side already.
- Make the task_work ctx locking unconditional.
We had various code paths here that would do a mix of lock/trylock
and set the task_work state to whether or not it was locked. All of
that goes away, we lock it unconditionally and get rid of the state
flag indicating whether it's locked or not.
The state struct still exists as an empty type, can go away in the
future.
- Add support for specifying NOP completion values, allowing it to be
used for error handling testing.
- Use set/test bit for io-wq worker flags. Not strictly needed, but
also doesn't hurt and helps silence a KCSAN warning.
- Cleanups for io-wq locking and work assignments, closing a tiny race
where cancelations would not be able to find the work item reliably.
- Misc fixes, cleanups, and improvements
* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
io_uring: support to inject result for NOP
io_uring: fail NOP if non-zero op flags is passed in
io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
io_uring/net: add IORING_ACCEPT_DONTWAIT flag
io_uring/filetable: don't unnecessarily clear/reset bitmap
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
io_uring: Require zeroed sqe->len on provided-buffers send
io_uring/notif: disable LAZY_WAKE for linked notifs
io_uring/net: fix sendzc lazy wake polling
io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
io_uring/rw: reinstate thread check for retries
io_uring/notif: implement notification stacking
io_uring/notif: simplify io_notif_flush()
net: add callback for setting a ubuf_info to skb
net: extend ubuf_info callback to ops structure
io_uring/net: support bundles for recv
io_uring/net: support bundles for send
io_uring/kbuf: add helpers for getting/peeking multiple buffers
io_uring/net: add provided buffer support for IORING_OP_SEND
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZj3HuwAKCRCRxhvAZXjc
orYvAQCZOr68uJaEaXAArYTdnMdQ6HIzG+FVlwrqtrhz0BV07wEAqgmtSR9XKh+L
0+DNepg4R8PZOHH371eSSsLNRCUCkAs=
=SVsU
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That
means we now have six free FMODE_* bits in total (but bit #6
already got used for FMODE_WRITE_RESTRICTED)
- Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)
- Add fd_raw cleanup class so we can make use of automatic cleanup
provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well
- Optimize seq_puts()
- Simplify __seq_puts()
- Add new anon_inode_getfile_fmode() api to allow specifying f_mode
instead of open-coding it in multiple places
- Annotate struct file_handle with __counted_by() and use
struct_size()
- Warn in get_file() whether f_count resurrection from zero is
attempted (epoll/drm discussion)
- Folio-sophize aio
- Export the subvolume id in statx() for both btrfs and bcachefs
- Relax linkat(AT_EMPTY_PATH) requirements
- Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors
for dup*() equality replacing kcmp()
Cleanups:
- Compile out swapfile inode checks when swap isn't enabled
- Use (1 << n) notation for FMODE_* bitshifts for clarity
- Remove redundant variable assignment in fs/direct-io
- Cleanup uses of strncpy in orangefs
- Speed up and cleanup writeback
- Move fsparam_string_empty() helper into header since it's currently
open-coded in multiple places
- Add kernel-doc comments to proc_create_net_data_write()
- Don't needlessly read dentry->d_flags twice
Fixes:
- Fix out-of-range warning in nilfs2
- Fix ecryptfs overflow due to wrong encryption packet size
calculation
- Fix overly long line in xfs file_operations (follow-up to FMODE_*
cleanup)
- Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up
to FMODE_* cleanup)
- Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_*
cleanup)
- Fix stable offset api to prevent endless loops
- Fix afs file server rotations
- Prevent xattr node from overflowing the eraseblock in jffs2
- Move fdinfo PTRACE_MODE_READ procfs check into the .permission()
operation instead of .open() operation since this caused userspace
regressions"
* tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
afs: Fix fileserver rotation getting stuck
selftests: add F_DUPDFD_QUERY selftests
fcntl: add F_DUPFD_QUERY fcntl()
file: add fd_raw cleanup class
fs: WARN when f_count resurrection is attempted
seq_file: Simplify __seq_puts()
seq_file: Optimize seq_puts()
proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
fs: Create anon_inode_getfile_fmode()
xfs: don't call xfs_file_open from xfs_dir_open
xfs: drop fop_flags for directories
xfs: fix overly long line in the file_operations
shmem: Fix shmem_rename2()
libfs: Add simple_offset_rename() API
libfs: Fix simple_offset_rename_exchange()
jffs2: prevent xattr node from overflowing the eraseblock
vfs, swap: compile out IS_SWAPFILE() on swapless configs
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
fs/direct-io: remove redundant assignment to variable retval
fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
...
As usual, these are updates for drivers that are specific to certain
SoCs or firmware running on them. Notable updates include
- The new STMicroelectronics STM32 "firewall" bus driver that is
used to provide a barrier between different parts of an SoC
- Lots of updates for the Qualcomm platform drivers, in particular
SCM, which gets a rewrite of its initialization code
- Firmware driver updates for Arm FF-A notification interrupts
and indirect messaging, SCMI firmware support for pin control
and vendor specific interfaces, and TEE firmware interface
changes across multiple TEE drivers
- A larger cleanup of the Mediatek CMDQ driver and some related bits
- Kconfig changes for riscv drivers to prepare for adding Kanaan
k230 support
- Multiple minor updates for the TI sysc bus driver, memory controllers,
hisilicon hccs and more
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmY+dbEACgkQYKtH/8kJ
UifGTBAA3lh2qw++S5i6nk71388/nswb5fZKwqPKl1m+44SndE7r0/nauGm7IZhd
oM5xiBZzsoYCKuesSuejkBNgPmUPtUhyHBJKSKjwrcak4k1mrjDgXxfSxCqGptVZ
Ps683koJ/Ic7O/LQNxlVzUlssG/3gmhJELfpaVIB7rG8pmdgF9ocM73+iJrRwW1Q
fTFXUXeCcXJ2N5Yki7z2+4oB3RebPzTBz4NeIYNdGQj5/u61oG0KzXwvk8eqWhNb
0KJYsfAQZGzdyAys6XU1MHv4T4L2a3DQL6NMgLnovVEMhP2Hk0XlBmI7X+uAXYiM
2z289d9Wx3HMoiekulDJ+rpDUPxPXrEqaRkfWZ8G+HSY4KcIeSP7YGmhylr0kdvw
+Qo6orxZ9lkSPaT1aUkNIIywDzet/E2hY8zV1EcLBu9GWjkybAvT/Uy2lSSN+LLH
yEQyDf+s90N6QuZwdXN8a3QliP39tHqlye8wou6UQG8aZ7z870fKAKlvA6DjTfPM
JyhY1rXYH/bvC87sVTi5Qb09+2R6ftvk5xijiMOyXugPpO/6PQKULVataeUnzwgs
YTgOPhaqXVadDR/nkrG3FzEtvpYeTspwGpDiEpDrNHf5H1tFg6VfPNS8y0QOlSPY
JcmylQNCtwxCRLTw2NHOb3tLcY4ruDHNmrWf5INTzf6cJe49jaU=
=4rf0
-----END PGP SIGNATURE-----
Merge tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"As usual, these are updates for drivers that are specific to certain
SoCs or firmware running on them.
Notable updates include
- The new STMicroelectronics STM32 "firewall" bus driver that is used
to provide a barrier between different parts of an SoC
- Lots of updates for the Qualcomm platform drivers, in particular
SCM, which gets a rewrite of its initialization code
- Firmware driver updates for Arm FF-A notification interrupts and
indirect messaging, SCMI firmware support for pin control and
vendor specific interfaces, and TEE firmware interface changes
across multiple TEE drivers
- A larger cleanup of the Mediatek CMDQ driver and some related bits
- Kconfig changes for riscv drivers to prepare for adding Kanaan k230
support
- Multiple minor updates for the TI sysc bus driver, memory
controllers, hisilicon hccs and more"
* tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (103 commits)
firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G
soc: qcom: pmic_glink: Make client-lock non-sleeping
dt-bindings: soc: qcom,wcnss: fix bluetooth address example
soc/tegra: pmc: Add EQOS wake event for Tegra194 and Tegra234
bus: stm32_firewall: fix off by one in stm32_firewall_get_firewall()
bus: etzpc: introduce ETZPC firewall controller driver
firmware: arm_ffa: Avoid queuing work when running on the worker queue
bus: ti-sysc: Drop legacy idle quirk handling
bus: ti-sysc: Drop legacy quirk handling for smartreflex
bus: ti-sysc: Drop legacy quirk handling for uarts
bus: ti-sysc: Add a description and copyrights
bus: ti-sysc: Move check for no-reset-on-init
soc: hisilicon: kunpeng_hccs: replace MAILBOX dependency with PCC
soc: hisilicon: kunpeng_hccs: Add the check for obtaining complete port attribute
firmware: arm_ffa: Fix memory corruption in ffa_msg_send2()
bus: rifsc: introduce RIFSC firewall controller driver
of: property: fw_devlink: Add support for "access-controller"
soc: mediatek: mtk-socinfo: Correct the marketing name for MT8188GV
soc: mediatek: mtk-socinfo: Add entry for MT8395AV/ZA Genio 1200
soc: mediatek: mtk-mutex: Add support for MT8188 VPPSYS
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFEKqOX9jqZmzwkCc1GSBvS7ZkBkFAmY5ajUACgkQ1GSBvS7Z
kBmR5w//U8Cr5QqBzvczVZNmEtgNm4Zsy9DR2QnV7Wzp1hH/4DL7oDzJX319sw0i
jNUu5QnGhbq5T3hBF/GIbugSQSVhTeS2oRk/Qxc/k8X7cNchlk+iB9U1YF8Lppj8
Y7pB6pT20SFO/8nhQguYQC8F+W0TKnpwSNcykHHIF1THe1MO8PfPbkdcvty28Mvs
xB/IiYee5nm88Chx/+PAuJWFSaUK/AmS5jlcBRKpffzPu/HtQ6cLtC8eRrZlYG0v
plVkExIlsRYhCudBp8ihhLQx2Nc3PpbRPDnE8AKyQCOILXM29/Lk85U4583Lbg9t
bMQuVEm1lw0HPH2iDMRwkkdsl09xPZX7XpnAv5zhEXnl9kDaEwLjV3eXaxdMUmK0
wHd05OMnPiOeyTyBvAjVdjNythfgg8fdy40K8E7G2BXw0G9Yv+/klyT2LM8bhmYp
ec2tI9FyxzCtY/Dz89Of2xQCrFLSXCHVjQAwmAYheSw247JfKu2m3fSnC5ABd/cm
B70VBkXESYk78uzK6JtpRCqA3CgppeB7tryOJ60Ac0GczeZUSUirbSJxH5laMF71
aPnhw+PRep+GWaX0ym2dmaIiK6NnkbEHk5B4oHwJFoJzwqJ2ezUAVh0oQssSxpwJ
zU4i8CtFegVWNsyly7N8js96a6T/Sn5Sny0NTCt+nJpbCwLaaD0=
=aPYm
-----END PGP SIGNATURE-----
Merge tag 'gtp-24-05-07' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp
Pablo neira Ayuso says:
====================
gtp pull request 24-05-07
This v3 includes:
- fix for clang uninitialized variable per Jakub.
- address Smatch and Coccinelle reports per Simon
- remove inline in new IPv6 support per Simon
- fix memleaks in netlink control plane per Simon
-o-
The following patchset contains IPv6 GTP driver support for net-next,
this also includes IPv6 over IPv4 and vice-versa:
Patch #1 removes a unnecessary stack variable initialization in the
socket routine.
Patch #2 deals with GTP extension headers. This variable length extension
header to decapsulate packets accordingly. Otherwise, packets are
dropped when these extension headers are present which breaks
interoperation with other non-Linux based GTP implementations.
Patch #3 prepares for IPv6 support by moving IPv4 specific fields in PDP
context objects to a union.
Patch #4 adds IPv6 support while retaining backward compatibility.
Three new attributes allows to declare an IPv6 GTP tunnel
GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6 as well as
IFLA_GTP_LOCAL6 to declare the IPv6 GTP UDP socket. Up to this
patch, only IPv6 outer in IPv6 inner is supported.
Patch #5 uses IPv6 address /64 prefix for UE/MS in the inner headers.
Unlike IPv4, which provides a 1:1 mapping between UE/MS,
IPv6 tunnel encapsulates traffic for /64 address as specified
by 3GPP TS. Patch has been split from Patch #4 to highlight
this behaviour.
Patch #6 passes up IPv6 link-local traffic, such as IPv6 SLAAC, for
handling to userspace so they are handled as control packets.
Patch #7 prepares to allow for GTP IPv4 over IPv6 and vice-versa by
moving IP specific debugging out of the function to build
IPv4 and IPv6 GTP packets.
Patch #8 generalizes TOS/DSCP handling following similar approach as
in the existing iptunnel infrastructure.
Patch #9 adds a helper function to build an IPv4 GTP packet in the outer
header.
Patch #10 adds a helper function to build an IPv6 GTP packet in the outer
header.
Patch #11 adds support for GTP IPv4-over-IPv6 and vice-versa.
Patch #12 allows to use the same TID/TEID (tunnel identifier) for inner
IPv4 and IPv6 packets for better UE/MS dual stack integration.
This series integrates with the osmocom.org project CI and TTCN-3 test
infrastructure (Oliver Smith) as well as the userspace libgtpnl library.
Thanks to Harald Welte, Oliver Smith and Pau Espin for reviewing and
providing feedback through the osmocom.org redmine platform to make this
happen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support to inject result for NOP so that we can inject failure from
userspace. It is very helpful for covering failure handling code in
io_uring core change.
With nop flags, it becomes possible to add more test features on NOP in
future.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240510035031.78874-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Often userspace needs to know whether two file descriptors refer to the
same struct file. For example, systemd uses this to filter out duplicate
file descriptors in it's file descriptor store (cf. [1]) and vulkan uses
it to compare dma-buf fds (cf. [2]).
The only api we provided for this was kcmp() but that's not generally
available or might be disallowed because it is way more powerful (allows
ordering of file pointers, operates on non-current task) etc. So give
userspace a simple way of comparing two file descriptors for sameness
adding a new fcntl() F_DUDFD_QUERY.
Link: a4f0e0da35/src/basic/fd-util.c (L517) [1]
Link: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/texture.c#L490 [2]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[brauner: commit message]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Similarly to how polling first is supported for receive, it makes sense
to provide the same for accept. An accept operation does a lot of
expensive setup, like allocating an fd, a socket/inode, etc. If no
connection request is already pending, this is wasted and will just be
cleaned up and freed, only to retry via the usual poll trigger.
Add IORING_ACCEPT_POLL_FIRST, which tells accept to only initiate the
accept request if poll says we have something to accept.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This allows the caller to perform a non-blocking attempt, similarly to
how recvmsg has MSG_DONTWAIT. If set, and we get -EAGAIN on a connection
attempt, propagate the result to userspace rather than arm poll and
wait for a retry.
Suggested-by: Norman Maurer <norman_maurer@apple.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfba ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5f ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit can be considered an addition to commit ca7e324e8a
("compiler_types: add Endianness-dependent __counted_by_{le,be}") [1].
In the commit referenced above the __counted_by_{le,be}() attributes
were defined based on platform's endianness with the goal to that the
structures contain flexible arrays at the end, and the counter for,
can be annotated with these attributes.
So, this commit only provide UAPI macros for UAPI structs that will
gain annotations for __counted_by_{le, be} attributes. And it is the
previous step to be able to use these attributes in UAPI.
Link: https://lore.kernel.org/r/20240327142241.1745989-2-aleksander.lobakin@intel.com
Suggested-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Erick Archer <erick.archer@outlook.com>
Link: https://lore.kernel.org/r/AS8PR02MB72372E45071E8821C07236F78BE42@AS8PR02MB7237.eurprd02.prod.outlook.com
Fixes: ca7e324e8a ("compiler_types: add Endianness-dependent __counted_by_{le,be}")
Signed-off-by: Kees Cook <keescook@chromium.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmY0mEIACgkQrB3Eaf9P
W7fU7g//bQyydwei/4Vo+cNCPp82k8wL/qhDY3IjN10PfJOSNmeCAcSgkuuHTRSx
g/hoxZEVzLrQT5bt+Sb38JxADFiL787GjdGEUy1gzF7CnDKcGT5KnydYNjDqDVGt
nOv9kAGfWIkMKdNqrhHifddPMWd+ZqvpUcFz5olvqIE2mpNgMwy2i3NID9bNAV31
v5AEvNINa1LKOhX9cEka8iPQXwp+I6yTLqyOd4VciOuFr8dPg0FQqFaYR+OtMsV0
kIxdGTVfmRWaNgq/Tsg4z/2rXEwmEjTWzAhNVGu8o8L3JozXOvbjIrDG7Ws6qB3V
XTFl8ueRMk0UCTlY/QAfip5H7IlAo+H0FUBC45FNP1UhHeWisXT4D5rqAEqQTlZR
bddtuueLZyKclFpXRNi+/8vdDrXhhEzeNINkc52Ef33rUTtZJR8bXrEUKzaYCIuF
ldub0PA3+e5wvwIxq5/Chc/+MIaIHnXBMUmbCJSPnMrupBQtO+i6arPQcbtaBAgS
YyVGTRk9YN0UAjSriIuiViLlgUCMsvsWgfSz9rd0PE54MFBrvcLPeCtPxKZ+sTVG
Y2iSZ8d3ThvsMiQVNU8gj3SlTY1oTvuaijDDGjnR0nWkxV9LMJHCPKfIzsbOKLJe
+ee5hKP4TOFygnV58BkqdGK/LavNpouTIbrM43hgmJ0IX9kSt4o=
=QiGZ
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-05-03
1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support.
This was defined by an early version of an IETF draft
that did not make it to a standard.
2) Introduce direction attribute for xfrm states.
xfrm states have a direction, a stsate can be used
either for input or output packet processing.
Add a direction to xfrm states to make it clear
for what a xfrm state is used.
* tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Restrict SA direction attribute to specific netlink message types
xfrm: Add dir validation to "in" data path lookup
xfrm: Add dir validation to "out" data path lookup
xfrm: Add Direction to the SA in or out
udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support
====================
Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new iflink attributes to configure in-kernel UDP listener socket
address: IFLA_GTP_LOCAL and IFLA_GTP_LOCAL6. If none of these attributes
are specified, default is still to IPv4 INADDR_ANY for backward
compatibility.
Add new attributes to set up family and IPv6 address of GTP tunnels:
GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6. If no GTPA_FAMILY is
specified, AF_INET is assumed for backward compatibility.
setsockopt IPV6_ADDRFORM allows to downgrade socket from IPv6 to IPv4
after socket is bound. Assumption is that socket listener that is
attached to the gtp device needs to be either IPv4 or IPv6. Therefore,
GTP socket listener does not allow for IPv4-mapped-IPv6 listener.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
So far Multicast Router Advertisements and Multicast Router
Solicitations from the Multicast Router Discovery protocol (RFC4286)
would be marked as INVALID for IPv6, even if they are in fact intact
and adhering to RFC4286.
This broke MRA reception and by that multicast reception on
IPv6 multicast routers in a Proxmox managed setup, where Proxmox
would install a rule like "-m conntrack --ctstate INVALID -j DROP"
at the top of the FORWARD chain with br-nf-call-ip6tables enabled
by default.
Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery
already, fix this issue by excluding MRD from connection tracking
handling as MRD always uses predefined multicast destinations
for its messages, too. This changes the ct-state for ICMPv6 MRD messages
from INVALID to UNTRACKED.
This issue was found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmYzUX4ACgkQrB3Eaf9P
W7e7qQ/+LgkDkL/LyXv3kAPN8b2SapIiIajarlRfgdPYdM6PP+kzGJxC/t5NZ2HE
Q1N6K0hIL042rna1/grkUKHeQn4PXUlfT6y8YgjiuCvpFDVNb2ofyl3AmxjJnH1A
iwMWf6EhwGoxbVs3DbDJ554U8T0nBJeZ+MXLF/4BI13bNdj7stbcKRqj6KHC5sQO
JgtFVX+ip6LLGL7rR4YMv2h2p1sSu3Vp6bMcfM85I4ENec0UIjgsAF9P0buPl4gr
2oKtMxga86CQWcymKo6DI+MsBBk91wvM+5/T9zQtpdxDuNEQNrotCoCc0Kd03xmP
EGzJagwVGFj08kYJ7qICDwpXWCpLDVumoxWFNBWmAW9uNEkUW8Tiqmm8eW2Azs3d
VAUFcyzHr7mkAaqSDDdE4J+L276Z+dS+BHPnoF6Sp+ctuvSmmeS6lyY9mGnFGH7H
OiqFKonjBEC5iNAMIXF3WRKueMDdbbDFwHK4NEiTIUSeAMqETUP2sBC1GNTaN8YJ
soKYtwUtiag2P44ZYy5UYeKJlaBnT1FOZHLs24iCOY1XjqJerwjefQuBO6HDBz/I
vkaSY6ak6uRsAdfst45uQNPfxlJkFDbwRDowFCdhu5qG7bifqnXstQmNta2U1109
4e3vt5jPowN/9bCtMx7Z+ftmmTsapxYCu5ZYRVAq82WahsXFPtE=
=aeD1
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2024-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-05-02
1) Fix an error pointer dereference in xfrm_in_fwd_icmp.
From Antony Antony.
2) Preserve vlan tags for ESP transport mode software GRO.
From Paul Davey.
3) Fix a spelling mistake in an uapi xfrm.h comment.
From Anotny Antony.
* tag 'ipsec-2024-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Correct spelling mistake in xfrm.h comment
xfrm: Preserve vlan tags for transport mode software GRO
xfrm: fix possible derferencing in error path
====================
Link: https://lore.kernel.org/r/20240502084838.2269355-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a ("bpf: verifier: prevent userspace memory access")
d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduces validation for the x->dir attribute within the XFRM input
data lookup path. If the configured direction does not match the
expected direction, input, increment the XfrmInStateDirError counter
and drop the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmInStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Introduces validation for the x->dir attribute within the XFRM output
data lookup path. If the configured direction does not match the expected
direction, output, increment the XfrmOutStateDirError counter and drop
the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmOutPolError 1
XfrmOutStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch introduces the 'dir' attribute, 'in' or 'out', to the
xfrm_state, SA, enhancing usability by delineating the scope of values
based on direction. An input SA will restrict values pertinent to input,
effectively segregating them from output-related values.
And an output SA will restrict attributes for output. This change aims
to streamline the configuration process and improve the overall
consistency of SA attributes during configuration.
This feature sets the groundwork for future patches, including
the upcoming IP-TFS patch.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Adding support to attach bpf program for entry and return probe
of the same function. This is common use case which at the moment
requires to create two kprobe multi links.
Adding new BPF_TRACE_KPROBE_SESSION attach type that instructs
kernel to attach single link program to both entry and exit probe.
It's possible to control execution of the bpf program on return
probe simply by returning zero or non zero from the entry bpf
program execution to execute or not the bpf program on return
probe respectively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240430112830.1184228-2-jolsa@kernel.org
The virtio-net device stats spec:
42f3899898
We introduce the relative feature and structures.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This introduces a TEE driver for Trusted Services [1].
Trusted Services is a TrustedFirmware.org project that provides a
framework for developing and deploying device Root of Trust services in
FF-A [2] Secure Partitions. The project hosts the reference
implementation of Arm Platform Security Architecture [3] for Arm
A-profile devices.
The FF-A Secure Partitions are accessible through the FF-A driver in
Linux. However, the FF-A driver doesn't have a user space interface so
user space clients currently cannot access Trusted Services. The goal of
this TEE driver is to bridge this gap and make Trusted Services
functionality accessible from user space.
[1] https://www.trustedfirmware.org/projects/trusted-services/
[2] https://developer.arm.com/documentation/den0077/
[3] https://www.arm.com/architecture/security-features/platform-security
-----BEGIN PGP SIGNATURE-----
iQJOBAABCgA4FiEEFV+gSSXZJY9ZyuB5LinzTIcAHJcFAmYp+8YaHGplbnMud2lr
bGFuZGVyQGxpbmFyby5vcmcACgkQLinzTIcAHJd3ixAAsWZwTmxavFD1Qh2bN6dR
XdRfv/4+8CXSN84aETBhlbjjzcUYswl6icyIcpShLcgFF7KvS+H30LRe0SGi9hjl
xePS0hoYteLtIDN0S9u5kaM8EdYeXKYU3hNqQHKxksCh7jlTEkwC0XcOb18OkyZT
niCHZ7+dKTmVQhH+MDvvUmeAljzk6vNwBBIBRscUX+JuG3IjWEux8pZedP3RAAS7
IKNNtcylGB3JJVJ5H11TRF42LeQSohHyKYp1XaMca6lwWi1PoAXKIia1gcsNYLLN
yaJByAiyGi2olk3SjeDx8Q0H35ezgJLwRBXulCXlCKzzc9ZFoVOw+a3QOI0tCJJ2
qnUq2YZwEb4yHryPk0/v1mDCkl9MZZTpL85CYFvS84e/Gk26eZAUHwWAlgPZY6Js
yWmfIAE/qtV48B0vUOxQGZgFAIsgn1TwaiOMWlEAoqlOQb6GosKYkCEHmfMJVMRo
F3DeD/QOTWr4u91dFgbStL5CNwVu9yhFzgYDekPxBJfv+4IzITUmUPdne/FNqz4G
P4RgQXstIl2xn58A/p457WSTmu/5N95/IILw0FNsbEiqNou0Wv+HFEEKEhzCSoOp
gru5xcGrBnx+1IdE2Vw77+XhsKHCUejTrFhWXHQM6zGA1XWr30XwkmgvS1f/xolB
DFsYhlZNRazhL16BsD4tGgo=
=RJca
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmYwAzkACgkQYKtH/8kJ
UicRyg/+JKZAg3KJ3lG2uXGvb9a/kqpenroR7AJu4bLgsj/3KdXAVozxV0vzh7dc
ayQCydMnMnKZDqlFoyDdqJEpFY3HINC6qknFYPF3Juc9R8aOGhMA8gN41vYDUbtS
PXjIRkoaSUvtaZ4MarU6S/jzi3h5FpdZ1VqBoSgHioG58w+aihN/ElauhLc/qJh4
Sz4hAsqBKKS107w0mwWaV0YhLbiGoxVBeQi8xLhO15Iy22jMU0NKMn3kGk0I14S8
maDV64UaVD9qwilp46kbErPXodzzhmwvR53vCEH54CCv4j9GoM80A+UrB7yboZu2
qY6k4LsyIVN6yHV268bHluNeo+XeTQbkEZxBS3SKe0NkhWY51BHL2vW/TFMr4BNj
xl9WLf4IaS6GX0NYCyiXqaEh92USHgdl41eTf9P4vnyPSH3cMryw9FK/f/1F7JGS
SOERoHVuyX0VeqUelWf4/Grux9n+votM49QCrdAICMbRcgiIIv/ITzKiSzYMKC0J
pS4F9426ZymPkq9CVYBy8yl2atNUjiBZ+tW55eRDAfNKJf/4N4b7ncxVAPoPf2J4
LhvEnKMAs4GtaF6G+0d2ltxhQdsl3uK9i2XEy+2WY4HceH2FEYctzG44SsIEzlVQ
D/FyyKpa5aPyybFsdYGXyfyVv5GBFPowPeFXK9k9GH9ODc33xAE=
=VM1o
-----END PGP SIGNATURE-----
Merge tag 'tee-ts-for-v6.10' of https://git.linaro.org/people/jens.wiklander/linux-tee into soc/drivers
TEE driver for Trusted Services
This introduces a TEE driver for Trusted Services [1].
Trusted Services is a TrustedFirmware.org project that provides a
framework for developing and deploying device Root of Trust services in
FF-A [2] Secure Partitions. The project hosts the reference
implementation of Arm Platform Security Architecture [3] for Arm
A-profile devices.
The FF-A Secure Partitions are accessible through the FF-A driver in
Linux. However, the FF-A driver doesn't have a user space interface so
user space clients currently cannot access Trusted Services. The goal of
this TEE driver is to bridge this gap and make Trusted Services
functionality accessible from user space.
[1] https://www.trustedfirmware.org/projects/trusted-services/
[2] https://developer.arm.com/documentation/den0077/
[3] https://www.arm.com/architecture/security-features/platform-security
* tag 'tee-ts-for-v6.10' of https://git.linaro.org/people/jens.wiklander/linux-tee:
MAINTAINERS: tee: tstee: Add entry
Documentation: tee: Add TS-TEE driver
tee: tstee: Add Trusted Services TEE driver
tee: optee: Move pool_op helper functions
tee: Refactor TEE subsystem header files
Link: https://lore.kernel.org/r/20240425073119.GA3261080@rayden
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZi9+AAAKCRDbK58LschI
g0nEAP487m7L0nLVriC2oIOWsi29tklW3etm6DO7gmGRGIHgrgEAnMyV1xBj3bGj
v6jJwDcybCym1hLx+1x1JCZ4eoAFswE=
=xbna
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-04-29
We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).
The main changes are:
1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86 BPF JIT. This allows
inlining per-CPU array and hashmap lookups
and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.
2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.
3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction,
from Alexei Starovoitov.
4) Add support for passing mark with bpf_fib_lookup helper,
from Anton Protopopov.
5) Add a new bpf_wq API for deferring events and refactor sleepable
bpf_timer code to keep common code where possible,
from Benjamin Tissoires.
6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
to check when NULL is passed for non-NULLable parameters,
from Eduard Zingerman.
7) Harden the BPF verifier's and/or/xor value tracking,
from Harishankar Vishwanathan.
8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
crypto subsystem, from Vadim Fedorenko.
9) Various improvements to the BPF instruction set standardization doc,
from Dave Thaler.
10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
from Andrea Righi.
11) Bigger batch of BPF selftests refactoring to use common network helpers
and to drop duplicate code, from Geliang Tang.
12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
from Jose E. Marchesi.
13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled,
from Kumar Kartikeya Dwivedi.
14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
from David Vernet.
15) Extend the BPF verifier to allow different input maps for a given
bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.
16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.
17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison
the stack memory, from Martin KaFai Lau.
18) Improve xsk selftest coverage with new tests on maximum and minimum
hardware ring size configurations, from Tushar Vyavahare.
19) Various ReST man pages fixes as well as documentation and bash completion
improvements for bpftool, from Rameez Rehman & Quentin Monnet.
20) Fix libbpf with regards to dumping subsequent char arrays,
from Quentin Deslandes.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
bpf, docs: Clarify PC use in instruction-set.rst
bpf_helpers.h: Define bpf_tail_call_static when building with GCC
bpf, docs: Add introduction for use in the ISA Internet Draft
selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
bpf: check bpf_dummy_struct_ops program params for test runs
selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
selftests/bpf: adjust dummy_st_ops_success to detect additional error
bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
selftests/bpf: Add ring_buffer__consume_n test.
bpf: Add bpf_guard_preempt() convenience macro
selftests: bpf: crypto: add benchmark for crypto functions
selftests: bpf: crypto skcipher algo selftests
bpf: crypto: add skcipher to bpf crypto
bpf: make common crypto API for TC/XDP programs
bpf: update the comment for BTF_FIELDS_MAX
selftests/bpf: Fix wq test.
selftests/bpf: Use make_sockaddr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
...
====================
Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A spelling error was found in the comment section of
include/uapi/linux/xfrm.h. Since this header file is copied to many
userspace programs and undergoes Debian spellcheck, it's preferable to
fix it in upstream rather than downstream having exceptions.
This commit fixes the spelling mistake.
Fixes: df71837d50 ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Introduce RedBox support (HSR-SAN to be more precise) for HSR networks.
Following traffic reduction optimizations have been implemented:
- Do not send HSR supervisory frames to Port C (interlink)
- Do not forward to HSR ring frames addressed to Port C
- Do not forward to Port C frames from HSR ring
- Do not send duplicate HSR frame to HSR ring when destination is Port C
The corresponding patch to modify iptable2 sources has already been sent:
https://lore.kernel.org/netdev/20240308145729.490863-1-lukma@denx.de/T/
Testing procedure (veth and netns):
-----------------------------------
One shall run:
linux-vanila/tools/testing/selftests/net/hsr/hsr_redbox.sh
(Detailed description of the setup one can find in the test
script file).
Testing procedure (real hardware):
----------------------------------
The EVB-KSZ9477 has been used for testing on net-next branch
(SHA1: 5fc68320c1).
Ports 4/5 were used for SW managed HSR (hsr1) as first hsr0 for ports 1/2
(with HW offloading for ksz9477) was created. Port 3 has been used as
interlink port (single USB-ETH dongle).
Configuration - RedBox (EVB-KSZ9477):
if link set lan1 down;ip link set lan2 down
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1
ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 interlink lan3 supervision 45 version 1
ip link set lan4 up;ip link set lan5 up
ip link set lan3 up
ip addr add 192.168.0.11/24 dev hsr1
ip link set hsr1 up
Configuration - DAN-H (EVB-KSZ9477):
ip link set lan1 down;ip link set lan2 down
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1
ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 supervision 45 version 1
ip link set lan4 up;ip link set lan5 up
ip addr add 192.168.0.12/24 dev hsr1
ip link set hsr1 up
This approach uses only SW based HSR devices (hsr1).
-------------- ----------------- ------------
DAN-H Port5 | <------> | Port5 | |
Port4 | <------> | Port4 Port3 | <---> | PC
| | (RedBox) | | (USB-ETH)
EVB-KSZ9477 | | EVB-KSZ9477 | |
-------------- ----------------- ------------
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Two important arguments in RTT estimation, mrtt and srtt, are passed to
tcp_bpf_rtt(), so that bpf programs get more information about RTT
computation in BPF_SOCK_OPS_RTT_CB.
The difference between bpf_sock_ops->srtt_us and the srtt here is: the
former is an old rtt before update, while srtt passed by tcp_bpf_rtt()
is that after update.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240425161724.73707-2-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Mostly a copy/paste from the bpf_timer API, without the initialization
and free, as they will be done in a separate patch.
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-5-6c986a5a741f@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit cleans up the uapi for vhost_vdpa by
better naming some of the enums which report blk
information to user space, and they are not
in any official releases yet.
Fixes: 1ac61ddfee ("vDPA: report virtio-blk flush info to user space")
Fixes: ae1374b7f7 ("vDPA: report virtio-block read-only info to user space")
Fixes: 330b8aea69 ("vDPA: report virtio-block max segment size to user space")
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240415111047.1047774-1-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If IORING_OP_RECV is used with provided buffers, the caller may also set
IORING_RECVSEND_BUNDLE to turn it into a multi-buffer recv. This grabs
buffers available and receives into them, posting a single completion for
all of it.
This can be used with multishot receive as well, or without it.
Now that both send and receive support bundles, add a feature flag for
it as well. If IORING_FEAT_RECVSEND_BUNDLE is set after registering the
ring, then the kernel supports bundles for recv and send.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IORING_OP_SEND is used with provided buffers, the caller may also
set IORING_RECVSEND_BUNDLE to turn it into a multi-buffer send. The idea
is that an application can fill outgoing buffers in a provided buffer
group, and then arm a single send that will service them all. Once
there are no more buffers to send, or if the requested length has
been sent, the request posts a single completion for all the buffers.
This only enables it for IORING_OP_SEND, IORING_OP_SENDMSG is coming
in a separate patch. However, this patch does do a lot of the prep
work that makes wiring up the sendmsg variant pretty trivial. They
share the prep side.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce an enumeration to define PSE types (C33 or PoDL),
utilizing a bitfield for potential future support of both types.
Include 'pse_get_types' helper for external access to PSE type info.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-2-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the current PSE interface for Ethernet Power Equipment, support is
limited to PoDL. This patch extends the interface to accommodate the
objects specified in IEEE 802.3-2022 145.2 for Power sourcing
Equipment (PSE).
The following objects are now supported and considered mandatory:
- IEEE 802.3-2022 30.9.1.1.5 aPSEPowerDetectionStatus
- IEEE 802.3-2022 30.9.1.1.2 aPSEAdminState
- IEEE 802.3-2022 30.9.1.2.1 aPSEAdminControl
To avoid confusion between "PoDL PSE" and "PoE PSE", which have similar
names but distinct values, we have followed the suggestion of Oleksij
Rempel and Andrew Lunn to maintain separate naming schemes for each,
using c33 (clause 33) prefix for "PoE PSE".
You can find more details in the discussion threads here:
https://lore.kernel.org/netdev/20230912110637.GI780075@pengutronix.de/https://lore.kernel.org/netdev/2539b109-72ad-470a-9dae-9f53de4f64ec@lunn.ch/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-1-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The UDP_ENCAP_ESPINUDP_NON_IKE mode, introduced into the Linux kernel
in 2004 [2], has remained inactive and obsolete for an extended period.
This mode was originally defined in an early version of an IETF draft
[1] from 2001. By the time it was integrated into the kernel in 2004 [2],
it had already been replaced by UDP_ENCAP_ESPINUDP [3] in later
versions of draft-ietf-ipsec-udp-encaps, particularly in version 06.
Over time, UDP_ENCAP_ESPINUDP_NON_IKE has lost its relevance, with no
known use cases.
With this commit, we remove support for UDP_ENCAP_ESPINUDP_NON_IKE,
simplifying the codebase and eliminating unnecessary complexity.
Kernel will return an error -ENOPROTOOPT if the userspace tries to set
this option.
References:
[1] https://datatracker.ietf.org/doc/html/draft-ietf-ipsec-udp-encaps-00.txt
[2] Commit that added UDP_ENCAP_ESPINUDP_NON_IKE to the Linux historic
repository.
Author: Andreas Gruenbacher <agruen@suse.de>
Date: Fri Apr 9 01:47:47 2004 -0700
[IPSEC]: Support draft-ietf-ipsec-udp-encaps-00/01, some ipec impls need it.
[3] Commit that added UDP_ENCAP_ESPINUDP to the Linux historic
repository.
Author: Derek Atkins <derek@ihtfp.com>
Date: Wed Apr 2 13:21:02 2003 -0800
[IPSEC]: Implement UDP Encapsulation framework.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
While valid C, anonymous enums confuse Cython (Python to C translator),
as reported by Ritesh (YoSTEALTH) [1] . Since people rely on it when
building against liburing and we want to keep this header in sync with
the library version, let's name the existing enums in the uapi header.
[1] https://github.com/cython/cython/issues/3240
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240328210935.25640-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds "last time" fields last_data_sent, last_data_recv and
last_ack_recv in struct mptcp_sock to record the last time data_sent,
data_recv and ack_recv happened. They all are initialized as
tcp_jiffies32 in __mptcp_init_sock(), and updated as tcp_jiffies32 too
when data is sent in __subflow_push_pending(), data is received in
__mptcp_move_skbs_from_subflow(), and ack is received in ack_update_msk().
Similar to tcpi_last_data_sent, tcpi_last_data_recv and tcpi_last_ack_recv
exposed with TCP, this patch exposes the last time "an action happened" for
MPTCP in mptcp_info, named mptcpi_last_data_sent, mptcpi_last_data_recv and
mptcpi_last_ack_recv, calculated in mptcp_diag_fill_info() as the time
deltas between now and the newly added last time fields in mptcp_sock.
Since msk->last_ack_recv needs to be protected by mptcp_data_lock/unlock,
and lock_sock_fast can sleep and be quite slow, move the entire
mptcp_data_lock/unlock block after the lock/unlock_sock_fast block.
Then mptcpi_last_data_sent and mptcpi_last_data_recv are set in
lock/unlock_sock_fast block, while mptcpi_last_ack_recv is set in
mptcp_data_lock/unlock block, which is protected by a spinlock and
should not block for too long.
Also add three reserved bytes in struct mptcp_info not to have holes in
this structure exposed to userspace.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/446
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240410-upstream-net-next-20240405-mptcp-last-time-info-v2-1-f95bd6b33e51@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have a uniform handling with bpf_link based libbpf
APIs. Using bpf_link based libbpf API also has a benefit which
makes system robust by decoupling prog life cycle and
attachment life cycle.
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240410043527.3737160-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
nla_put_uint can either write a u32 or u64 netlink attribute value. The
size depends on whether the value can be represented with a u32 or requires
a u64. Use a uint annotation in various documentation to represent this.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240409232520.237613-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.
Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.
example:
Get maximum IO event queues of the VF device::
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
Set maximum IO event queues of the VF device::
$ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VDPA_GET_VRING_SIZE by mistake uses the already occupied
ioctl # 0x80 and we never noticed - it happens to work
because the direction and size are different, but confuses
tools such as perf which like to look at just the number,
and breaks the extra robustness of the ioctl numbering macros.
To fix, sort the entries and renumber the ioctl - not too late
since it wasn't in any released kernels yet.
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 1496c47065 ("vhost-vdpa: uapi to support reporting per vq size")
Cc: "Zhu Lingshan" <lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <41c1c5489688abe5bfef9f7cf15584e3fb872ac5.1712092759.git.mst@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the relevant phydev when the command targets a PHY.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>