mirror-linux/net
Jakub Kicinski f22cc6f766 net: ethtool: support including Flow Label in the flow hash for RSS
Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:

  [ ] RSS include flow label in the hash (configurable)

https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling

RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.

Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it - it is far
preferable to potentially persistent overload of subset of
queues.

It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path thru host queues.
User can disable NETIF_F_RXHASH if they require a stable
flow hash.

The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.

Willem notes [1] that Flow Label is defined as identifying the flow
and therefore including both the flow label _and_ the L4 header
fields is not generally necessary. But it should not hurt so
it's not explicitly prevented if the driver supports hashing
on both at the same time.

Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch [1]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-14 11:40:13 +02:00
..
6lowpan net: replace ND_PRINTK with dynamic debug 2025-07-10 15:27:32 -07:00
9p netfs: Fix the request's work item to not require a ref 2025-05-21 14:35:20 +02:00
802 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
8021q net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
atm atm: clip: Fix NULL pointer dereference in vcc_sendmsg() 2025-07-09 19:09:36 -07:00
ax25 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
batman-adv This cleanup patchset includes the following patches: 2025-07-11 17:50:27 -07:00
bluetooth Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
bpf bpf: Add attach_type field to bpf_link 2025-07-11 10:51:55 -07:00
bridge Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
caif caif: Replace memset(0) + strscpy() with strscpy_pad() 2025-08-12 14:08:56 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-12 10:09:10 -07:00
ceph libceph: Rename hmac_sha256() to ceph_hmac_sha256() 2025-07-04 10:18:52 -07:00
core Previous releases - regressions: 2025-08-08 07:03:25 +03:00
dcb
devlink devlink: Fix excessive stack usage in rate TC bandwidth parsing 2025-07-23 17:07:35 -07:00
dns_resolver
dsa net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
ethernet
ethtool net: ethtool: support including Flow Label in the flow hash for RSS 2025-08-14 11:40:13 +02:00
handshake net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING' 2025-07-08 15:31:44 +02:00
hsr treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ieee802154 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ife
ipv4 tcp: cdg: remove redundant __GFP_NOWARN 2025-08-12 14:05:43 -07:00
ipv6 ipv6: reject malicious packets in ipv6_gso_segment() 2025-08-01 14:40:53 -07:00
iucv s390/drivers: Explicitly include <linux/export.h> 2025-06-17 18:18:02 +02:00
kcm kcm: Fix splice support 2025-07-30 18:01:26 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
l2tp net: annotate races around sk->sk_uid 2025-06-23 17:04:03 -07:00
l3mdev net: fib_rules: Fix iif / oif matching on L3 master device 2025-04-15 17:54:56 -07:00
lapb treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
llc net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
mac80211 wifi: mac80211: fix WARN_ON for monitor mode on some devices 2025-07-23 12:29:07 +02:00
mac802154
mctp net: mctp: Add bind lookup test 2025-07-15 12:08:39 +02:00
mpls net: s/dev_get_flags/netif_get_flags/ 2025-07-18 17:27:47 -07:00
mptcp mptcp: remove pr_fallback() 2025-07-25 11:29:04 -07:00
ncsi net: ncsi: Fix buffer overflow in fetching version id 2025-06-12 18:21:59 -07:00
netfilter bpf: Check netfilter ctx accesses are aligned 2025-08-01 09:22:44 -07:00
netlabel calipso: unlock rcu before returning -EAFNOSUPPORT 2025-06-05 08:03:38 -07:00
netlink netlink: avoid infinite retry looping in netlink_unicast() 2025-07-30 19:16:49 -07:00
netrom treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-19 13:00:24 -07:00
nsh
openvswitch net: openvswitch: allow providing upcall pid for the 'execute' command 2025-07-07 14:30:39 -07:00
packet net/packet: fix a race in packet_set_ring() and packet_notifier() 2025-08-04 17:21:27 -07:00
phonet phonet: add __rcu annotations 2025-08-12 14:15:24 -07:00
psample
qrtr
rds RDS: remove redundant __GFP_NOWARN 2025-08-12 14:05:44 -07:00
rfkill
rose net: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC 2025-07-18 16:59:05 -07:00
rxrpc rxrpc: Fix to use conn aborts for conn-wide failures 2025-07-17 07:50:48 -07:00
sched net/sched: Remove redundant memset(0) call in reset_policy() 2025-08-12 17:13:29 -07:00
sctp net: dst: annotate data-races around dst->obsolete 2025-07-02 14:32:29 -07:00
shaper
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-17 11:00:33 -07:00
strparser net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
sunrpc Massage rpc_pipefs to use saner primitives and clean up the 2025-07-28 09:56:09 -07:00
switchdev
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-10 10:10:49 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-17 11:00:33 -07:00
unix Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
vmw_vsock vsock: use sizeof(struct sockaddr_storage) instead of magic value 2025-08-13 17:05:21 -07:00
wireless Another wireless update: 2025-07-24 17:25:42 -07:00
x25 net/x25: Remove unused x25_terminate_link() 2025-07-14 17:19:13 -07:00
xdp net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
Kconfig net: Kconfig: add endif/endmenu comments 2025-07-22 18:17:23 -07:00
Kconfig.debug
Makefile net: Retire DCCP socket. 2025-04-11 18:58:10 -07:00
compat.c
devres.c
socket.c net: annotate races around sk->sk_uid 2025-06-23 17:04:03 -07:00
sysctl_net.c