mirror-linux/tools/include/uapi/linux
Kuniyuki Iwashima 38163af068 bpf: Introduce SK_BPF_BYPASS_PROT_MEM.
If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out
of the global protocol memory accounting.

This is easily controlled by the net.core.bypass_prot_mem sysctl, but
the sysctl lacks flexibility.

Let's support flagging (and clearing) sk->sk_bypass_prot_mem via
bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook.

  int val = 1;

  bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_BYPASS_PROT_MEM,
                 &val, sizeof(val));

As with net.core.bypass_prot_mem, the flag is inherited by child sockets,
and BPF always takes precedence over the sysctl at socket(2) and accept(2).

SK_BPF_BYPASS_PROT_MEM is only supported at BPF_CGROUP_INET_SOCK_CREATE
and not on other hooks, for the following reasons:

  1. UDP charges memory under sk->sk_receive_queue.lock instead
     of lock_sock().

  2. Modifying the flag after an skb has been charged to sk would
     require adjusting the already-charged memory during
     bpf_setsockopt(), which complicates the logic unnecessarily.

We can support other hooks later if a real use case justifies that.

Most changes are inline and hard to trace, but a microbenchmark on
__sk_mem_raise_allocated() during neper/tcp_stream showed that more
samples completed faster (in under 512 ns) with
sk->sk_bypass_prot_mem == 1.  The difference would be more visible
under tcp_mem pressure, but that would not be a fair comparison.

  # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; }
    kretprobe:__sk_mem_raise_allocated /@start[tid]/
    { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }'
  # tcp_stream -6 -F 1000 -N -T 256

Without bpf prog (latency in ns):

  [128, 256)          3846 |                                                    |
  [256, 512)       1505326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)        1371006 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
  [1K, 2K)          198207 |@@@@@@                                              |
  [2K, 4K)           31199 |@                                                   |

With the bpf prog from the next patch (latency in ns):
  (it must be attached before tcp_stream starts)
  # bpftool prog load sk_bypass_prot_mem.bpf.o /sys/fs/bpf/test type cgroup/sock_create
  # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/test

  [128, 256)          6413 |                                                    |
  [256, 512)       1868425 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)        1101697 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
  [1K, 2K)          117031 |@@@@                                                |
  [2K, 4K)           11773 |                                                    |

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://patch.msgid.link/20251014235604.3057003-6-kuniyu@google.com
2025-10-16 12:04:47 -07:00
tc_act headers: Remove some left-over license text 2022-09-27 07:48:01 -07:00
bits.h uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again (2) 2025-07-08 10:23:13 -04:00
bpf.h bpf: Introduce SK_BPF_BYPASS_PROT_MEM. 2025-10-16 12:04:47 -07:00
bpf_common.h
bpf_perf_event.h
btf.h docs/bpf: Document the semantics of BTF tags with kind_flag 2025-02-05 16:17:59 -08:00
const.h treewide: fix typo 'unsigned __init128' -> 'unsigned __int128' 2025-03-05 12:00:03 -05:00
coredump.h tools: add coredump.h header 2025-06-12 14:00:32 +02:00
elf.h tools/include: Add uapi/linux/elf.h 2025-03-03 20:00:12 +01:00
erspan.h
fadvise.h
fanotify.h selftests/fs/mount-notify: build with tools include dir 2025-05-12 11:40:12 +02:00
filter.h
fs.h tools headers UAPI: sync linux/fs.h with the kernel sources 2025-05-11 17:48:16 -07:00
fscrypt.h tools headers: Update the fs headers with the kernel sources 2025-06-16 14:05:10 -03:00
genetlink.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
hw_breakpoint.h Move bp_type_idx to include/linux/hw_breakpoint.h 2023-03-10 21:05:16 +01:00
if_addr.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
if_link.h netkit: Allow for configuring needed_{head,tail}room 2025-01-06 09:48:49 +01:00
if_tun.h
if_xdp.h net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
in.h tools headers: Update the socket headers with the kernel sources 2025-04-10 09:28:24 -07:00
io_uring.h tools headers: Grab copy of io_uring.h 2023-10-19 16:42:03 -06:00
kcmp.h
kvm.h tools headers: Sync KVM headers with the kernel source 2025-08-18 11:52:22 -07:00
memfd.h selftests/mm: fix additional build errors for selftests 2024-04-25 20:56:42 -07:00
mman.h mm: add MAP_DROPPABLE for designating always lazily freeable mappings 2024-07-19 20:22:12 +02:00
mount.h selftests/fs/statmount: build with tools include dir 2025-05-12 11:40:12 +02:00
neighbour.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
netdev.h net: define an enum for the napi threaded state 2025-07-24 18:34:55 -07:00
netfilter.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
netfilter_arp.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
netlink.h
nsfs.h tools: update nsfs.h uapi header 2025-09-19 14:26:16 +02:00
perf_event.h perf/uapi: Clean up <uapi/linux/perf_event.h> a bit 2025-05-22 11:03:41 +02:00
pkt_cls.h net/sched: Remove uapi support for tcindex classifier 2024-01-02 14:25:51 +00:00
pkt_sched.h net/sched: Remove uapi support for CBQ qdisc 2024-01-02 14:25:51 +00:00
prctl.h Updates for the generic entry code: 2025-07-29 15:14:29 -07:00
rtnetlink.h tools include: Add headers to make tools builds more hermetic 2025-10-02 15:13:19 -03:00
seccomp.h tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems 2023-09-13 08:48:48 -03:00
seg6.h
seg6_local.h
stat.h tools headers: Update the fs headers with the kernel sources 2025-06-16 14:05:10 -03:00
stddef.h stddef: make __struct_group() UAPI C++-friendly 2024-12-20 09:05:53 -08:00
tcp.h
tls.h
types.h tools/include: make uapi/linux/types.h usable from assembly 2025-04-06 12:55:31 -07:00
userfaultfd.h selftests/mm: fix additional build errors for selftests 2024-04-25 20:56:42 -07:00