Verify that 8-, 16- and 32-bit load-acquires are zero-extending by using
immediate values with their highest bit set. Do the same for the 64-bit
variant to keep the style consistent.
Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/11097fd515f10308b3941469ee4c86cb8872db3f.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, we pass 0x1234567890abcdef to __retval() for the following
two tests:
verifier_load_acquire/load_acquire_64
verifier_store_release/store_release_64
However, the upper 32 bits of that value are being ignored, since
__retval() expects an int. Actually, the tests would still pass even if
I change '__retval(0x1234567890abcdef)' to e.g. '__retval(0x90abcdef)'.
Restructure the tests a bit to test the entire 64-bit values properly.
Do the same to their 8-, 16- and 32-bit variants as well to keep the
style consistent.
Fixes: ff3afe5da9 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/d67f4c6f6ee0d0388cbce1f4892ec4176ee2d604.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of open-coding the conditions, use
'#ifdef CAN_USE_LOAD_ACQ_STORE_REL' to guard the following tests:
verifier_precision/bpf_load_acquire
verifier_precision/bpf_store_release
verifier_store_release/*
Note that, for the first two tests in verifier_precision.c, switching to
'#ifdef CAN_USE_LOAD_ACQ_STORE_REL' means also checking if
'__clang_major__ >= 18', which has already been guaranteed by the outer
'#if' check.
Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/45d7e025f6e390a8ff36f08fc51e31705ac896bd.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds the "list_peek" test to use the new
bpf_list_{front,back} kfunc.
The test_{front,back}* tests ensure that the return value
is a non_own_ref node pointer and requires the spinlock to be held.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch has a much simplified rbtree usage from the
kernel sch_fq qdisc. It has a "struct node_data" which can be
added to two different rbtrees which are ordered by different keys.
The test first populates both rbtrees. Then search for a lookup_key
from the "groot0" rbtree. Once the lookup_key is found, that node
refcount is taken. The node is then removed from another "groot1"
rbtree.
While searching the lookup_key, the test will also try to remove
all rbnodes in the path leading to the lookup_key.
The test_{root,left,right}_spinlock_true tests ensure that the
return value of the bpf_rbtree functions is a non_own_ref node pointer.
This is done by forcing an verifier error by calling a helper
bpf_jiffies64() while holding the spinlock. The tests then
check for the verifier message
"call bpf_rbtree...R0=rcu_ptr_or_null_node..."
The other test_{root,left,right}_spinlock_false tests ensure that
they must be called with spinlock held.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # Check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bpf_rbtree_{remove,left,right} requires the root's lock to be held.
They also check the node_internal->owner is still owned by that root
before proceeding, so it is safe to allow refcounted bpf_rb_node
pointer to be used in these kfuncs.
In a bpf fq implementation which is much closer to the kernel fq,
https://lore.kernel.org/bpf/20250418224652.105998-13-martin.lau@linux.dev/,
a networking flow (allocated by bpf_obj_new) can be added to two different
rbtrees. There are cases that the flow is searched from one rbtree,
held the refcount of the flow, and then removed from another rbtree:
struct fq_flow {
struct bpf_rb_node fq_node;
struct bpf_rb_node rate_node;
struct bpf_refcount refcount;
unsigned long sk_long;
};
int bpf_fq_enqueue(...)
{
/* ... */
bpf_spin_lock(&root->lock);
while (can_loop) {
/* ... */
if (!p)
break;
gc_f = bpf_rb_entry(p, struct fq_flow, fq_node);
if (gc_f->sk_long == sk_long) {
f = bpf_refcount_acquire(gc_f);
break;
}
/* ... */
}
bpf_spin_unlock(&root->lock);
if (f) {
bpf_spin_lock(&q->lock);
bpf_rbtree_remove(&q->delayed, &f->rate_node);
bpf_spin_unlock(&q->lock);
}
}
bpf_rbtree_{left,right} do not need this change but are relaxed together
with bpf_rbtree_remove instead of adding extra verifier logic
to exclude these kfuncs.
To avoid bi-sect failure, this patch also changes the selftests together.
The "rbtree_api_remove_unadded_node" is not expecting verifier's error.
The test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL.
The test uses __retval(0) to ensure this NULL return value.
Some of the "only take non-owning..." failure messages are changed also.
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaBVftAAKCRAraS+Z+3y5
EzJ7AP9dtYuBHmU0tB5upuLzQ9sVxaCXs2linfRKK40A5YJDcgD/fBQqPzhxCqZR
moHEqelMLgrAUcyro5egdRJZPcaTfQE=
=6Las
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-05-02
We've added 14 non-merge commits during the last 10 day(s) which contain
a total of 13 files changed, 740 insertions(+), 121 deletions(-).
The main changes are:
1) Avoid skipping or repeating a sk when using a UDP bpf_iter,
from Jordan Rife.
2) Fixed a crash when a bpf qdisc is set in
the net.core.default_qdisc, from Amery Hung.
3) A few other fixes in the bpf qdisc, from Amery Hung.
- Always call qdisc_watchdog_init() in the .init prologue such that
the .reset/.destroy epilogue can always call qdisc_watchdog_cancel()
without issue.
- bpf_qdisc_init_prologue() was incorrectly returning an error
when the bpf qdisc is set as the default_qdisc and the mq is creating
the default_qdisc. It is now fixed.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Cleanup bpf qdisc selftests
selftests/bpf: Test attaching a bpf qdisc with incomplete operators
bpf: net_sched: Make some Qdisc_ops ops mandatory
selftests/bpf: Test setting and creating bpf qdisc as default qdisc
bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc
selftests/bpf: Add tests for bucket resume logic in UDP socket iterators
selftests/bpf: Return socket cookies from sock_iter_batch progs
bpf: udp: Avoid socket skips and repeats during iteration
bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items
bpf: udp: Get rid of st_bucket_done
bpf: udp: Make sure iter->batch always contains a full bucket snapshot
bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch
bpf: net_sched: Fix using bpf qdisc as default qdisc
selftests/bpf: Fix compilation errors
====================
Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some cleanups:
- Remove unnecessary kfuncs declaration
- Use _ns in the test name to run tests in a separate net namespace
- Call skeleton __attach() instead of bpf_map__attach_struct_ops() to
simplify tests.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Implement .destroy in bpf_fq and bpf_fifo as it is now mandatory.
Test attaching a bpf qdisc with a missing operator .init. This is not
allowed as bpf qdisc qdisc_watchdog_cancel() could have been called with
an uninitialized timer.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
First, test that bpf qdisc can be set as default qdisc. Then, attach
an mq qdisc to see if bpf qdisc can be successfully created and grafted.
The test is a sequential test as net.core.default_qdisc is global.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the
cookie of the current socket, so that we can track the identity of the
sockets that the iterator has seen so far. Update the existing do_test
function to account for this change to the iterator program output. At
the same time, teach both programs to work with AF_INET as well.
Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Improve xdp_metadata bpf selftest in order to check it is possible for a
XDP dev-bound program to perform XDP_REDIRECT into a DEVMAP but it is still
not allowed to attach a XDP dev-bound program to a DEVMAP entry.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
If the CONFIG_NET_SCH_BPF configuration is not enabled,
the BPF test compilation will report the following error:
In file included from progs/bpf_qdisc_fq.c:39:
progs/bpf_qdisc_common.h:17:51: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
17 | void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;
| ^
progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
309 | struct bpf_sk_buff_ptr *to_free)
| ^
progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
progs/bpf_qdisc_fq.c:308:5: error: conflicting types for '____bpf_fq_enqueue'
Fixes: 11c701639b ("selftests/bpf: Add a basic fifo qdisc test")
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250428033445.58113-1-yangfeng59949@163.com
Make sure that CAN_USE_BPF_ST test (compute_live_registers/store) is
enabled when __clang_major__ >= 18.
Fixes: 2ea8f6a1cd ("selftests/bpf: test cases for compute_live_registers()")
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250425213712.1542077-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test that modifies the map while it's being iterated in such a way that
hangs the kernel thread unless the _safe fix is applied to
bpf_for_each_hash_elem.
Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Link: https://lore.kernel.org/r/20250424153246.141677-3-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Copy the big-endian field declarations from qspinlock_types.h,
otherwise some properties won't hold on big-endian systems. For
example, assigning lock->val = 1 should result in lock->locked == 1,
which is not the case there.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424165525.154403-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Changing bpf_arena_spin_lock.h does not lead to recompiling
arena_spin_lock.c. By convention, all BPF progs depend on all
header files in progs/, so move this header file there. There
are no other users besides arena_spin_lock.c.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424165525.154403-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding verifier test for accessing const void pointer argument in
tracing programs.
The test program loads 1st argument of bpf_fentry_test10 function
which is const void pointer and checks that verifier allows that.
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250423121329.3163461-3-mannkafai@gmail.com
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaAFFvgAKCRAraS+Z+3y5
E/1OAP9SGmTMgHuHLlF8en+MaYdtwgcHy6uurXgbSQAAV/RwwQEAh2oXZE1D9I7a
EtxsaJYqbbhD09RPwWa2Rd8iJrJYXQk=
=qcoU
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-04-17
We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 18 files changed, 1748 insertions(+), 19 deletions(-).
The main changes are:
1) bpf qdisc support, from Amery Hung.
A qdisc can be implemented in bpf struct_ops programs and
can be used the same as other existing qdiscs in the
"tc qdisc" command.
2) Add xsk tail adjustment tests, from Tushar Vyavahare.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Test attaching bpf qdisc to mq and non root
selftests/bpf: Add a bpf fq qdisc to selftest
selftests/bpf: Add a basic fifo qdisc test
libbpf: Support creating and destroying qdisc
bpf: net_sched: Disable attaching bpf qdisc to non root
bpf: net_sched: Support updating bstats
bpf: net_sched: Add a qdisc watchdog timer
bpf: net_sched: Add basic bpf qdisc kfuncs
bpf: net_sched: Support implementation of Qdisc_ops in bpf
bpf: Prepare to reuse get_ctx_arg_idx
selftests/xsk: Add tail adjustment tests and support check
selftests/xsk: Add packet stream replacement function
====================
Link: https://patch.msgid.link/20250417184338.3152168-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test implements a more sophisticated qdisc using bpf. The bpf fair-
queueing (fq) qdisc gives each flow an equal chance to transmit data. It
also respects the timestamp of skb for rate limiting.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-10-ameryhung@gmail.com
This selftest includes a bare minimum fifo qdisc, which simply enqueues
sk_buffs into the back of a bpf list and dequeues from the front of the
list.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-9-ameryhung@gmail.com
Let's make sure that we see a EDEADLK and ETIMEDOUT whenever checking
for the AA tests (in case of simple AA and AA after exhausting 31
entries).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410170023.2670683-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce tail adjustment functionality in xskxceiver using
bpf_xdp_adjust_tail(). Add `xsk_xdp_adjust_tail` to modify packet sizes
and drop unmodified packets. Implement `is_adjust_tail_supported` to check
helper availability. Develop packet resizing tests, including shrinking
and growing scenarios, with functions for both single-buffer and
multi-buffer cases. Update the test framework to handle various scenarios
and adjust MTU settings. These changes enhance the testing of packet tail
adjustments, improving AF_XDP framework reliability.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250410033116.173617-3-tushar.vyavahare@intel.com
Add a test case to verify the atomic update of existing elements in the
htab of maps. The test proceeds in three steps:
1) fill the outer map with keys in the range [0, 8]
For each inner array map, the value of its first element is set as the
key used to lookup the inner map.
2) create 16 threads to lookup these keys concurrently
Each lookup thread first lookups the inner map, then it checks whether
the first value of the inner array map is the same as the key used to
lookup the inner map.
3) create 8 threads to overwrite these keys concurrently
Each update thread first creates an inner array, it sets the first value
of the array to the key used to update the outer map, then it uses the
key and the inner map to update the outer map.
Without atomic update support, the lookup operation may return -ENOENT
during the lookup of outer map, or return -EINVAL during the comparison
of the first value in the inner map and the key used for inner map, and
the test will fail. After the atomic update change, both the lookup and
the comparison will succeed.
Given that the update of outer map is slow, the test case sets the loop
number for each thread as 5 to reduce the total running time. However,
the loop number could also be adjusted through FD_HTAB_LOOP_NR
environment variable.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add TCP+sockmap-based benchmark.
Since sockmap's own update and delete operations are generally less
critical, the performance of the fast forwarding framework built upon
it is the key aspect.
Also with cgset/cgexec, we can observe the behavior of sockmap under
memory pressure.
The benchmark can be run with:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
'''
In the future, we plan to move socket_helpers.h out of the prog_tests
directory to make it accessible for the benchmark. This will enable
better support for various socket types.
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-5-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftests checking that line and func info retrieved by newly added
libbpf APIs are the same as returned by kernel via bpf_prog_get_info_by_fd.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408234417.452565-3-mykyta.yatsenko5@gmail.com
Extend commit e3c9abd0d1 ("selftests/bpf: Implement setting global
variables in veristat") to support applying presets to members of
the global structs or unions in veristat.
For example:
```
./veristat set_global_vars.bpf.o -G "union1.struct3.var_u8_h = 0xBB"
```
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408104544.140317-1-mykyta.yatsenko5@gmail.com
This non-functional change serves as preparation for moving to
subsystem-based rstat trees. To simplify future commits, change the
signatures of existing cgroup-based rstat functions to become css-based and
rename them to reflect that.
Though the signatures have changed, the implementations have not. Within
these functions use the css->cgroup pointer to obtain the associated cgroup
and allow code to function the same just as it did before this patch. At
applicable call sites, pass the subsystem-specific css pointer as an
argument or pass a pointer to cgroup::self if not in subsystem context.
Note that cgroup_rstat_updated_list() and cgroup_rstat_push_children()
are not altered yet since there would be a larger amount of css to
cgroup conversions which may overcomplicate the code at this
intermediate phase.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A few selftests and, more importantly, consequent changes to the
bpf_helpers.h file, use likely/unlikely macros, so define them here
and remove duplicate definitions from existing selftests.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-3-a.s.protopopov@gmail.com
Commit [1] moves percpu data on x86 from address 0x000... to address
0xfff...
Before [1]:
159020: 0000000000030700 0 OBJECT GLOBAL DEFAULT 23 pcpu_hot
After [1]:
152602: ffffffff83a3e034 4 OBJECT GLOBAL DEFAULT 35 pcpu_hot
As a result, verifier_bpf_fastcall tests should now expect a negative
value for pcpu_hot, IOW, the disassemble should show "r=" instead of
"w=".
Fix this in the test.
Note that, a later change created a new variable "cpu_number" for
bpf_get_smp_processor_id() [2]. The inlining logic is updated properly
as part of this change, so there is no need to fix anything on the
kernel side.
[1] commit 9d7de2aa8b ("x86/percpu/64: Use relative percpu offsets")
[2] commit 01c7bc5198 ("x86/smp: Move cpu number to percpu hot section")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The change in struct file [1] moved f_ref to the 3rd cache line.
It made *(u64 *)file dereference invalid from the verifier point of view,
because btf_struct_walk() walks into f_lock field, which is 4-byte long.
Fix the selftests to deference the file pointer as a 4-byte access.
[1] commit e249056c91 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in 6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates.
This has been in linux-next for a while now, with the only reported
issue being some merge conflicts with the rust tree. Depending on which
tree you pull first, you will have conflicts in one of them. The merge
resolution has been in linux-next as an example of what to do, or can be
found here:
https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI
jtbJ+UuXGsnmO+JVNBEv
=gy6W
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfcq3kACgkQ6rmadz2v
bToxkw/8DHIqjVnzU2O9hbRM1anYo6yM8e34IxCt0ajHTSEVJ93+C161QDWo/6Dk
+RNlaeGekaBUk+QOLb4u+rzZ2eR/pWSm37xuDRAiBCQ+3MgR60gGRaSljpS3IUem
0FvS6C1HObBCEUXMU2rNv/5cJB5/qrQYa9FEEjRvBTLqgQkdS7yaW/KKuZaNb+Ts
KiEeWvPrPSZXStfRGy8Wr4eS2rYhxPAikUR+xde9CM+HtMWwKTCTSp8qXrqA92Dj
Cz9ix01scznuf78QCRDZp09im3lZys8ZQprmPgMxyEscN+CDL7n68wAhmTJq0uo3
3NqIv7zBQ8wMChj0f0HjwZ0Wrj7BJAveY2Q0RterxdzT4vMKdtNkThX46ISaCoX/
XQAAhZHemK6MvBJk+LKkqqMgrD+3FAzvY7O+SCyUBAMs4FK1myRJQihdLXHGfiBU
DMDZE1jsE8qBaeUbz4LIuCy8fx2LhtVwVNwbNIBUZHdyfjxIXnQT/8Cnrgklwy2i
tnYekhAsHDQY+QDkrvJpc4E1vUtiXwSDI5ErcnWdSzctEOyVeUg7OuuGD4riCd1c
emdJmtASM1z9Ajqa1dytDxVaF6wjKlbhQgnKamuex5JLGCK6makk8ZoB+DBfKYHD
VoWummTu8ldf+Dp4ehBh7AbeF2vn4kLqcF1PLRsBO6ytJs4HIt8=
=5O7h
-----END PGP SIGNATURE-----
Merge tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf relisient spinlock support from Alexei Starovoitov:
"This patch set introduces Resilient Queued Spin Lock (or rqspinlock
with res_spin_lock() and res_spin_unlock() APIs).
This is a qspinlock variant which recovers the kernel from a stalled
state when the lock acquisition path cannot make forward progress.
This can occur when a lock acquisition attempt enters a deadlock
situation (e.g. AA, or ABBA), or more generally, when the owner of the
lock (which we’re trying to acquire) isn’t making forward progress.
Deadlock detection is the main mechanism used to provide instant
recovery, with the timeout mechanism acting as a final line of
defense. Detection is triggered immediately when beginning the waiting
loop of a lock slow path.
Additionally, BPF programs attached to different parts of the kernel
can introduce new control flow into the kernel, which increases the
likelihood of deadlocks in code not written to handle reentrancy.
There have been multiple syzbot reports surfacing deadlocks in
internal kernel code due to the diverse ways in which BPF programs can
be attached to different parts of the kernel. By switching the BPF
subsystem’s lock usage to rqspinlock, all of these issues are
mitigated at runtime.
This spin lock implementation allows BPF maps to become safer and
remove mechanisms that have fallen short in assuring safety when
nesting programs in arbitrary ways in the same context or across
different contexts.
We run benchmarks that stress locking scalability and perform
comparison against the baseline (qspinlock). For the rqspinlock case,
we replace the default qspinlock with it in the kernel, such that all
spin locks in the kernel use the rqspinlock slow path. As such,
benchmarks that stress kernel spin locks end up exercising rqspinlock.
More details in the cover letter in commit 6ffb9017e9 ("Merge branch
'resilient-queued-spin-lock'")"
* tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits)
selftests/bpf: Add tests for rqspinlock
bpf: Maintain FIFO property for rqspinlock unlock
bpf: Implement verifier support for rqspinlock
bpf: Introduce rqspinlock kfuncs
bpf: Convert lpm_trie.c to rqspinlock
bpf: Convert percpu_freelist.c to rqspinlock
bpf: Convert hashtab.c to rqspinlock
rqspinlock: Add locktorture support
rqspinlock: Add entry to Makefile, MAINTAINERS
rqspinlock: Add macros for rqspinlock usage
rqspinlock: Add basic support for CONFIG_PARAVIRT
rqspinlock: Add a test-and-set fallback
rqspinlock: Add deadlock detection and recovery
rqspinlock: Protect waiters in trylock fallback from stalls
rqspinlock: Protect waiters in queue from stalls
rqspinlock: Protect pending bit owners from stalls
rqspinlock: Hardcode cond_acquire loops for arm64
rqspinlock: Add support for timeouts
rqspinlock: Drop PV and virtualization support
rqspinlock: Add rqspinlock.h header
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
=aN/V
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"For this merge window we're splitting BPF pull request into three for
higher visibility: main changes, res_spin_lock, try_alloc_pages.
These are the main BPF changes:
- Add DFA-based live registers analysis to improve verification of
programs with loops (Eduard Zingerman)
- Introduce load_acquire and store_release BPF instructions and add
x86, arm64 JIT support (Peilin Ye)
- Fix loop detection logic in the verifier (Eduard Zingerman)
- Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)
- Add kfunc for populating cpumask bits (Emil Tsalapatis)
- Convert various shell based tests to selftests/bpf/test_progs
format (Bastien Curutchet)
- Allow passing referenced kptrs into struct_ops callbacks (Amery
Hung)
- Add a flag to LSM bpf hook to facilitate bpf program signing
(Blaise Boscaccy)
- Track arena arguments in kfuncs (Ihor Solodrai)
- Add copy_remote_vm_str() helper for reading strings from remote VM
and bpf_copy_from_user_task_str() kfunc (Jordan Rome)
- Add support for timed may_goto instruction (Kumar Kartikeya
Dwivedi)
- Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)
- Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
local storage (Martin KaFai Lau)
- Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)
- Allow retrieving BTF data with BTF token (Mykyta Yatsenko)
- Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
(Song Liu)
- Reject attaching programs to noreturn functions (Yafang Shao)
- Introduce pre-order traversal of cgroup bpf programs (Yonghong
Song)"
* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
bpf: Fix out-of-bounds read in check_atomic_load/store()
libbpf: Add namespace for errstr making it libbpf_errstr
bpf: Add struct_ops context information to struct bpf_prog_aux
selftests/bpf: Sanitize pointer prior fclose()
selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
selftests/bpf: test_xdp_vlan: Rename BPF sections
bpf: clarify a misleading verifier error message
selftests/bpf: Add selftest for attaching fexit to __noreturn functions
bpf: Reject attaching fexit/fmod_ret to __noreturn functions
bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
bpf: Make perf_event_read_output accessible in all program types.
bpftool: Using the right format specifiers
bpftool: Add -Wformat-signedness flag to detect format errors
selftests/bpf: Test freplace from user namespace
libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
bpf: Return prog btf_id without capable check
bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
bpf, x86: Fix objtool warning for timed may_goto
bpf: Check map->record at the beginning of check_and_free_fields()
...
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires
This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported out-of-bounds read in check_atomic_load/store() when the
register number is invalid in this context:
https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
To avoid the issue from now on, let's add tests where the register number
is invalid for load-acquire/store-release.
After discussion with Eduard, I decided to use R15 as invalid register
because the actual slab-out-of-bounds read issue occurs when the register
number is R12 or larger.
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-6-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
syzbot reported the following splat [0].
In check_atomic_load/store(), register validity is not checked before
atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg()
called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG
or greater.
Call check_load_mem()/check_store_reg() before atomic_ptr_type_ok()
to avoid the OOB read.
However, some tests introduced by commit ff3afe5da9 ("selftests/bpf: Add
selftests for load-acquire and store-release instructions") assume
calling atomic_ptr_type_ok() before checking register validity.
Therefore the swapping of order unintentionally changes verifier messages
of these tests.
For example in the test load_acquire_from_pkt_pointer(), expected message
is 'BPF_ATOMIC loads from R2 pkt is not allowed' although actual messages
are different.
validate_msgs:FAIL:754 expect_msg
VERIFIER LOG:
=============
Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; asm volatile ( @ verifier_load_acquire.c:140
0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx() R2_w=pkt(r=0)
1: (d3) r0 = load_acquire((u8 *)(r2 +0))
invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0)
R2 offset is outside of the packet
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
=============
EXPECTED SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed'
#505/19 verifier_load_acquire/load-acquire from pkt pointer:FAIL
This is because instructions in the test don't pass check_load_mem() and
therefore don't enter the atomic_ptr_type_ok() path.
In this case, we have to modify instructions so that they pass the
check_load_mem() and trigger atomic_ptr_type_ok().
Similarly for store-release tests, we need to modify instructions so that
they pass check_store_reg().
Like load_acquire_from_pkt_pointer(), modify instructions in:
load_acquire_from_sock_pointer()
store_release_to_ctx_pointer()
store_release_to_pkt_pointer()
Also in store_release_to_sock_pointer(), check_store_reg() returns error
early and atomic_ptr_type_ok() is not triggered, since write to sock
pointer is not possible in general.
We might be able to remove the test, but for now let's leave it and just
change the expected message.
[0]
BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842
CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0x16e/0x5b0 mm/kasan/report.c:521
kasan_report+0x143/0x180 mm/kasan/report.c:634
is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
check_atomic_store kernel/bpf/verifier.c:7804 [inline]
check_atomic kernel/bpf/verifier.c:7841 [inline]
do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334
do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600
do_check_main kernel/bpf/verifier.c:22691 [inline]
bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821
Reported-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
Tested-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions")
Fixes: ff3afe5da9 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-5-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The __load() helper expects BPF sections to be names 'xdp' or 'tc'
Rename BPF sections so they can be loaded with the __load() helper in
upcoming patch.
Rename the BPF functions with their previous section's name.
Update the 'ip link' commands in the script to use the program name
instead of the section name to load the BPF program.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250221-xdp_vlan-v1-1-7d29847169af@bootlin.com
Introduce selftests that trigger AA, ABBA deadlocks, and test the edge
case where the held locks table runs out of entries, since we then
fallback to the timeout as the final line of defense. Also exercise
verifier's AA detection where applicable.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250316040541.108729-26-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftests to verify that it is possible to load freplace program
from user namespace if BPF token is initialized by bpf_object__prepare
before calling bpf_program__set_attach_target.
Negative test is added as well.
Modified type of the priv_prog to xdp, as kprobe did not work on aarch64
and s390x.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250317174039.161275-5-mykyta.yatsenko5@gmail.com
The SO_RCVLOWAT option is defined as 18 in the selftest header,
which matches the generic definition. However, on powerpc,
SO_RCVLOWAT is defined as 16. This discrepancy causes
sol_socket_sockopt() to fail with the default switch case on powerpc.
This commit fixes by defining SO_RCVLOWAT as 16 for powerpc.
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20250311084647.3686544-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The strncmp benchmark uses the bpf_strncmp helper and a hand-written
loop to compare two strings. The values of the strings are filled from
userspace. One of the strings is non-const (in .bss) while the other is
const (in .rodata) since that is the requirement of bpf_strncmp.
The problem is that in the hand-written loop, Clang optimizes the reads
from the const string to always return 0 which breaks the benchmark.
Use barrier_var to prevent the optimization.
The effect can be seen on the strncmp-no-helper variant.
Before this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 (112.309us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 1 (-23.238us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 2 ( 58.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 3 (-30.466us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 4 ( 29.996us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 5 ( 16.949us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 6 (-60.035us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s
After this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 ( 77.711us): hits 5.534M/s ( 5.534M/prod), drops 0.000M/s, total operations 5.534M/s
Iter 1 ( 11.215us): hits 6.006M/s ( 6.006M/prod), drops 0.000M/s, total operations 6.006M/s
Iter 2 (-14.253us): hits 5.931M/s ( 5.931M/prod), drops 0.000M/s, total operations 5.931M/s
Iter 3 ( 59.087us): hits 6.005M/s ( 6.005M/prod), drops 0.000M/s, total operations 6.005M/s
Iter 4 (-21.379us): hits 6.010M/s ( 6.010M/prod), drops 0.000M/s, total operations 6.010M/s
Iter 5 (-20.310us): hits 5.861M/s ( 5.861M/prod), drops 0.000M/s, total operations 5.861M/s
Iter 6 ( 53.937us): hits 6.004M/s ( 6.004M/prod), drops 0.000M/s, total operations 6.004M/s
Summary: hits 5.969 ± 0.061M/s ( 5.969M/prod), drops 0.000 ± 0.000M/s, total operations 5.969 ± 0.061M/s
Fixes: 9c42652f8b ("selftests/bpf: Add benchmark for bpf_strncmp() helper")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This test exercises the kernel flag added to security_bpf by
effectively blocking light-skeletons from loading while allowing
normal skeletons to function as-is. Since this should work with any
arbitrary BPF program, an existing program from LSKELS_EXTRA was
used as a test payload.
Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250310221737.821889-3-bboscaccy@linux.microsoft.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Certain bpf syscall subcommands are available for usage from both
userspace and the kernel. LSM modules or eBPF gatekeeper programs may
need to take a different course of action depending on whether or not
a BPF syscall originated from the kernel or userspace.
Additionally, some of the bpf_attr struct fields contain pointers to
arbitrary memory. Currently the functionality to determine whether or
not a pointer refers to kernel memory or userspace memory is exposed
to the bpf verifier, but that information is missing from various LSM
hooks.
Here we augment the LSM hooks to provide this data, by simply passing
a boolean flag indicating whether or not the call originated in the
kernel, in any hook that contains a bpf_attr struct that corresponds
to a subcommand that may be called from the kernel.
Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250310221737.821889-2-bboscaccy@linux.microsoft.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF cpumask selftests are currently run twice in
test_progs/cpumask.c, once by traversing cpumask_success_testcases, and
once by invoking RUN_TESTS(cpumask_success). Remove the invocation of
RUN_TESTS to properly run the selftests only once.
Now that the tests are run only through cpumask_success_testscases, add
to it the missing test_refcount_null_tracking testcase. Also remove the
__success annotation from it, since it is now loaded and invoked by the
runner.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250309230427.26603-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftests for the bpf_cpumask_populate helper that sets a
bpf_cpumask to a bit pattern provided by a BPF program.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250309230427.26603-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add some basic selftests for qspinlock built over BPF arena using
cond_break_label macro.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250306035431.2186189-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
may_goto instruction does not use any registers,
but in compute_insn_live_regs() it was treated as a regular
conditional jump of kind BPF_K with r0 as source register.
Thus unnecessarily marking r0 as used.
Fixes: 14c8552db6 ("bpf: simple DFA-based live registers analysis")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250305085436.2731464-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add several ./test_progs tests:
- arena_atomics/load_acquire
- arena_atomics/store_release
- verifier_load_acquire/*
- verifier_store_release/*
- verifier_precision/bpf_load_acquire
- verifier_precision/bpf_store_release
The last two tests are added to check if backtrack_insn() handles the
new instructions correctly.
Additionally, the last test also makes sure that the verifier
"remembers" the value (in src_reg) we store-release into e.g. a stack
slot. For example, if we take a look at the test program:
#0: r1 = 8;
/* store_release((u64 *)(r10 - 8), r1); */
#1: .8byte %[store_release];
#2: r1 = *(u64 *)(r10 - 8);
#3: r2 = r10;
#4: r2 += r1;
#5: r0 = 0;
#6: exit;
At #1, if the verifier doesn't remember that we wrote 8 to the stack,
then later at #4 we would be adding an unbounded scalar value to the
stack pointer, which would cause the program to be rejected:
VERIFIER LOG:
=============
...
math between fp pointer and register with unbounded min value is not allowed
For easier CI integration, instead of using built-ins like
__atomic_{load,store}_n() which depend on the new
__BPF_FEATURE_LOAD_ACQ_STORE_REL pre-defined macro, manually craft
load-acquire/store-release instructions using __imm_insn(), as suggested
by Eduard.
All new tests depend on:
(1) Clang major version >= 18, and
(2) ENABLE_ATOMICS_TESTS is defined (currently implies -mcpu=v3 or
v4), and
(3) JIT supports load-acquire/store-release (currently arm64 and
x86-64)
In .../progs/arena_atomics.c:
/* 8-byte-aligned */
__u8 __arena_global load_acquire8_value = 0x12;
/* 1-byte hole */
__u16 __arena_global load_acquire16_value = 0x1234;
That 1-byte hole in the .addr_space.1 ELF section caused clang-17 to
crash:
fatal error: error in backend: unable to write nop sequence of 1 bytes
To work around such llvm-17 CI job failures, conditionally define
__arena_global variables as 64-bit if __clang_major__ < 18, to make sure
.addr_space.1 has no holes. Ideally we should avoid compiling this file
using clang-17 at all (arena tests depend on
__BPF_FEATURE_ADDR_SPACE_CAST, and are skipped for llvm-17 anyway), but
that is a separate topic.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/1b46c6feaf0f1b6984d9ec80e500cc7383e9da1a.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement the arch_bpf_timed_may_goto function using inline assembly to
have control over which registers are spilled, and use our special
protocol of using BPF_REG_AX as an argument into the function, and as
the return value when going back.
Emit call depth accounting for the call made from this stub, and ensure
we don't have naked returns (when rethunk mitigations are enabled) by
falling back to the RET macro (instead of retq). After popping all saved
registers, the return address into the BPF program should be on top of
the stack.
Since the JIT support is now enabled, ensure selftests which are
checking the produced may_goto sequences do not break by adjusting them.
Make sure we still test the old may_goto sequence on other
architectures, while testing the new sequence on x86_64.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250304003239.2390751-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests for freplace behavior with the combination of sleepable
and non-sleepable global subprogs. The changes_pkt_data selftest
did all the hardwork, so simply rename it and include new support
for more summarization tests for might_sleep bit.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests for rejecting sleepable and accepting non-sleepable global
function calls in atomic contexts. For spin locks, we still reject
all global function calls. Once resilient spin locks land, we will
carefully lift in cases where we deem it safe.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_select_reuseport_kern.c is currently including <stdlib.h>, but it
does not use any definition from there.
Remove stdlib.h inclusion from test_select_reuseport_kern.c
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250227-remove_wrong_header-v1-1-bc94eb4e2f73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
eWLFUeg=
=Wypt
-----END PGP SIGNATURE-----
Merge 6.14-rc6 into driver-core-next
We need the driver core fix in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ8qHFwAKCRAraS+Z+3y5
E6QRAQCI/irYxpT6nBgT3ejyO6CIHjbHjdNP5KRkE8JT61zmJgD8Diet4dwwOxLK
c1Pq5Jnl9KLpEPG99TcjtH3c++N3fww=
=ltml
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-03-06
We've added 6 non-merge commits during the last 13 day(s) which contain
a total of 6 files changed, 230 insertions(+), 56 deletions(-).
The main changes are:
1) Add XDP metadata support for tun driver, from Marcus Wichelmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Fix file descriptor assertion in open_tuntap helper
selftests/bpf: Add test for XDP metadata support in tun driver
selftests/bpf: Refactor xdp_context_functional test and bpf program
selftests/bpf: Move open_tuntap to network helpers
net: tun: Enable transfer of XDP metadata to skb
net: tun: Enable XDP metadata support
====================
Link: https://patch.msgid.link/20250307055335.441298-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing XDP metadata test works by creating a veth pair and
attaching XDP & TC programs that drop the packet when the condition of
the test isn't fulfilled. The test then pings through the veth pair and
succeeds when the ping comes through.
While this test works great for a veth pair, it is hard to replicate for
tap devices to test the XDP metadata support of them. A similar test for
the tun driver would either involve logic to reply to the ping request,
or would have to capture the packet to check if it was dropped or not.
To make the testing of other drivers easier while still maximizing code
reuse, this commit refactors the existing xdp_context_functional test to
use a test_result map. Instead of conditionally passing or dropping the
packet, the TC program is changed to copy the received metadata into the
value of that single-entry array map. Tests can then verify that the map
value matches the expectation.
This testing logic is easy to adapt to other network drivers as the only
remaining requirement is that there is some way to send a custom
Ethernet packet through it that triggers the XDP & TC programs.
The Ethernet header of that custom packet is all-zero, because it is not
required to be valid for the test to work. The zero ethertype also helps
to filter out packets that are not related to the test and would
otherwise interfere with it.
The payload of the Ethernet packet is used as the test data that is
expected to be passed as metadata from the XDP to the TC program and
written to the map. It has a fixed size of 32 bytes which is a
reasonable size that should be supported by both drivers. Additional
packet headers are not necessary for the test and were therefore skipped
to keep the testing code short.
This new testing methodology no longer requires the veth interfaces to
have IP addresses assigned, therefore these were removed.
Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250305213438.3863922-5-marcus.wichelmann@hetzner-cloud.de
Introducing test for veristat, part of test_progs.
Test cases cover functionality of setting global variables in BPF
program.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250225163101.121043-3-mykyta.yatsenko5@gmail.com
Update usdt tests to also check for correct behavior of
bpf_usdt_arg_size().
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250224235756.2612606-2-ihor.solodrai@linux.dev
Add netns cookie test that verifies the helper is now supported and work
in the context of cgroup_skb programs.
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20250225125031.258740-2-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test gen_prologue and gen_epilogue that generate kfuncs that have not
been seen in the main program.
The main bpf program and return value checks are identical to
pro_epilogue.c introduced in commit 47e69431b5 ("selftests/bpf: Test
gen_prologue and gen_epilogue"). However, now when bpf_testmod_st_ops
detects a program name with prefix "test_kfunc_", it generates slightly
different prologue and epilogue: They still add 1000 to args->a in
prologue, add 10000 to args->a and set r0 to 2 * args->a in epilogue,
but involve kfuncs.
At high level, the alternative version of prologue and epilogue look
like this:
cgrp = bpf_cgroup_from_id(0);
if (cgrp)
bpf_cgroup_release(cgrp);
else
/* Perform what original bpf_testmod_st_ops prologue or
* epilogue does
*/
Since 0 is never a valid cgroup id, the original prologue or epilogue
logic will be performed. As a result, the __retval check should expect
the exact same return value.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250225233545.285481-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test if the verifier rejects struct_ops program with __ref argument
calling bpf_tail_call().
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250220221532.1079331-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix a soft-lockup in BPF arena_map_free on 64k page size
kernels (Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding
kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's
freeze_mutex (Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when
eth_skb_pkt_type is accessing skb data not containing an
Ethernet header (Shigeru Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch
(Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2)
in user space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ7evlRUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIPwHgD/dTvM00Ck4Q73fPivyT7tcqxeXJlD
D6ggzWl/SG9LAbwA/2/cSgAM9Jm1g7ddvn/S9QaDYOs5GmFl6urq6krs+tYD
=FCs9
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:
- Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
(Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
(Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
accessing skb data not containing an Ethernet header (Shigeru
Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2) in user
space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests: bpf: test batch lookup on array of maps with holes
bpf: skip non exist keys in generic_map_lookup_batch
bpf: Handle allocation failure in acquire_lock_state
bpf: verifier: Disambiguate get_constant_map_key() errors
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: Fix softlockup in arena_map_free on 64k page kernel
net: Add rx_skb of kfree_skb to raw_tp_null_args[].
bpf: Fix deadlock when freeing cgroup storage
selftests/bpf: Add strparser test for bpf
selftests/bpf: Fix invalid flag of recv()
bpf: Disable non stream socket for strparser
bpf: Fix wrong copied_seq calculation
strparser: Add read_sock callback
bpf: avoid holding freeze_mutex during mmap operation
bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
selftests/bpf: Adjust data size to have ETH_HLEN
bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
BPF program calculates a couple of latency deltas between each tx
timestamping callbacks. It can be used in the real world to diagnose
the kernel behaviour in the tx path.
Check the safety issues by accessing a few bpf calls in
bpf_test_access_bpf_calls() which are implemented in the patch 3 and 4.
Check if the bpf timestamping can co-exist with socket timestamping.
There remains a few realistic things[1][2] to highlight:
1. in general a packet may pass through multiple qdiscs. For instance
with bonding or tunnel virtual devices in the egress path.
2. packets may be resent, in which case an ACK might precede a repeat
SCHED and SND.
3. erroneous or malicious peers may also just never send an ACK.
[1]: https://lore.kernel.org/all/67a389af981b0_14e0832949d@willemb.c.googlers.com.notmuch/
[2]: https://lore.kernel.org/all/c329a0c1-239b-4ca1-91f2-cb30b8dd2f6a@linux.dev/
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-13-kerneljasonxing@gmail.com
A test case with ridiculously deep bpf_for() nesting and
a conditional update of a stack location.
Consider the innermost loop structure:
1: bpf_for(o, 0, 10)
2: if (unlikely(bpf_get_prandom_u32()))
3: buf[0] = 42;
4: <exit>
Assuming that verifier.c:clean_live_states() operates w/o change from
the previous patch (e.g. as on current master) verification would
proceed as follows:
- at (1) state {buf[0]=?,o=drained}:
- checkpoint
- push visit to (2) for later
- at (4) {buf[0]=?,o=drained}
- pop (2) {buf[0]=?,o=active}, push visit to (3) for later
- at (1) {buf[0]=?,o=active}
- checkpoint
- push visit to (2) for later
- at (4) {buf[0]=?,o=drained}
- pop (2) {buf[0]=?,o=active}, push visit to (3) for later
- at (1) {buf[0]=?,o=active}:
- checkpoint reached, checkpoint's branch count becomes 0
- checkpoint is processed by clean_live_states() and
becomes {o=active}
- pop (3) {buf[0]=42,o=active}
- at (1), {buf[0]=42,o=active}
- checkpoint
- push visit to (2) for later
- at (4) {buf[0]=42,o=drained}
- pop (2) {buf[0]=42,o=active}, push visit to (3) for later
- at (1) {buf[0]=42,o=active}, checkpoint reached
- pop (3) {buf[0]=42,o=active}
- at (1) {buf[0]=42,o=active}:
- checkpoint reached, checkpoint's branch count becomes 0
- checkpoint is processed by clean_live_states() and
becomes {o=active}
- ...
Note how clean_live_states() converted the checkpoint
{buf[0]=42,o=active} to {o=active} and it can no longer be matched
against {buf[0]=<any>,o=active}, because iterator based states
are compared using stacksafe(... RANGE_WITHIN), that requires
stack slots to have same types. At the same time there are
still states {buf[0]=42,o=active} pushed to DFS stack.
This behaviour becomes exacerbated with multiple nesting levels,
here are veristat results:
- nesting level 1: 69 insns
- nesting level 2: 258 insns
- nesting level 3: 900 insns
- nesting level 4: 4754 insns
- nesting level 5: 35944 insns
- nesting level 6: 312558 insns
- nesting level 7: 1M limit
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A somewhat cumbersome test case sensitive to correct copying of
bpf_verifier_state->loop_entry fields in
verifier.c:copy_verifier_state().
W/o the fix from a previous commit the program is accepted as safe.
1: /* poison block */
2: if (random() != 24) { // assume false branch is placed first
3: i = iter_new();
4: while (iter_next(i));
5: iter_destroy(i);
6: return;
7: }
8:
9: /* dfs_depth block */
10: for (i = 10; i > 0; i--);
11:
12: /* main block */
13: i = iter_new(); // fp[-16]
14: b = -24; // r8
15: for (;;) {
16: if (iter_next(i))
17: break;
18: if (random() == 77) { // assume false branch is placed first
19: *(u64 *)(r10 + b) = 7; // this is not safe when b == -25
20: iter_destroy(i);
21: return;
22: }
23: if (random() == 42) { // assume false branch is placed first
24: b = -25;
25: }
26: }
27: iter_destroy(i);
The goal of this example is to:
(a) poison env->cur_state->loop_entry with a state S,
such that S->branches == 0;
(b) set state S as a loop_entry for all checkpoints in
/* main block */, thus forcing NOT_EXACT states comparisons;
(c) exploit incorrect loop_entry set for checkpoint at line 18
by first creating a checkpoint with b == -24 and then
pruning the state with b == -25 using that checkpoint.
The /* poison block */ is responsible for goal (a).
It forces verifier to first validate some unrelated iterator based
loop, which leads to an update_loop_entry() call in is_state_visited(),
which places checkpoint created at line 4 as env->cur_state->loop_entry.
Starting from line 8, the branch count for that checkpoint is 0.
The /* dfs_depth block */ is responsible for goal (b).
It abuses the fact that update_loop_entry(cur, hdr) only updates
cur->loop_entry when hdr->dfs_depth <= cur->dfs_depth.
After line 12 every state has dfs_depth bigger then dfs_depth of
poisoned env->cur_state->loop_entry. Thus the above condition is never
true for lines 12-27.
The /* main block */ is responsible for goal (c).
Verification proceeds as follows:
- checkpoint {b=-24,i=active} created at line 16;
- jump 18->23 is verified first, jump to 19 pushed to stack;
- jump 23->26 is verified first, jump to 24 pushed to stack;
- checkpoint {b=-24,i=active} created at line 15;
- current state is pruned by checkpoint created at line 16,
this sets branches count for checkpoint at line 15 to 0;
- jump to 24 is popped from stack;
- line 16 is reached in state {b=-25,i=active};
- this is pruned by a previous checkpoint {b=-24,i=active}:
- checkpoint's loop_entry is poisoned and has branch count of 0,
hence states are compared using NOT_EXACT rules;
- b is not marked precise yet.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
XDP programs loaded on egress is tested by test_xdp_redirect_multi.sh
but not by the test_progs framework.
Add a test case in test_xdp_veth.c to test the XDP program on egress.
Use the same BPF program than test_xdp_redirect_multi.sh that replaces
the source MAC address by one provided through a BPF map.
Use a BPF program that stores the source MAC of received packets in a
map to check the test results.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-5-fd0d39fca6e6@bootlin.com
XDP redirections with BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS flags
are tested by test_xdp_redirect_multi.sh but not within the test_progs
framework.
Add a broadcast test case in test_xdp_veth.c to test them.
Use the same BPF programs than the one used by
test_xdp_redirect_multi.sh.
Use a BPF map to select the broadcast flags.
Use a BPF map with an entry per veth to check whether packets are
received or not
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-4-fd0d39fca6e6@bootlin.com
Broadcasting flags are hardcoded for each kind for protocol.
Create a redirect_flags map that allows to select the broadcasting flags
to use in the bpf_redirect_map(). The protocol ID is used as a key.
Set the old hardcoded values as default if the map isn't filled by the
BPF caller.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-3-fd0d39fca6e6@bootlin.com
Test struct_ops programs returning referenced kptr. When the return type
of a struct_ops operator is pointer to struct, the verifier should
only allow programs that return a scalar NULL or a non-local kptr with the
correct type in its unmodified form.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test referenced kptr acquired through struct_ops argument tagged with
"__ref". The success case checks whether 1) a reference to the correct
type is acquired, and 2) the referenced kptr argument can be accessed in
multiple paths as long as it hasn't been released. In the fail cases,
we first confirm that a referenced kptr acquried through a struct_ops
argument is not allowed to be leaked. Then, we make sure this new
referenced kptr acquiring mechanism does not accidentally allow referenced
kptrs to flow into global subprograms through their arguments.
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
kernfs_rename_lock is used to obtain stable kernfs_node::{name|parent}
pointer. This is a preparation to access kernfs_node::parent under RCU
and ensure that the pointer remains stable under the RCU lifetime
guarantees.
For a complete path, as it is done in kernfs_path_from_node(), the
kernfs_rename_lock is still required in order to obtain a stable parent
relationship while computing the relevant node depth. This must not
change while the nodes are inspected in order to build the path.
If the kernfs user never moves the nodes (changes the parent) then the
kernfs_rename_lock is not required and the RCU guarantees are
sufficient. This "restriction" can be set with
KERNFS_ROOT_INVARIANT_PARENT. Otherwise the lock is required.
Rename kernfs_node::parent to kernfs_node::__parent to denote the RCU
access and use RCU accessor while accessing the node.
Make cgroup use KERNFS_ROOT_INVARIANT_PARENT since the parent here can
not change.
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20250213145023.2820193-6-bigeasy@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a simple repro for the issue of miscalculating LDX/STX/ST CO-RE
relocation size adjustment when the CO-RE relocation target type is an
ARRAY.
We need to make sure that compiler generates LDX/STX/ST instruction with
CO-RE relocation against entire ARRAY type, not ARRAY's element. With
the code pattern in selftest, we get this:
59: 61 71 00 00 00 00 00 00 w1 = *(u32 *)(r7 + 0x0)
00000000000001d8: CO-RE <byte_off> [5] struct core_reloc_arrays::a (0:0)
Where offset of `int a[5]` is embedded (through CO-RE relocation) into memory
load instruction itself.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250207014809.1573841-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Added test cases to ensure that programs with stack sizes exceeding 512
bytes are restricted in non-JITed mode, and can be executed normally in
JITed mode, even with stack sizes exceeding 512 bytes due to the presence
of may_goto instructions.
Test result:
echo "0" > /proc/sys/net/core/bpf_jit_enable
./test_progs -t verifier_stack_ptr
...
stack size 512 with may_goto with jit:SKIP
stack size 512 with may_goto without jit:OK
...
Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED
echo "1" > /proc/sys/net/core/bpf_jit_enable
./test_progs -t verifier_stack_ptr
...
stack size 512 with may_goto with jit:OK
stack size 512 with may_goto without jit:SKIP
...
Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20250214091823.46042-4-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In some cases, the verification logic under the interpreter and JIT
differs, such as may_goto, and the test program behaves differently under
different runtime modes, requiring separate verification logic for each
result.
Introduce __load_if_JITed and __load_if_no_JITed annotation for tests.
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20250214091823.46042-3-mrpre@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Two sets of tests are added to exercise the not _locked and _locked
version of the kfuncs. For both tests, user space accesses xattr
security.bpf.foo on a testfile. The BPF program is triggered by user
space access (on LSM hook inode_[set|get]_xattr) and sets or removes
xattr security.bpf.bar. Then user space then validates that xattr
security.bpf.bar is set or removed as expected.
Note that, in both tests, the BPF programs use the not _locked kfuncs.
The verifier picks the proper kfuncs based on the calling context.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250130213549.3353349-6-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend test_progs fs_kfuncs to cover different xattr names. Specifically:
xattr name "user.kfuncs" and "security.bpf.xxx" can be read from BPF
program with kfuncs bpf_get_[file|dentry]_xattr(); while "security.bpf"
and "security.selinux" cannot be read.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250130213549.3353349-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since commit 7e92e01b72 ("powerpc: Provide syscall wrapper")
landed in v6.1, syscall wrapper is enabled on powerpc. Commit
9474689020 ("powerpc: Don't add __powerpc_ prefix to syscall
entry points") , that drops the prefix to syscall entry points,
also landed in the same release. So, add the missing empty
SYS_PREFIX prefix definition for powerpc, to fix some fentry
and kprobe selftests.
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/7192d6aa9501115dc242435970df82b3d190f257.1738302337.git.skb99@linux.ibm.com
Most of the filesystem methods where we care about dentry name
and parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for
expected name and expected parent of the dentry being
validated. That kills quite a bit of boilerplate in
->d_revalidate() instances, along with a bunch of races
where they used to access ->d_name without sufficient
precautions.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
=TWY8
-----END PGP SIGNATURE-----
Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
Add test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to standard sockmap.
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://patch.msgid.link/20250122100917.49845-6-mrpre@163.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmeOu1YACgkQ6rmadz2v
bTrrHxAAn6eqEsluWnDlzhI0OGsPjvgS00sf+MOeqiXYeS2eJ8yJuKifp38+nIQZ
lIplsWU2ReUY20eizPqLPnQ7TXZGvLgp08E8yHUoZ0siWanqr9iDRfbZCCNrDMNm
lMqeR1SLapMws2R/UX9JbvPn2ajIJ6Lb4wxenTfdlW6q+0hAGM6Dt0k/jBod+quq
/oo+xwG3L0q4APBovJfiAFN2z6IYN03b+zLiOrpIJtMACGewEXnl3m4mkL8ZM/FV
nZGPIxIUPXCpKTGEkNqxfkrnHN2wZQ4ZSKEJ6lhEEp4jrgCVITaGZ/E7jlx6fZoj
bbd4YMonIPo9Nhim8p1dt8yYBhKKiE5IXIq0GqlMv5+MvAN8ylrlydpsouW1fu66
hZ1W1BxbxmrgyF0Bwo9JPOMhBHwMrmD6iH9LgiMpZf0ASeF+q9cJpoSOU5j5E9XB
LpLIRf5jYTd4wZjhDmrQREReLo+Bng9DlCBu+jjh2+YTz6l6Qed+ETpENcd7lL5i
IHZVbgD2RVPNJoUfdrd763HfYfDTk+50MF5FIMEyfKHz11if0E/LhBMzto22hm6b
2f8ruj/8yvg8s2dxEP3ySQgcnynlwEnGxLenUVv7uEOYKeWri1rq+fvTK5ne1OLK
oHnTlkViwQb74c0r8cFW+nkyfUYTfhhBAql14rl/fMjGDO2KZ10=
=f2CA
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"A smaller than usual release cycle.
The main changes are:
- Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)
In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
only mode. Half of the tests are failing, since support for
btf_decl_tag is still WIP, but this is a great milestone.
- Convert various samples/bpf to selftests/bpf/test_progs format
(Alexis Lothoré and Bastien Curutchet)
- Teach verifier to recognize that array lookup with constant
in-range index will always succeed (Daniel Xu)
- Cleanup migrate disable scope in BPF maps (Hou Tao)
- Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)
- Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
KaFai Lau)
- Refactor verifier lock support (Kumar Kartikeya Dwivedi)
This is a prerequisite for upcoming resilient spin lock.
- Remove excessive 'may_goto +0' instructions in the verifier that
LLVM leaves when unrolls the loops (Yonghong Song)
- Remove unhelpful bpf_probe_write_user() warning message (Marco
Elver)
- Add fd_array_cnt attribute for prog_load command (Anton Protopopov)
This is a prerequisite for upcoming support for static_branch"
* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
selftests/bpf: Add some tests related to 'may_goto 0' insns
bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
bpf: Allow 'may_goto 0' instruction in verifier
selftests/bpf: Add test case for the freeing of bpf_timer
bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free special fields after unlock in htab_lru_map_delete_node()
tools: Sync if_xdp.h uapi tooling header
libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
bpf: selftests: verifier: Add nullness elision tests
bpf: verifier: Support eliding map lookup nullness
bpf: verifier: Refactor helper access type tracking
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Add missing newline on verbose() call
selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
libbpf: Fix return zero when elf_begin failed
selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
veristat: Load struct_ops programs only once
...
Add both asm-based and C-based tests which have 'may_goto 0' insns.
For the following code in C-based test,
int i, tmp[3];
for (i = 0; i < 3 && can_loop; i++)
tmp[i] = 0;
The clang compiler (clang 19 and 20) generates
may_goto 2
may_goto 1
may_goto 0
r1 = 0
r2 = 0
r3 = 0
The above asm codes are due to llvm pass SROAPass. This ensures the
successful verification since tmp[0-2] are initialized. Otherwise,
the code without SROAPass like
may_goto 5
r1 = 0
may_goto 3
r2 = 0
may_goto 1
r3 = 0
will have verification failure.
Although from the source code C-based test should have verification
failure, clang compiler optimization generates code with successful
verification. If gcc generates different asm codes than clang, the
following code can be used for gcc:
int i, tmp[3];
for (i = 0; i < 3; i++)
tmp[i] = 0;
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250118192034.2124952-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The main purpose of the test is to demonstrate the lock problem for the
free of bpf_timer under PREEMPT_RT. When freeing a bpf_timer which is
running on other CPU in bpf_timer_cancel_and_free(), hrtimer_cancel()
will try to acquire a spin-lock (namely softirq_expiry_lock), however
the freeing procedure has already held a raw-spin-lock.
The test first creates two threads: one to start timers and the other to
free timers. The start-timers thread will start the timer and then wake
up the free-timers thread to free these timers when the starts complete.
After freeing, the free-timer thread will wake up the start-timer thread
to complete the current iteration. A loop of 10 iterations is used.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
... so that they can be copied with struct assignment (which generates
better code) and accessed word-by-word.
The type is union shortname_storage; it's a union of arrays of
unsigned char and unsigned long.
struct name_snapshot.inline_name turned into union shortname_storage;
users (all in fs/dcache.c) adjusted.
struct dentry.d_iname has some users outside of fs/dcache.c; to
reduce the amount of noise in commit, it is replaced with
union shortname_storage d_shortname and d_iname is turned into a macro
that expands to d_shortname.string (similar to d_lock handling).
That compat macro is temporary - most of the remaining instances will
be taken out by debugfs series, and once that is merged and few others
are taken care of this will go away.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Test that nullness elision works for common use cases. For example, we
want to check that both constant scalar spills and STACK_ZERO functions.
As well as when there's both const and non-const values of R2 leading up
to a lookup. And obviously some bound checks.
Particularly tricky are spills both smaller or larger than key size. For
smaller, we need to ensure verifier doesn't let through a potential read
into unchecked bytes. For larger, endianness comes into play, as the
native endian value tracked in the verifier may not be the bytes the
kernel would have read out of the key pointer. So check that we disallow
both.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/f1dacaa777d4516a5476162e0ea549f7c3354d73.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
Tests also have to be updated in sync with these changes, as the
verifier is more efficient with this change. Notable, iters.c tests had
to be changed to use a map type that still requires null checks, as it's
exercising verifier tracking logic w.r.t iterators.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/68f3ea96ff3809a87e502a11a4bd30177fc5823e.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously, the verifier was treating all PTR_TO_STACK registers passed
to a helper call as potentially written to by the helper. However, all
calls to check_stack_range_initialized() already have precise access type
information available.
Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass
enum bpf_access_type to check_stack_range_initialized() to more
precisely track helper arguments.
One benefit from this precision is that registers tracked as valid
spills and passed as a read-only helper argument remain tracked after
the call. Rather than being marked STACK_MISC afterwards.
An additional benefit is the verifier logs are also more precise. For
this particular error, users will enjoy a slightly clearer message. See
included selftest updates for examples.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ff885c0e5859e0cd12077c3148ff0754cad4f7ed.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
prog_tests/xdp_do_redirect.c is the only user of the BPF programs
located in progs/test_xdp_do_redirect.c and progs/test_xdp_redirect.c.
There is no need to keep both files with such close names.
Move test_xdp_redirect.c contents to test_xdp_do_redirect.c and remove
progs/test_xdp_redirect.c
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-xdp_redirect-v2-3-b8f3ae53e894@bootlin.com
SEC("redirect_to_111") and SEC("redirect_to_222") can't be loaded by the
__load() helper.
Rename both sections SEC("xdp") so it can be interpreted by the __load()
helper in upcoming patch.
Update the test_xdp_redirect.sh to use the program name instead of the
section name to load the BPF program.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://patch.msgid.link/20250110-xdp_redirect-v2-1-b8f3ae53e894@bootlin.com
Adding kprobe.session probe to bpf_kfunc_common_test that misses bpf
program execution due to recursion check and making sure it increases
the program missed count properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250106175048.1443905-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ30g5RUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIMfSwD5AceGwL4lOliAKuD+jt5FGBbR4Wag
HBn/16BGwMwfxMEBAP1UoXeo4gs6AawT+JcvZjwfNlXwdAiyAx9zY4MtvxoL
=TcTF
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2025-01-07
We've added 7 non-merge commits during the last 32 day(s) which contain
a total of 11 files changed, 190 insertions(+), 103 deletions(-).
The main changes are:
1) Migrate the test_xdp_meta.sh BPF selftest into test_progs
framework, from Bastien Curutchet.
2) Add ability to configure head/tailroom for netkit devices,
from Daniel Borkmann.
3) Fixes and improvements to the xdp_hw_metadata selftest,
from Song Yoong Siang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Extend netkit tests to validate set {head,tail}room
netkit: Add add netkit {head,tail}room to rt_link.yaml
netkit: Allow for configuring needed_{head,tail}room
selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c
selftests/bpf: test_xdp_meta: Rename BPF sections
selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata
selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata
====================
Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a selftest to ensure BPF for loops within critical sections are
accepted by the verifier.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250104202528.882482-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similarly to the previous test, we also need a test case to cover
positive offsets as well, TC is an excellent hook for this.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Zijian Zhang <zijianzhang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-5-xiyou.wangcong@gmail.com
As requested by Daniel, we need to add a selftest to cover
bpf_skb_change_tail() cases in skb_verdict. Here we test trimming,
growing and error cases, and validate its expected return values and the
expected sizes of the payload.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241213034057.246437-3-xiyou.wangcong@gmail.com
SEC("t") and SEC("x") can't be loaded by the __load() helper.
Rename these sections SEC("tc") and SEC("xdp") so they can be
interpreted by the __load() helper in upcoming patch.
Update the test_xdp_meta.sh to fit these new names.
Signed-off-by: Bastien Curutchet <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241213-xdp_meta-v2-1-634582725b90@bootlin.com
Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.
A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps. This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].
Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.
Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).
In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].
Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.
While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
[1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
[2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
[3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch reverts commit
cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.
However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error. Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.
More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.
Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.
[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace magic constants in a BTF structure initialization code by
proper macros, as is done in other similar selftests.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241213130934.1087929-8-aspsk@isovalent.com
Ensure that performing narrow ctx loads other than size == 8 are
rejected when the argument is a pointer type.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241212092050.3204165-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Robert Morris reported the following program type which passes the
verifier in [0]:
SEC("struct_ops/bpf_cubic_init")
void BPF_PROG(bpf_cubic_init, struct sock *sk)
{
asm volatile("r2 = *(u16*)(r1 + 0)"); // verifier should demand u64
asm volatile("*(u32 *)(r2 +1504) = 0"); // 1280 in some configs
}
The second line may or may not work, but the first instruction shouldn't
pass, as it's a narrow load into the context structure of the struct ops
callback. The code falls back to btf_ctx_access to ensure correctness
and obtaining the types of pointers. Ensure that the size of the access
is correctly checked to be 8 bytes, otherwise the verifier thinks the
narrow load obtained a trusted BTF pointer and will permit loads/stores
as it sees fit.
Perform the check on size after we've verified that the load is for a
pointer field, as for scalar values narrow loads are fine. Access to
structs passed as arguments to a BPF program are also treated as
scalars, therefore no adjustment is needed in their case.
Existing verifier selftests are broken by this change, but because they
were incorrect. Verifier tests for d_path were performing narrow load
into context to obtain path pointer, had this program actually run it
would cause a crash. The same holds for verifier_btf_ctx_access tests.
[0]: https://lore.kernel.org/bpf/51338.1732985814@localhost
Fixes: 9e15db6613 ("bpf: Implement accurate raw_tp context access via BTF")
Reported-by: Robert Morris <rtm@mit.edu>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241212092050.3204165-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend changes_pkt_data tests with test cases freplacing the main
program that does not have subprograms. Try four combinations when
both main program and replacement do and do not change packet data.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241212070711.427443-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test case with a tail call done from a global sub-program. Such
tails calls should be considered as invalidating packet pointers.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tail-called programs could execute any of the helpers that invalidate
packet pointers. Hence, conservatively assume that each tail call
invalidates packet pointers.
Making the change in bpf_helper_changes_pkt_data() automatically makes
use of check_cfg() logic that computes 'changes_pkt_data' effect for
global sub-programs, such that the following program could be
rejected:
int tail_call(struct __sk_buff *sk)
{
bpf_tail_call_static(sk, &jmp_table, 0);
return 0;
}
SEC("tc")
int not_safe(struct __sk_buff *sk)
{
int *p = (void *)(long)sk->data;
... make p valid ...
tail_call(sk);
*p = 42; /* this is unsafe */
...
}
The tc_bpf2bpf.c:subprog_tc() needs change: mark it as a function that
can invalidate packet pointers. Otherwise, it can't be freplaced with
tailcall_freplace.c:entry_freplace() that does a tail call.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Try different combinations of global functions replacement:
- replace function that changes packet data with one that doesn't;
- replace function that changes packet data with one that does;
- replace function that doesn't change packet data with one that does;
- replace function that doesn't change packet data with one that doesn't;
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241210041100.1898468-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With CONFIG_KPROBES_ON_FTRACE enabled on powerpc, ftrace_location_range
returns ftrace location for bpf_fentry_test1 at offset of 4 bytes from
function entry. This is because branch to _mcount function is at offset
of 4 bytes in function profile sequence.
To fix this, add entry_offset of 4 bytes while verifying the address for
kprobe entry address of bpf_fentry_test1 in verify_perf_link_info in
selftest, when CONFIG_KPROBES_ON_FTRACE is enabled.
Disassemble of bpf_fentry_test1:
c000000000e4b080 <bpf_fentry_test1>:
c000000000e4b080: a6 02 08 7c mflr r0
c000000000e4b084: b9 e2 22 4b bl c00000000007933c <_mcount>
c000000000e4b088: 01 00 63 38 addi r3,r3,1
c000000000e4b08c: b4 07 63 7c extsw r3,r3
c000000000e4b090: 20 00 80 4e blr
When CONFIG_PPC_FTRACE_OUT_OF_LINE [1] is enabled, these function profile
sequence is moved out of line with an unconditional branch at offset 0.
So, the test works without altering the offset for
'CONFIG_KPROBES_ON_FTRACE && CONFIG_PPC_FTRACE_OUT_OF_LINE' case.
Disassemble of bpf_fentry_test1:
c000000000f95190 <bpf_fentry_test1>:
c000000000f95190: 00 00 00 60 nop
c000000000f95194: 01 00 63 38 addi r3,r3,1
c000000000f95198: b4 07 63 7c extsw r3,r3
c000000000f9519c: 20 00 80 4e blr
[1] https://lore.kernel.org/all/20241030070850.1361304-13-hbathini@linux.ibm.com/
Fixes: 23cf7aa539 ("selftests/bpf: Add selftest for fill_link_info")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241209065720.234344-1-skb99@linux.ibm.com
The selftests build four kernel modules which use copy-pasted Makefile
targets. This is a bit messy, and doesn't scale so well when we add more
modules, so let's consolidate these rules into a single rule generated
for each module name, and move the module sources into a single
directory.
To avoid parallel builds of the different modules stepping on each
other's toes during the 'modpost' phase of the Kbuild 'make modules',
the module files should really be a grouped target. However, make only
added explicit support for grouped targets in version 4.3, which is
newer than the minimum version supported by the kernel. However, make
implicitly treats pattern matching rules with multiple targets as a
grouped target, so we can work around this by turning the rule into a
pattern matching target. We do this by replacing '.ko' with '%ko' in the
targets with subst().
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/bpf/20241204-bpf-selftests-mod-compile-v5-1-b96231134a49@redhat.com
Add a test case to verify that without CAP_PERFMON, the test now
succeeds instead of failing due to a verification error.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ensure that when CAP_PERFMON is dropped, and the verifier sees
allow_ptr_leaks as false, we are not permitted to read from a
STACK_INVALID slot. Without the fix, the test will report unexpected
success in loading.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a __caps_unpriv annotation so that tests requiring specific
capabilities while dropping the rest can conveniently specify them
during selftest declaration instead of munging with capabilities at
runtime from the testing binary.
While at it, let us convert test_verifier_mtu to use this new support
instead.
Since we do not want to include linux/capability.h, we only defined the
four main capabilities BPF subsystem deals with in bpf_misc.h for use in
tests. If the user passes a CAP_SYS_NICE or anything else that's not
defined in the header, capability parsing code will return a warning.
Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to
use it, and strtol doesn't errno on failed conversion. Fail the test in
such a case.
The original diff for this idea is available at link [0].
[0]: https://lore.kernel.org/bpf/a1e48f5d9ae133e19adc6adf27e19d585e06bab4.camel@gmail.com
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
[ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Include tests that check for rejection in erroneous cases, like
unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ
flag state or input to kfuncs, behavior upon overwriting IRQ saved state
on stack, interaction with sleepable kfuncs/helpers, global functions,
and out of order restore. Include some success scenarios as well to
demonstrate usage.
#128/1 irq/irq_save_bad_arg:OK
#128/2 irq/irq_restore_bad_arg:OK
#128/3 irq/irq_restore_missing_2:OK
#128/4 irq/irq_restore_missing_3:OK
#128/5 irq/irq_restore_missing_3_minus_2:OK
#128/6 irq/irq_restore_missing_1_subprog:OK
#128/7 irq/irq_restore_missing_2_subprog:OK
#128/8 irq/irq_restore_missing_3_subprog:OK
#128/9 irq/irq_restore_missing_3_minus_2_subprog:OK
#128/10 irq/irq_balance:OK
#128/11 irq/irq_balance_n:OK
#128/12 irq/irq_balance_subprog:OK
#128/13 irq/irq_global_subprog:OK
#128/14 irq/irq_restore_ooo:OK
#128/15 irq/irq_restore_ooo_3:OK
#128/16 irq/irq_restore_3_subprog:OK
#128/17 irq/irq_restore_4_subprog:OK
#128/18 irq/irq_restore_ooo_3_subprog:OK
#128/19 irq/irq_restore_invalid:OK
#128/20 irq/irq_save_invalid:OK
#128/21 irq/irq_restore_iter:OK
#128/22 irq/irq_save_iter:OK
#128/23 irq/irq_flag_overwrite:OK
#128/24 irq/irq_flag_overwrite_partial:OK
#128/25 irq/irq_ooo_refs_array:OK
#128/26 irq/irq_sleepable_helper:OK
#128/27 irq/irq_sleepable_kfunc:OK
#128 irq:OK
Summary: 1/27 PASSED, 0 SKIPPED, 0 FAILED
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For preemption-related kfuncs, we don't test their interaction with
sleepable kfuncs (we do test helpers) even though the verifier has
code to protect against such a pattern. Expand coverage of the selftest
to include this case.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The verifier log when leaking resources on BPF_EXIT may be a bit
confusing, as it's a problem only when finally existing from the main
prog, not from any of the subprogs. Hence, update the verifier error
string and the corresponding selftests matching on it.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii spotted that process_dynptr_func's rejection of incorrect
argument register type will print an error string where argument numbers
are not zero-indexed, unlike elsewhere in the verifier. Fix this by
subtracting 1 from regno. The same scenario exists for iterator
messages. Fix selftest error strings that match on the exact argument
number while we're at it to ensure clean bisection.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203002235.3776418-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftests to cover argument type check for iterator kfuncs, and
cover all three kinds (new, next, destroy). Without the fix in the
previous patch, the selftest would not cause a verifier error.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, KF_ARG_PTR_TO_ITER handling missed checking the reg->type and
ensuring it is PTR_TO_STACK. Instead of enforcing this in the caller of
process_iter_arg, move the check into it instead so that all callers
will gain the check by default. This is similar to process_dynptr_func.
An existing selftest in verifier_bits_iter.c fails due to this change,
but it's because it was passing a NULL pointer into iter_next helper and
getting an error further down the checks, but probably meant to pass an
uninitialized iterator on the stack (as is done in the subsequent test
below it). We will gain coverage for non-PTR_TO_STACK arguments in later
patches hence just change the declaration to zero-ed stack object.
Fixes: 06accc8779 ("bpf: add support for open-coded iterator loops")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Lyu <tao.lyu@epfl.ch>
[ Kartikeya: move check into process_iter_arg, rewrite commit log ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This verifies that programs of BPF_PROG_TYPE_CGROUP_SKB can access
skb->data_end with direct packet access when being run with
BPF_PROG_TEST_RUN.
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20241125152603.375898-2-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Trying to add udp-dedicated helpers in network_helpers involves
including some udp header, which makes multiple test_progs tests build
fail:
In file included from ./progs/test_cls_redirect.h:13,
from [...]/prog_tests/cls_redirect.c:15:
[...]/usr/include/linux/udp.h:23:8: error: redefinition of ‘struct udphdr’
23 | struct udphdr {
| ^~~~~~
In file included from ./network_helpers.h:17,
from [...]/prog_tests/cls_redirect.c:13:
[...]/usr/include/netinet/udp.h:55:8: note: originally defined here
55 | struct udphdr
| ^~~~~~
This error is due to struct udphdr being defined in both <linux/udp.h>
and <netinet/udp.h>.
Use only <netinet/udp.h> in every test. While at it, perform the same
for tcp.h. For some tests, the change needs to be done in the eBPF
program part as well, because of some headers sharing between both
sides.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-11-45b46494f937@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
CONFIG_PREEMPT is a preemtion model the so called "Low-Latency Desktop".
A different preemption model is PREEMPT_RT the so called "Real-Time".
Both implement preemption in kernel and set CONFIG_PREEMPTION.
There is also the so called "LAZY PREEMPT" which the "Scheduler
controlled preemption model". Here we have also preemption in the kernel
the rules are slightly different.
Therefore the testsuite should not check for CONFIG_PREEMPT (as one
model) but for CONFIG_PREEMPTION to figure out if preemption in the
kernel is possible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119161819.qvEcs-n_@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core
----
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read
access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code
--------------------------------------------
- Add small file operations for debugfs, to reduce the struct ops size.
- Refactoring and optimization for the implementation of page_frag API,
This is a preparatory work to consolidate the page_frag
implementation.
Netfilter
---------
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users
the option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent
CI improvements.
BPF
---
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols
---------
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after close,
the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per device
neigh lists.
Driver API
----------
- Introduce a unified interface to configure transmission H/W shaping,
and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling
-----------------
- forwarding: introduce deferred commands, to simplify
the cleanup phase
Drivers
-------
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- adds support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Adds representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- adds support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implements page pool support
- macsec:
- inherit lower device's features and TSO limits when offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- Add the dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- adds clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
THcvVM+bupEZ
=GzPr
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
llvm 19 fails to compile arena self test:
CLNG-BPF [test_progs] verifier_arena_large.bpf.o
progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod.
90 | pg_idx = (pg - base) / PAGE_SIZE;
Though llvm <= 18 and llvm >= 20 don't have this issue,
fix the test to avoid the build error.
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test that verifies specific behavior of arena range tree
algorithm and adjust existing big_alloc1 test due to use
of global data in arena.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20241108025616.17625-3-alexei.starovoitov@gmail.com
Cross-merge bpf fixes after downstream PR.
In particular to bring the fix in
commit aa30eb3260 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.
No conflicts.
Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three tests for struct_ops using private stack.
./test_progs -t struct_ops_private_stack
#336/1 struct_ops_private_stack/private_stack:OK
#336/2 struct_ops_private_stack/private_stack_fail:OK
#336/3 struct_ops_private_stack/private_stack_recur:OK
#336 struct_ops_private_stack:OK
The following is a snippet of a struct_ops check_member() implementation:
u32 moff = __btf_member_bit_offset(t, member) / 8;
switch (moff) {
case offsetof(struct bpf_testmod_ops3, test_1):
prog->aux->priv_stack_requested = true;
prog->aux->recursion_detected = test_1_recursion_detected;
fallthrough;
default:
break;
}
return 0;
The first test is with nested two different callback functions where the
first prog has more than 512 byte stack size (including subprogs) with
private stack enabled.
The second test is a negative test where the second prog has more than 512
byte stack size without private stack enabled.
The third test is the same callback function recursing itself. At run time,
the jit trampoline recursion check kicks in to prevent the recursion. The
recursion_detected() callback function is implemented by the bpf_testmod,
the following message in dmesg
bpf_testmod: oh no, recursing into test_1, recursion_misses 1
demonstrates the callback function is indeed triggered when recursion miss
happens.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163938.2225528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some private stack tests are added including:
- main prog only with stack size greater than BPF_PSTACK_MIN_SIZE.
- main prog only with stack size smaller than BPF_PSTACK_MIN_SIZE.
- prog with one subprog having MAX_BPF_STACK stack size and another
subprog having non-zero small stack size.
- prog with callback function.
- prog with exception in main prog or subprog.
- prog with async callback without nesting
- prog with async callback with possible nesting
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163927.2224750-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test cases to verify the following four update operations on htab of
maps don't trigger lockdep warning:
(1) add then delete
(2) add, overwrite, then delete
(3) add, then lookup_and_delete
(4) add two elements, then lookup_and_delete_batch
Test cases are added for pre-allocated and non-preallocated htab of maps
respectively.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Adding uprobe session consumers to the consumer test,
so we get the session into the test mix.
In addition scaling down the test to have just 1 uprobe
and 1 uretprobe, otherwise the test time grows and is
unsuitable for CI even with threads.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-13-jolsa@kernel.org
Testing that the session ret_handler bypass works on single
uprobe with multiple consumers, each with different session
ignore return value.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-12-jolsa@kernel.org
Making sure kprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-11-jolsa@kernel.org
Making sure uprobe.session program can return only [0,1] values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-10-jolsa@kernel.org
Adding uprobe session test that verifies the cookie value is stored
properly when single uprobe-ed function is executed recursively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-9-jolsa@kernel.org
Adding uprobe session test that verifies the cookie value
get properly propagated from entry to return program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-8-jolsa@kernel.org
Adding uprobe session test and testing that the entry program
return value controls execution of the return probe program.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-7-jolsa@kernel.org
As reported by Byeonguk, the bad_words test in verifier_bits_iter.c
occasionally fails on s390 host. Quoting Ilya's explanation:
s390 kernel runs in a completely separate address space, there is no
user/kernel split at TASK_SIZE. The same address may be valid in both
the kernel and the user address spaces, there is no way to tell by
looking at it. The config option related to this property is
ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
Also, unfortunately, 0 is a valid address in the s390 kernel address
space.
Fix the issue by using -4095 as the bad address for bits iterator, as
suggested by Ilya. Verify that bpf_iter_bits_new() returns -EINVAL for
NULL address and -EFAULT for bad address.
Fixes: ebafc1e535 ("selftests/bpf: Add three test cases for bits_iter")
Reported-by: Byeonguk Jeong <jungbu2855@gmail.com>
Closes: https://lore.kernel.org/bpf/ZycSXwjH4UTvx-Cn@ub22/
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20241105043057.3371482-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in
raw_tp program. Without the previous fix, this selftest crashes the
kernel due to a NULL-pointer dereference. Also ensure that dead code
elimination does not kick in for checks on the pointer.
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments
can actually be NULL, and the verifier's knowledge, that they are never
NULL, causing explicit NULL checks to be deleted, and accesses to such
pointers potentially crashing the kernel.
To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special
case the dereference and pointer arithmetic to permit it, and allow
passing them into helpers/kfuncs; these exceptions are made for raw_tp
programs only. Ensure that we don't do this when ref_obj_id > 0, as in
that case this is an acquired object and doesn't need such adjustment.
The reason we do mask_raw_tp_trusted_reg logic is because other will
recheck in places whether the register is a trusted_reg, and then
consider our register as untrusted when detecting the presence of the
PTR_MAYBE_NULL flag.
To allow safe dereference, we enable PROBE_MEM marking when we see loads
into trusted pointers with PTR_MAYBE_NULL.
While trusted raw_tp arguments can also be passed into helpers or kfuncs
where such broken assumption may cause issues, a future patch set will
tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can
already be passed into helpers and causes similar problems. Thus, they
are left alone for now.
It is possible that these checks also permit passing non-raw_tp args
that are trusted PTR_TO_BTF_ID with null marking. In such a case,
allowing dereference when pointer is NULL expands allowed behavior, so
won't regress existing programs, and the case of passing these into
helpers is the same as above and will be dealt with later.
Also update the failure case in tp_btf_nullable selftest to capture the
new behavior, as the verifier will no longer cause an error when
directly dereference a raw tracepoint argument marked as __nullable.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There are similar checks for covering locks, references, RCU read
sections and preempt_disable sections in 3 places in the verifer, i.e.
for tail calls, bpf_ld_[abs, ind], and exit path (for BPF_EXIT and
bpf_throw). Unify all of these into a common check_resource_leak
function to avoid code duplication.
Also update the error strings in selftests to the new ones in the same
change to ensure clean bisection.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241103225940.1408302-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K
DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN
=pDzC
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-31
We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).
The main changes are:
1) Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it, from Puranjay Mohan.
2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
into test_progs so that it can be run in BPF CI, from Alexis Lothoré.
3) Two BPF sockmap selftest fixes, from Zijian Zhang.
4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
from Vincent Li.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add a selftest for bpf_csum_diff()
selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
bpf: bpf_csum_diff: Optimize and homogenize for all archs
net: checksum: Move from32to16() to generic header
selftests/bpf: remove xdp_synproxy IP_DF check
selftests/bpf: remove test_tcp_check_syncookie
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
selftests/bpf: Fix msg_verify_data in test_sockmap
====================
Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new subtest runs with bpf_prog_test_run_opts() as a syscall prog.
It iterates the kmem_cache using bpf_for_each loop and count the number
of entries. Finally it checks it with the number of entries from the
regular iterator.
$ ./vmtest.sh -- ./test_progs -t kmem_cache_iter
...
#130/1 kmem_cache_iter/check_task_struct:OK
#130/2 kmem_cache_iter/check_slabinfo:OK
#130/3 kmem_cache_iter/open_coded_iter:OK
#130 kmem_cache_iter:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
Also simplify the code by using attach routine of the skeleton.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241030222819.1800667-2-namhyung@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add more test cases for bits iterator:
(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.
Also rename bits_nomem to bits_too_big to better reflect its purpose.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest for the bpf_csum_diff() helper. This selftests runs the
helper in all three configurations(push, pull, and diff) and verifies
its output. The correct results have been computed by hand and by the
helper's older implementation.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
The bpf_csum_diff() helper has been fixed to return a 16-bit value for
all archs, so now we don't need to mask the result.
This commit is basically reverting the below:
commit 6185266c5a ("selftests/bpf: Mask bpf_csum_diff() return value
to 16 bits in test_verifier")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-4-puranjay@kernel.org
Drop bpf_iter.h header which uses vmlinux.h but re-defines a bunch of
iterator structures and some of BPF constants for use in BPF iterator
selftests.
None of that is necessary when fresh vmlinux.h header is generated for
vmlinux image that matches latest selftests. So drop ugly hacks and have
a nice plain vmlinux.h usage everywhere.
We could do the same with all the kfunc __ksym redefinitions, but that
has dependency on very fresh pahole, so I'm not addressing that here.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241029203919.1948941-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In real world production websites, the IP_DF flag
is not always set for each packet from these websites.
the IP_DF flag check breaks Internet connection to
these websites for home based firewall like BPFire
when XDP synproxy program is attached to firewall
Internet facing side interface. see [0]
[0] https://github.com/vincentmli/BPFire/issues/59
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Link: https://lore.kernel.org/r/20241025031952.1351150-1-vincent.mc.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The test added is a simplified reproducer from syzbot report [1].
If verifier does not insert checkpoint somewhere inside the loop,
verification of the program would take a very long time.
This would happen because mark_chain_precision() for register r7 would
constantly trace jump history of the loop back, processing many
iterations for each mark_chain_precision() call.
[1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241029172641.1042523-2-eddyz87@gmail.com
This patch tests the map creation failure when the map_value
has unsupported uptr. The three cases are the struct is larger
than one page, the struct is empty, and the struct is a kernel struct.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-13-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch test the following failures in syscall update_elem
1. The first update_elem(BPF_F_LOCK) should be EOPNOTSUPP. syscall.c takes
care of unpinning the uptr.
2. The second update_elem(BPF_EXIST) fails. syscall.c takes care of
unpinning the uptr.
3. The forth update_elem(BPF_NOEXIST) fails. bpf_local_storage_update
takes care of unpinning the uptr.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-11-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure the memory of uptrs have been mapped to the kernel properly.
Also ensure the values of uptrs in the kernel haven't been copied
to userspace.
It also has the syscall update_elem/delete_elem test to test the
pin/unpin code paths.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Global variables of special types (like `struct bpf_spin_lock`) make
underlying ARRAY maps non-mmapable. To make this work with libbpf's
mmaping logic, application is expected to declare such special variables
as static, so libbpf doesn't even attempt to mmap() such ARRAYs.
test_spin_lock_fail.c didn't follow this rule, but given it relied on
this test to trigger failures, this went unnoticed, as we never got to
the step of mmap()'ing these ARRAY maps.
It is fragile and relies on specific sequence of libbpf steps, which are
an internal implementation details.
Fix the test by marking lockA and lockB as static.
Fixes: c48748aea4 ("selftests/bpf: Add failure test cases for spin lock pairing")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241023043908.3834423-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move LOAD_REJECT test cases from test_sock.c to an equivalent set of
verifier tests in progs/verifier_sock.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20241022152913.574836-3-jrife@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add testcases to test bpf_send_signal_task(). In these new test cases,
the main process triggers the BPF program and the forked process
receives the signals. The target process's signal handler receives a
cookie from the bpf program.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016084136.10305-3-puranjay@kernel.org
Now that btf_skc_cls_ingress has the same coverage as
test_tcp_check_syncookie, remove the second one and keep the first one
as it is integrated in test_progs
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-6-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
One remaining difference between test_tcp_check_syncookie.sh and
btf_skc_cls_ingress is a small test on the mss value embedded in the
cookie generated with the eBPF helper.
Bring the corresponding test in btf_skc_cls_ingress.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-5-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
btf_skc_cls_ingress test currently checks that syncookie and
bpf_sk_assign/release helpers behave correctly in multiple scenarios,
but only with ipv6 socket.
Increase those helpers coverage by adding testing support for IPv4
sockets and IPv4/IPv6 sockets. The rework is mostly based on features
brought earlier in test_tcp_check_syncookie.sh to cover some fixes
performed on those helpers, but test_tcp_check_syncookie.sh is not
integrated in test_progs. The most notable changes linked to this are:
- some rework in the corresponding eBPF program to support both types of
traffic
- the switch from start_server to start_server_str to allow to check
some socket options
- the introduction of new subtests for ipv4 and ipv4/ipv6
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241020-syncookie-v2-4-2db240225fed@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
- Fix BPF verifier to not affect subreg_def marks in its range
propagation, from Eduard Zingerman.
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx, from Dimitar Kanaliev.
- Fix the BPF verifier's delta propagation between linked
registers under 32-bit addition, from Daniel Borkmann.
- Fix a NULL pointer dereference in BPF devmap due to missing
rxq information, from Florian Kauer.
- Fix a memory leak in bpf_core_apply, from Jiri Olsa.
- Fix an UBSAN-reported array-index-out-of-bounds in BTF
parsing for arrays of nested structs, from Hou Tao.
- Fix build ID fetching where memory areas backing the file
were created with memfd_secret, from Andrii Nakryiko.
- Fix BPF task iterator tid filtering which was incorrectly
using pid instead of tid, from Jordan Rome.
- Several fixes for BPF sockmap and BPF sockhash redirection
in combination with vsocks, from Michal Luczaj.
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered,
from Andrea Parri.
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the
possibility of an infinite BPF tailcall, from Pu Lehui.
- Fix a build warning from resolve_btfids that bpf_lsm_key_free
cannot be resolved, from Thomas Weißschuh.
- Fix a bug in kfunc BTF caching for modules where the wrong
BTF object was returned, from Toke Høiland-Jørgensen.
- Fix a BPF selftest compilation error in cgroup-related tests
with musl libc, from Tony Ambardar.
- Several fixes to BPF link info dumps to fill missing fields,
from Tyrone Wu.
- Add BPF selftests for kfuncs from multiple modules, checking
that the correct kfuncs are called, from Simon Sundberg.
- Ensure that internal and user-facing bpf_redirect flags
don't overlap, also from Toke Høiland-Jørgensen.
- Switch to use kvzmalloc to allocate BPF verifier environment,
from Rik van Riel.
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic
splat under RT, from Wander Lairson Costa.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZxK4OhUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIOCrwEAib2kC5EEQn5+wKVE/bnZryVX2leT
YXdfItDCBU6zCYUA+wTU5hGGn9lcDUcZx72l/KZPDyPw7HdzNJ+6iR1zQqoM
=f9kv
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
Add a test case to ensure that attaching a tail callee program with an
freplace program fails, and that updating an extended program to a
prog_array map is also prohibited.
This test is designed to prevent the potential infinite loop issue caused
by the combination of tail calls and freplace, ensuring the correct
behavior and stability of the system.
Additionally, fix the broken tailcalls/tailcall_freplace selftest
because an extension prog should not be tailcalled.
cd tools/testing/selftests/bpf; ./test_progs -t tailcalls
337/25 tailcalls/tailcall_freplace:OK
337/26 tailcalls/tailcall_bpf2bpf_freplace:OK
337 tailcalls:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20241015150207.70264-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds test cases for bpf_task_from_vpid() kfunc.
task_kfunc_from_vpid_no_null_check is used to test the case where
the return value is not checked for NULL pointer.
test_task_from_vpid_current is used to test obtaining the
struct task_struct of the process in the pid namespace based on vpid.
test_task_from_vpid_invalid is used to test the case of invalid vpid.
test_task_from_vpid_current and test_task_from_vpid_invalid will run
in the new namespace.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848F13435CD650AC4B7BD7099442@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test traverses all slab caches using the kmem_cache_iter and save
the data into slab_result array map. And check if current task's
pointer is from "task_struct" slab cache using bpf_get_kmem_cache().
Also compare the result array with /proc/slabinfo if available (when
CONFIG_SLUB_DEBUG is on). Note that many of the fields in the slabinfo
are transient, so it only compares the name and objsize fields.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241010232505.1339892-4-namhyung@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test for unsigned ranges after signed extension instruction. This
case isn't currently covered by existing tests in verifier_movsx.c.
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Dimitar Kanaliev <dimitar.kanaliev@siteground.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241014121155.92887-4-dimitar.kanaliev@siteground.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test that checks whether unsigned ranges deduced by the verifier for
sign extension instruction is correct. Without previous patch that
fixes truncation in coerce_reg_to_size_sx() this test fails.
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Dimitar Kanaliev <dimitar.kanaliev@siteground.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241014121155.92887-3-dimitar.kanaliev@siteground.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZw1/jBUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIO/ZwEAuAVkRgyuC0njVV9PyT7EbZqxHjY+
10v6I6XR8vWmILABALrTIR9wTOyBVgmZzW7AUq8wiFv9FSZmhJfp1KxPdNYA
=L6hT
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-14
The following pull-request contains BPF updates for your *net-next* tree.
We've added 21 non-merge commits during the last 18 day(s) which contain
a total of 21 files changed, 1185 insertions(+), 127 deletions(-).
The main changes are:
1) Put xsk sockets on a struct diet and add various cleanups. Overall, this helps
to bump performance by 12% for some workloads, from Maciej Fijalkowski.
2) Extend BPF selftests to increase coverage of XDP features in combination
with BPF cpumap, from Alexis Lothoré (eBPF Foundation).
3) Extend netkit with an option to delegate skb->{mark,priority} scrubbing to
its BPF program, from Daniel Borkmann.
4) Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs,
from Mahe Tardy.
5) Extend BPF selftests covering a BPF program setting socket options per MPTCP
subflow, from Geliang Tang and Nicolas Rybowski.
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (21 commits)
xsk: Use xsk_buff_pool directly for cq functions
xsk: Wrap duplicated code to function
xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool
xsk: Get rid of xdp_buff_xsk::orig_addr
xsk: s/free_list_node/list_node/
xsk: Get rid of xdp_buff_xsk::xskb_list_node
selftests/bpf: check program redirect in xdp_cpumap_attach
selftests/bpf: make xdp_cpumap_attach keep redirect prog attached
selftests/bpf: fix bpf_map_redirect call for cpu map test
selftests/bpf: add tcx netns cookie tests
bpf: add get_netns_cookie helper to tc programs
selftests/bpf: add missing header include for htons
selftests/bpf: Extend netkit tests to validate skb meta data
tools: Sync if_link.h uapi tooling header
netkit: Add add netkit scrub support to rt_link.yaml
netkit: Simplify netkit mode over to use NLA_POLICY_MAX
netkit: Add option for scrubbing skb meta data
bpf: Remove unused macro
selftests/bpf: Add mptcp subflow subtest
selftests/bpf: Add getsockopt to inspect mptcp subflow
...
====================
Link: https://patch.msgid.link/20241014211110.16562-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a subprogram to BPF object file that otherwise has no entry BPF
programs to validate that libbpf can still load this correctly.
Until this was fixed, user could expect this very confusing error message:
libbpf: prog 'dangling_subprog': missing BPF prog type, check ELF section name '.text'
libbpf: prog 'dangling_subprog': failed to load: -22
libbpf: failed to load object 'struct_ops_detach'
libbpf: failed to load BPF skeleton 'struct_ops_detach': -22
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241010211731.4121837-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xdp_cpumap_attach, in its current form, only checks that an xdp cpumap
program can be executed, but not that it performs correctly the cpu
redirect as configured by userspace (bpf_prog_test_run_opts will return
success even if the redirect program returns an error)
Add a check to ensure that the program performs the configured redirect
as well. The check is based on a global variable incremented by a
chained program executed only if the redirect program properly executes.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-3-51cea913710c@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
xdp_redir_prog currently redirects packets based on the entry at index 1
in cpu_map, but the corresponding test only manipulates the entry at
index 0. This does not really affect the test in its current form since
the program is detached before having the opportunity to execute, but it
needs to be fixed before being able improve the corresponding test (ie,
not only test attach/detach but also the redirect feature)
Fix this XDP program by making it redirect packets based on entry 0 in
cpu_map instead of entry 1.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-1-51cea913710c@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add a test case for kfuncs from multiple external modules, checking
that the correct kfuncs are called regardless of which order they're
called in. Specifically, check that calling the kfuncs in an order
different from the one the modules' BTF are loaded in works.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-3-745af6c1af98@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three success test cases to test the flattening of array of nested
struct. For these three tests, the number of special fields in map is
BTF_FIELDS_MAX, but the array is defined in structs with different
nested level.
Add one failure test case for the flattening as well. In the test case,
the number of special fields in map is BTF_FIELDS_MAX + 1. It will make
btf_parse_fields() in map_create() return -E2BIG, the creation of map
will succeed, but the load of program will fail because the btf_record
is invalid for the map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241008071114.3718177-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add netns cookie test that verifies the helper is now supported and work
in the context of tc programs.
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20241007095958.97442-2-mahe.tardy@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Previously when multiple BPF object files referencing the same extern
function (usually kfunc) are statically linked using `bpftool gen
object`, libbpf tries to get the nonexistent size of BTF_KIND_FUNC_PROTO
and fails. This test ensures it is fixed.
Signed-off-by: Eric Long <i@hack3r.moe>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241002-libbpf-dup-extern-funcs-v4-2-560eb460ff90@hack3r.moe