Refactor the nested TSC scaling test's check on a stable system TSC to
use TEST_REQUIRE() to do the heavy lifting when the system doesn't have
a stable TSC. Using a helper+TEST_REQUIRE() eliminates the need for
gotos and a custom message.
Cc: Hao Ge <gehao@kylinos.cn>
Cc: Vipin Sharma <vipinsh@google.com>
Link: https://lore.kernel.org/r/20230406001724.706668-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add two selftests where map creation key/value type_id's are
decl_tags. Without previous patch, kernel warnings will
appear similar to the one in the previous patch. With the previous
patch, both kernel warnings are silenced.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230530205034.266643-1-yhs@fb.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
CXL PMU devices can be found from entries in the Register
Locator DVSEC.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230526095824.16336-4-Jonathan.Cameron@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 259a834fad ("selftests: mptcp: functional tests for the userspace PM type")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: dc65fe82fb ("selftests: mptcp: add packet mark test case")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf2410 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc68532 ("selftests: add PM netlink functional tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped". Note that this check can also
mark the test as failed if 'SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES' env
var is set to 1: by doing that, we can make sure a test is not being
skipped by mistake.
A new shared file is added here to be able to re-used the same check in
the different selftests we have.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BusyBox's 'cmp' command doesn't support the '--bytes' parameter.
Some CIs -- i.e. LKFT -- use BusyBox and have the mptcp_join.sh test
failing [1] because their 'cmp' command doesn't support this '--bytes'
option:
cmp: unrecognized option '--bytes=1024'
BusyBox v1.35.0 () multi-call binary.
Usage: cmp [-ls] [-n NUM] FILE1 [FILE2]
Instead, 'head --bytes' can be used as this option is supported by
BusyBox. A temporary file is needed for this operation.
Because it is apparently quite common to use BusyBox, it is certainly
better to backport this fix to impacted kernels.
Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Link: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc5-5-g148341f0a2f5/testrun/16088933/suite/kselftest-net-mptcp/test/net_mptcp_userspace_pm_sh/log [1]
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
User events:
- Use long instead of int for storing the enable set/clear bit, as it was
found that big endian machines could end up using the wrong bits.
- Split allocating mm and attaching it. This keeps the allocation separate
from the registration and avoids various races.
- Remove RCU locking around pin_user_pages_remote() as that can schedule. The
RCU protection is no longer needed with the above split of mm allocation and
attaching.
- Rename the "link" fields of the various structs to something more
meaningful.
- Add comments around user_event_mm struct usage and locking requirements.
Timerlat tracer:
- Fix missed wakeup of timerlat thread caused by the timerlat interrupt
triggering when tracing is off. The timer interrupt handler needs to always
wake up the timerlat thread regardless if tracing is enabled or not,
otherwise, it will never wake up.
Histograms:
- Fix regression of breaking the "stacktrace" modifier for variables. That
modifier cannot be used for values, but can be used for variables that are
passed from one histogram to the next. This was broken when adding the
restriction to values as the variable logic used the same code.
- Rename the special field "stacktrace" to "common_stacktrace". Special fields
(that are not actually part of the event, but can act just like event
fields, like 'comm' and 'timestamp') should be prefixed with 'common_' for
consistency. To keep backward compatibility, 'stacktrace' can still be used
(as with the special field 'cpu'), but can be overridden if the event has a
field called 'stacktrace'.
- Update the synthetic event selftests to use the new name (synthetic events
are created by histograms)
Tracing bootup selftests:
- Reorganize the code to keep artifacts of the selftests not compiled in when
selftests are not configured.
- Add various cond_resched() around the selftest code, as the softlock
watchdog was triggering much more often. It appears that the kernel runs
slower now with full debugging enabled.
- While debugging ftrace with ftrace (using an instance ring buffer instead of
the top level one), I found that the selftests were disabling prints to the
debug instance. This should not happen, as the selftests only disable
printing to the main buffer as the selftests examine the main buffer to see
if it has what it expects, and prints can make the tests fail. Make the
selftests only disable printing to the toplevel buffer, and leave the
instance buffers alone.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZHQGJBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qu6hAQCJ1WebZUTJ/s7pFo36mXirLnrW4afB
Ua6sALseqKNesgEAyhLmd2+sMeqmAbCCIUWtcWJb/Pod0jGOt0U8+cBxfw8=
=PhaX
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"User events:
- Use long instead of int for storing the enable set/clear bit, as it
was found that big endian machines could end up using the wrong
bits.
- Split allocating mm and attaching it. This keeps the allocation
separate from the registration and avoids various races.
- Remove RCU locking around pin_user_pages_remote() as that can
schedule. The RCU protection is no longer needed with the above
split of mm allocation and attaching.
- Rename the "link" fields of the various structs to something more
meaningful.
- Add comments around user_event_mm struct usage and locking
requirements.
Timerlat tracer:
- Fix missed wakeup of timerlat thread caused by the timerlat
interrupt triggering when tracing is off. The timer interrupt
handler needs to always wake up the timerlat thread regardless if
tracing is enabled or not, otherwise, it will never wake up.
Histograms:
- Fix regression of breaking the "stacktrace" modifier for variables.
That modifier cannot be used for values, but can be used for
variables that are passed from one histogram to the next. This was
broken when adding the restriction to values as the variable logic
used the same code.
- Rename the special field "stacktrace" to "common_stacktrace".
Special fields (that are not actually part of the event, but can
act just like event fields, like 'comm' and 'timestamp') should be
prefixed with 'common_' for consistency. To keep backward
compatibility, 'stacktrace' can still be used (as with the special
field 'cpu'), but can be overridden if the event has a field called
'stacktrace'.
- Update the synthetic event selftests to use the new name (synthetic
events are created by histograms)
Tracing bootup selftests:
- Reorganize the code to keep artifacts of the selftests not compiled
in when selftests are not configured.
- Add various cond_resched() around the selftest code, as the
softlock watchdog was triggering much more often. It appears that
the kernel runs slower now with full debugging enabled.
- While debugging ftrace with ftrace (using an instance ring buffer
instead of the top level one), I found that the selftests were
disabling prints to the debug instance.
This should not happen, as the selftests only disable printing to
the main buffer as the selftests examine the main buffer to see if
it has what it expects, and prints can make the tests fail.
Make the selftests only disable printing to the toplevel buffer,
and leave the instance buffers alone"
* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have function_graph selftest call cond_resched()
tracing: Only make selftest conditionals affect the global_trace
tracing: Make tracing_selftest_running/delete nops when not used
tracing: Have tracer selftests call cond_resched() before running
tracing: Move setting of tracing_selftest_running out of register_tracer()
tracing/selftests: Update synthetic event selftest to use common_stacktrace
tracing: Rename stacktrace field to common_stacktrace
tracing/histograms: Allow variables to have some modifiers
tracing/user_events: Document user_event_mm one-shot list usage
tracing/user_events: Rename link fields for clarity
tracing/user_events: Remove RCU lock while pinning pages
tracing/user_events: Split up mm alloc and attach
tracing/timerlat: Always wakeup the timerlat thread
tracing/user_events: Use long vs int for atomic bit ops
- Stop trusting capacity data before the "media ready" indication
- Add missing HDM decoder capability enable for the cold-plug case
- Fix a debug message induced crash
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZHFLXgAKCRDfioYZHlFs
Z2XxAQDaJl6CVwalMooLpnDN15eiGH2CFDRec9FK62ZH+m27CgD/Zx5QMZUp6YBC
b9A3Tmx1MzIekO1CYRq9vi4ybXMCAAg=
=cmO2
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link fixes from Dan Williams:
"The 'media ready' series prevents the driver from acting on bad
capacity information, and it moves some checks earlier in the init
sequence which impacts topics in the queue for 6.5.
Additional hotplug testing uncovered a missing enable for memory
decode. A debug crash fix is also included.
Summary:
- Stop trusting capacity data before the "media ready" indication
- Add missing HDM decoder capability enable for the cold-plug case
- Fix a debug message induced crash"
* tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Explicitly initialize resources when media is not ready
cxl/port: Fix NULL pointer access in devm_cxl_add_port()
cxl: Move cxl_await_media_ready() to before capacity info retrieval
cxl: Wait Memory_Info_Valid before access memory related info
cxl/port: Enable the HDM decoder capability for switch ports
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZHEm+wAKCRDbK58LschI
gyIKAQCqO7B4sIu8hYVxBTwfHV2tIuXSMSCV4P9e78NUOPcO2QEAvLP/WVSjB0Bm
vpyTKKM22SpZvPe/jSp52j6t20N+qAc=
=HFxD
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-05-26
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 76 files changed, 2729 insertions(+), 1003 deletions(-).
The main changes are:
1) Add the capability to destroy sockets in BPF through a new kfunc,
from Aditi Ghag.
2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands,
from Andrii Nakryiko.
3) Add capability for libbpf to resize datasec maps when backed via mmap,
from JP Kobryn.
4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod,
from Jiri Olsa.
5) Big batch of xsk selftest improvements to prep for multi-buffer testing,
from Magnus Karlsson.
6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it
via bpftool, from Yafang Shao.
7) Various misc BPF selftest improvements to work with upcoming LLVM 17,
from Yonghong Song.
8) Extend bpftool to specify netdevice for resolving XDP hints,
from Larysa Zaremba.
9) Document masking in shift operations for the insn set document,
from Dave Thaler.
10) Extend BPF selftests to check xdp_feature support for bond driver,
from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
bpf: Fix bad unlock balance on freeze_mutex
libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
libbpf: Ensure libbpf always opens files with O_CLOEXEC
selftests/bpf: Check whether to run selftest
libbpf: Change var type in datasec resize func
bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
libbpf: Selftests for resizing datasec maps
libbpf: Add capability for resizing datasec maps
selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests
libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd
bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
libbpf: Start v1.3 development cycle
bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM
bpftool: Specify XDP Hints ifname when loading program
selftests/bpf: Add xdp_feature selftest for bond device
selftests/bpf: Test bpf_sock_destroy
selftests/bpf: Add helper to get port using getsockname
bpf: Add bpf_sock_destroy kfunc
bpf: Add kfunc filter function to 'struct btf_kfunc_id_set'
bpf: udp: Implement batching for sockets iterator
...
====================
Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported models
- fix numberspace pollution when using dynamically and statically allocated
GPIOs together
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmRws3YACgkQEacuoBRx
13JHlhAAkyUNjtvkcTdO9buB2AXB3e/JH31MgSEEzoAi9kr54n70D+mtmSz5iX0X
4a1KjZrd5Gd7nSGA3AHll7jxL1Rfa4vJsTID/khfhIPB0arGl7brZ0J2fItRUGqz
oxAvpH3OpSWI+QWQMzp/NEZV0F5rfWBme93w6OqYItzGNr4LfP+3x49zPDUngOjp
d3EsDzQbgxezYvrmWjPvZuZvk3RYud+dfU8bjvtnI6TJZqIxzePmcqEAFgqsnDJF
VLEJQFPGpLaX7U+ZepcAQAs2s5ggk6Mb0JdGKp1w9hQ9w91TClBUgUURG6hzdLmZ
w6/X3iGaFGaHugA4DmWhYd4FWE7H58dWazw/RVK4RM2ZGZ9cT8R8OVQlDsBhesyG
tOTY4RJlU1DgMt66nlUvf8r3ECdm2ayvFq7qSN9xRDQ0W4Ti3+QU2+/re3s8jevR
VYAo1BHtVai28+lCMjopK+fYBe8G4DqsGx9Uh3y0RxYIFvMKdo4EQF9Ku3yTEcq3
3FNFO00KU5ktKgnx7z6bnO7+F3OvHUeSRh44U8O6pm/odfdkUn7CHjFVi2dM6yO5
763UKwMZq9rDT2owdwljolIRECb8IAN0siBhhHYYW4oAFTD2w2vTQ/y2KYrxbntv
CptwauQMM9vjzg9bxMP+4R5YVZhZhtJ4I1oa2kQx7eXIPBy/6OM=
=R4dV
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by
gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported
models
- fix numberspace pollution when using dynamically and statically
allocated GPIOs together
* tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio-f7188x: fix chip name and pin count on Nuvoton chip
gpiolib: fix allocation of mixed dynamic/static GPIOs
gpio: mockup: Fix mode of debugfs files
selftests: gpio: gpio-sim: Fix BUG: test FAILED due to recent change
tools: gpio: fix debounce_period_us output of lsgpio
The arch_get_kallsym() function was introduced so that x86 could override
it, but that override was removed in bf904d2762 ("x86/pti/64: Remove
the SYSCALL64 entry trampoline"), so now this does nothing except causing
a warning about a missing prototype:
kernel/kallsyms.c:662:12: error: no previous prototype for 'arch_get_kallsym' [-Werror=missing-prototypes]
662 | int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
Restore the old behavior before d83212d5dd ("kallsyms, x86: Export
addresses of PTI entry trampolines") to simplify the code and avoid
the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
[mcgrof: fold in bpf selftest fix]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
There was a report that the hardware breakpoints and watch points weren't
reporting the debug architecture version as expected, they were reporting
a version of 0 which is not defined in the architecture. This happens
when running in a KVM guest if the host has a debug architecture version
not supported by KVM, it in turn confuses GDB which rejects any debug
architecture version it does not know about.
Add a test that covers that situation and while we're at it reports the
debug architecture version and number of slots available to aid with
figuring out problems that may arise.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230414-arm64-test-hw-breakpoint-v2-1-90a19e3b1059@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The sockopt test invokes test__start_subtest and then unconditionally
asserts the success. That means that even if deny-listed, any test will
still run and potentially fail.
Evaluate the return value of test__start_subtest() to achieve the
desired behavior, as other tests do.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZG4AiAAKCRDbK58LschI
g+xlAQCmefGbDuwPckZLnomvt6gl4bkIjs7kc1ySbG9QBnaInwD/WyrJaQIPijuD
qziHPAyx+MEgPseFU1b7Le35SZ66IwM=
=s4R1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-05-24
We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).
The main changes are:
1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
from John Fastabend.
2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
from Anton Protopopov.
3) Init the BPF offload table earlier than just late_initcall,
from Jakub Kicinski.
4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
from Will Deacon.
5) Remove a now unsupported __fallthrough in BPF samples,
from Andrii Nakryiko.
6) Fix a typo in pkg-config call for building sign-file,
from Jeremy Sowden.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
bpf, sockmap: Build helper to create connected socket pair
bpf, sockmap: Pull socket helpers out of listen test for general use
bpf, sockmap: Incorrectly handling copied_seq
bpf, sockmap: Wake up polling after data copy
bpf, sockmap: TCP data stall on recv before accept
bpf, sockmap: Handle fin correctly
bpf, sockmap: Improved check for empty queue
bpf, sockmap: Reschedule is now done through backlog
bpf, sockmap: Convert schedule_work into delayed_work
bpf, sockmap: Pass skb ownership through read_skb
bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
samples/bpf: Drop unnecessary fallthrough
bpf: netdev: init the offload table earlier
selftests/bpf: Fix pkg-config call building sign-file
====================
Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for resizing datasec maps. The first two
subtests resize the bss and custom data sections. In both cases, an
initial array (of length one) has its element set to one. After resizing
the rest of the array is filled with ones as well. A BPF program is then
run to sum the respective arrays and back on the userspace side the sum
is checked to be equal to the number of elements.
The third subtest attempts to perform resizing under conditions that
will result in either the resize failing or the BTF info being cleared.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230524004537.18614-3-inwardvessel@gmail.com
This test is aimed at verifying the memblock_alloc_node() to work as
expected, so setting the correct NUMA node for the new allocated
region. The memblock_alloc_node() is called directly without using any
stub. The core check is between the requested NUMA node and the `nid`
field inside the memblock_region structure. These two are supposed to
be equal for the test to succeed.
Signed-off-by: Claudio Migliorelli <claudio.migliorelli@mail.polimi.it>
Link: https://lore.kernel.org/r/ea5e938e-6b74-b188-af59-4b94b18bc0@mail.polimi.it
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
With the rename of the stacktrace field to common_stacktrace, update the
selftests to reflect this change. Copy the current selftest to test the
backward compatibility "stacktrace" keyword. Also the "requires" of that
test was incorrect, so it would never actually ran before. That is fixed
now.
Link: https://lore.kernel.org/linux-trace-kernel/20230523225402.55951f2f@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add a selftest demonstrating using detach-mounted BPF FS using new mount
APIs, and pinning and getting BPF map using such mount. This
demonstrates how something like container manager could setup BPF FS,
pin and adjust all the necessary objects in it, all before exposing BPF
FS to a particular mount namespace.
Also add a few subtests validating all meaningful combinations of
path_fd and pathname. We use mounted /sys/fs/bpf location for these.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230523170013.728457-5-andrii@kernel.org
When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.
Original patch series broke this resulting in
test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
After updated patch with fix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-14-john.fastabend@gmail.com
A bug was reported where ioctl(FIONREAD) returned zero even though the
socket with a SK_SKB verdict program attached had bytes in the msg
queue. The result is programs may hang or more likely try to recover,
but use suboptimal buffer sizes.
Add a test to check that ioctl(FIONREAD) returns the correct number of
bytes.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-13-john.fastabend@gmail.com
When session gracefully shutdowns epoll needs to wake up and any recv()
readers should return 0 not the -EAGAIN they previously returned.
Note we use epoll instead of select to test the epoll wake on shutdown
event as well.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-12-john.fastabend@gmail.com
A common operation for testing is to spin up a pair of sockets that are
connected. Then we can use these to run specific tests that need to
send data, check BPF programs and so on.
The sockmap_listen programs already have this logic lets move it into
the new sockmap_helpers header file for general use.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-11-john.fastabend@gmail.com
No functional change here we merely pull the helpers in sockmap_listen.c
into a header file so we can use these in other programs. The tests we
are about to add aren't really _listen tests so doesn't make sense
to add them here.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-10-john.fastabend@gmail.com
The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.
The destroy helpers set `ECONNABORTED` error code that we can validate
in the test code with client sockets. But UDP sockets have an overriding
error code from `disconnect()` called during abort, so the error code
validation is only done for TCP sockets.
The failure test cases validate that the `bpf_sock_destroy` kfunc is not
allowed from program attach types other than BPF trace iterator, and
such programs fail to load.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-10-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The helper will be used to programmatically retrieve
and pass ports in userspace and kernel selftest programs.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-9-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2
Fixes: b60417a9f2 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a test case fails, the mptcp_join.sh script can dump the
netns MIBs multiple times, leading to confusing output.
Let's dump such info only once per test-case, when needed.
This additionally allow removing some code duplication.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of duplicating the all existing TX check with
the TX side, add the new ones on selected test cases.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently we don't track explicitly a few events related to address
management suboption handling; this patch adds new mibs for ADD_ADDR
and RM_ADDR options tx and for missed tx events due to internal storage
exhaustion.
The self-tests must be updated to properly handle different mibs with
the same/shared prefix.
Additionally removes a couple of warning tracking the loss event.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/378
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@djiang5-mobl3
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
6ead9c98ca ("net: fec: remove the xdp_return_frame when lack of tx BDs")
144470c88c ("net: fec: using the standard return codes when xdp xmit errors")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support turning on switch-level HDM decoders in the case
where platform-firmware has not.
The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
Cc: <stable@vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks@intel.com>
Fixes: 54cdbf845c ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRmEHoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkXu0P/AnSeVtu2CYZSjyCQYvkKpdpbDOSlsee
GOnG0jduOdJ+OabYyM6prg9JHp1t3QxxTVJs+Spc7Eh9EX+YHRwK5DNPhv3GQ7RI
pSwQxwiEhLVXVjaEtqUo1Wf8JgCQJ+ThisSIDgQRnaHKQnrRIlbZRngwn6TwNFba
kxBpCUn2RZcZZhL8xdVF4UbSbEeLEupN76rHiePBVKNG70QVRptX3Rd6EV6FxmTa
EzAtfNh1r0p3BHW1YYgsbpy7PeXhKGhlVLqIld8h9r/y4hATsrQ2f7Bv0RNrVBDf
f8r1bdG5E0K3V8AzFPpyOe304G3GAwV+V/wtvA3GRjiwmPUwzJOzaNnSgJZZ/sbq
mnR4pCwJ4gDGnDxLa8hBh6+emQGB6LJJX0JTQY7vMPNmUwtIuEQc6tLxJ4DpXTzW
psEndQPDA7yR/7pNoE7ax+8CKCxPvfiBRnV9sxzmPV6FcxWtzJeLQihOuOA4IB8i
Ddhq2OYH+HCodTNOLWNyMSjk65O7Whee1O/YGiVW9+iUbKBUSBFatJ9fJjH4bXMT
VRZZnhlFGGIMuZXhkL5+a4ZnomqfPRXNGJ/QBbB4Ty7CXr84mXb0SX5gW4qsLOBA
YEuxiqD8Oej2Mrid4ypF5GmwBYLAf4CCZajGVii0yyz2hp69RvlQ0c5lA0WisQu3
wvY2HDFGBsa+
=PfWP
-----END PGP SIGNATURE-----
Merge tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS"
* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
MAINTAINERS: skip CCing netdev for Bluetooth patches
mdio_bus: unhide mdio_bus_init prototype
bridge: always declare tunnel functions
atm: hide unused procfs functions
net: isa: include net/Space.h
Revert "ARM: dts: stm32: add CAN support on stm32f746"
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nf_tables: fix nft_trans type confusion
netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
net: wwan: t7xx: Ensure init is completed before system sleep
net: selftests: Fix optstring
net: pcs: xpcs: fix C73 AN not getting enabled
net: wwan: iosm: fix NULL pointer dereference when removing device
vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
mailmap: add entries for Nikolay Aleksandrov
igb: fix bit_shift to be in [1..8] range
net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
cassini: Fix a memory leak in the error handling path of cas_init_one()
tun: Fix memory leak for detached NAPI queue.
can: kvaser_pciefd: Disable interrupts in probe error path
...
This Kselftest fixes update for Linux 6.4-rc3 consists of:
- sgx test fix for false negatives.
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRlDzgACgkQCwJExA0N
Qxzlfw//fc+Uq5hC2KUeVOiIAPYejEZ3+ToY+OUjaKvykWndMpYys5bCB68qERm8
vtgzYOKm04ovvnyh2jXzxYTjd8439qeRxr0Eybqc5HjiVDhyA1jm/pcsy3EDBf1+
5QGjgIN2fK9rdv7TmyMHL8pSxLvYQfe2zMjLtULuHZWtjzIY98hh3zjA9mdnn3pK
f2zqEJzu2lysTiwkhDF/yspOa1TymVp1W6dCVvfcxr5wJOWYtZaZyOTYNiCI3gX6
9Qy2fm4txPcsooNrwSEekXUIV669fet9vbQX5PF3pRDipxydarAAUUbTm1hOP4X+
3YnKk9dx/Pg483Go/sGMhtf+/P3ZJlWj2YZSro+maULt8O1Jb8TGe1MB7myheiFc
CiXj5duiG+WlwteF8WMsm58vjplgbnFAfyVZ+mumRhE1e4qLtjnc5YxSdNYoOHOF
40wI1Nebfwofh/qW7/ipn5tx7bN29UUSXRd26q1T4qF3KwKYaaj720lcisS7kBR0
gK8FFCdqroKilJ/Fj3e+AM3xqmlJxornj7EhQoAVtBZCsjN90Wyaw2gaCceABp+f
lDAisdO00v/+fpVhWFjXy4Mww5t6MDsRxmE5gLGOS1QHZUV9/mV7H9e9gd9ix6mV
GXYAeytvRrQQBCDEbOz1ax8SS3eA2MFVdbmMxHYn/gWVx6ClpL4=
=DOlj
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- sgx test fix for false negatives
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner
* tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Improve integration with kselftest runner
selftests/sgx: Add "test_encl.elf" to TEST_FILES
Currently kernel kfunc bpf_dynptr_is_rdonly() has prototype ...
__bpf_kfunc bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
... while selftests bpf_kfuncs.h has:
extern int bpf_dynptr_is_rdonly(const struct bpf_dynptr *ptr) __ksym;
Such a mismatch might cause problems although currently it is okay in
selftests. Fix it to prevent future potential surprise.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230517040409.4024618-1-yhs@fb.com
With latest llvm17, dynptr/test_dynptr_is_null subtest failed in my testing
VM. The failure log looks like below:
All error logs:
tester_init:PASS:tester_log_buf 0 nsec
process_subtest:PASS:obj_open_mem 0 nsec
process_subtest:PASS:Can't alloc specs array 0 nsec
verify_success:PASS:dynptr_success__open 0 nsec
verify_success:PASS:bpf_object__find_program_by_name 0 nsec
verify_success:PASS:dynptr_success__load 0 nsec
verify_success:PASS:bpf_program__attach 0 nsec
verify_success:FAIL:err unexpected err: actual 4 != expected 0
#65/9 dynptr/test_dynptr_is_null:FAIL
The error happens for bpf prog test_dynptr_is_null in dynptr_success.c:
if (bpf_dynptr_is_null(&ptr2)) {
err = 4;
goto exit;
}
The bpf_dynptr_is_null(&ptr) unexpectedly returned a non-zero value and
the control went to the error path. Digging further, I found the root cause
is due to function signature difference between kernel and user space.
In kernel, we have ...
__bpf_kfunc bool bpf_dynptr_is_null(struct bpf_dynptr_kern *ptr)
... while in bpf_kfuncs.h we have:
extern int bpf_dynptr_is_null(const struct bpf_dynptr *ptr) __ksym;
The kernel bpf_dynptr_is_null disasm code:
ffffffff812f1a90 <bpf_dynptr_is_null>:
ffffffff812f1a90: f3 0f 1e fa endbr64
ffffffff812f1a94: 0f 1f 44 00 00 nopl (%rax,%rax)
ffffffff812f1a99: 53 pushq %rbx
ffffffff812f1a9a: 48 89 fb movq %rdi, %rbx
ffffffff812f1a9d: e8 ae 29 17 00 callq 0xffffffff81464450 <__asan_load8_noabort>
ffffffff812f1aa2: 48 83 3b 00 cmpq $0x0, (%rbx)
ffffffff812f1aa6: 0f 94 c0 sete %al
ffffffff812f1aa9: 5b popq %rbx
ffffffff812f1aaa: c3 retq
Note that only 1-byte register %al is set and the other 7-bytes are not
touched. In bpf program, the asm code for the above bpf_dynptr_is_null(&ptr2):
266: 85 10 00 00 ff ff ff ff call -0x1
267: b4 01 00 00 04 00 00 00 w1 = 0x4
268: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8>
Basically, 4-byte subregister is tested. This might cause error as the value
other than the lowest byte might not be 0.
This patch fixed the issue by using the identical func prototype across kernel
and selftest user space. The fixed bpf asm code:
267: 85 10 00 00 ff ff ff ff call -0x1
268: 54 00 00 00 01 00 00 00 w0 &= 0x1
269: b4 01 00 00 04 00 00 00 w1 = 0x4
270: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230517040404.4023912-1-yhs@fb.com
The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c,
but the utility should not be called as a test. Executing this utility produces
the following error:
selftests: /linux/tools/testing/selftests/bpf: urandom_read
ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read
selftests: /linux/tools/testing/selftests/bpf: sign-file
not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2
Also, urandom_read is mistakenly used as a test. It does not lead to an error,
but should be moved over to TEST_GEN_FILES as well. The empty TEST_CUSTOM_PROGS
can then be removed.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/ZEuWFk3QyML9y5QQ@example.org
Link: https://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1684316821.git.legion@kernel.org
The cited commit added a stray colon to the 'v' option. That makes the
option work incorrectly.
ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)
Fixes: 5feba47273 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the packet pacing algorithm so that it works with multi-buffer
packets. This algorithm makes sure we do not send too many buffers to
the receiving thread so that packets have to be dropped. The previous
algorithm made the assumption that each packet only consumes one
buffer, but that is not true anymore when multi-buffer support gets
added. Instead, we find out what the largest packet size is in the
packet stream and assume that each packet will consume this many
buffers. This is conservative and overly cautious as there might be
smaller packets in the stream that need fewer buffers per packet. But
it keeps the algorithm simple.
Also simplify it by removing the pthread conditional and just test if
there is enough space in the Rx thread before trying to send one more
batch. Also makes the tests run faster.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-11-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the ability to generate data in the packets that are correct for
multi-buffer packets. The ethernet header should only go into the
first fragment followed by data and the others should only have
data. We also need to modify the pkt_dump function so that it knows
what fragment has an ethernet header so it can print this.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-10-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Populate the fill ring based on the number of frags a packet
needs. With multi-buffer support, a packet might require more than a
single fragment/buffer, so the function xsk_populate_fill_ring() needs
to consider how many buffers a packet will consume, and put that many
buffers on the fill ring for each packet it should receive. As we are
still not sending any multi-buffer packets, the function will only
produce one buffer per packet at the moment.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-9-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test for hugepages only once at the beginning of the execution of the
whole test suite, instead of before each test that needs huge
pages. These are the tests that use unaligned mode. As more unaligned
tests will be added, so the current system just does not scale.
With this change, there are now three possible outcomes of a test run:
fail, pass, or skip. To simplify the handling of this, the function
testapp_validate_traffic() now returns this value to the main loop. As
this function is used by nearly all tests, it meant a small change to
most of them.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Store the offset in struct pkt instead of the address. This is
important since address is only meaningful in the context of a packet
that is stored in a single umem buffer and thus a single Tx
descriptor. If the packet, in contrast need to be represented by
multiple buffers in the umem, storing the address makes no sense since
the packet will consist of multiple buffers in the umem at various
addresses. This change is in preparation for the upcoming
multi-buffer support in AF_XDP and the corresponding tests.
So instead of indicating the address, we instead indicate the offset
of the packet in the first buffer. The actual address of the buffer is
allocated from the umem with a new function called
umem_alloc_buffer(). This also means we can get rid of the
use_fill_for_addr flag as the addresses fed into the fill ring will
always be the offset from the pkt specification in the packet stream
plus the address of the allocated buffer from the umem. No special
casing needed.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Convert the current variable rx_pkt_nb to an iterator that can be used
for both Rx and Tx. This to simplify the code and making Tx more like
Rx that already has this feature.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-6-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Dump the content of the packet when a test finds that packets are
received out of order, the length is wrong, or some other packet
error. Use the already existing pkt_dump function for this and call it
when the above errors are detected. Get rid of the command line option
for dumping packets as it is not useful to print out thousands of
good packets followed by the faulty one you would like to see.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-5-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a varying payload pattern within the packet. Instead of having
just a packet number that is the same for all words in a packet, make
each word different in the packet. The upper 16-bits are set to the
packet number and the lower 16-bits are the sequence number of the
words in this packet. So the 3rd packet's 5th 32-bit word of data will
contain the number (2<<32) | 4 as they are numbered from 0.
This will make it easier to detect fragments that are out of order
when starting to test multi-buffer support.
The member payload in the packet is renamed pkt_nb to reflect that it
is now only a pkt_nb, not the real payload as seen above.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement support for generating pkts with variable length. Before
this patch, they were all 64 bytes, exception for some packets of zero
length and some that were too large. This feature will be used to test
multi-buffer support for which large packets are needed.
The packets are also made simpler, just a valid Ethernet header
followed by a sequence number. This so that it will become easier to
implement packet generation when each packet consists of multiple
fragments. There is also a maintenance burden associated with carrying
all this code for generating proper UDP/IP packets, especially since
they are not needed.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Do not change the XDP program for the Tx thread when not needed. It
was erroneously compared to the XDP program for the Rx thread, which
is always going to be different, which meant that the code made
unnecessary switches to the same program it had before. This did not
affect functionality, just performance.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving kernel test kfuncs into bpf_testmod kernel module, and adding
necessary init calls and BTF IDs records.
We need to keep following structs in kernel:
struct prog_test_ref_kfunc
struct prog_test_member (embedded in prog_test_ref_kfunc)
The reason is because they need to be marked as rcu safe (check test
prog mark_ref_as_untrusted_or_null) and such objects are being required
to be defined only in kernel at the moment (see rcu_safe_kptr check
in kernel).
We need to keep also dtor functions for both objects in kernel:
bpf_kfunc_call_test_release
bpf_kfunc_call_memb_release
We also keep the copy of these struct in bpf_testmod_kfunc.h, because
other test functions use them. This is unfortunate, but this is just
temporary solution until we are able to these structs them to bpf_testmod
completely.
As suggested by David adding bpf_testmod.ko make dependency for
bpf programs, so they are rebuilt if we change the bpf_testmod.ko
module.
Also adding missing __bpf_kfunc to bpf_kfunc_call_test4 functions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230515133756.1658301-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There's no need to keep the extern in kfuncs declarations.
Suggested-by: David Vernet <void@manifault.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently the test_verifier allows test to specify kfunc symbol
and search for it in the kernel BTF.
Adding the possibility to search for kfunc also in bpf_testmod
module when it's not found in kernel BTF.
To find bpf_testmod btf we need to get back SYS_ADMIN cap.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Loading bpf_testmod kernel module for verifier test. We will
move all the tests kfuncs into bpf_testmod in following change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that we have un/load_bpf_testmod helpers in testing_helpers.h,
we can use it in other tests and save some lines.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Do not unload bpf_testmod in load_bpf_testmod, instead call
unload_bpf_testmod separatelly.
This way we will be able use un/load_bpf_testmod functions
in other tests that un/load bpf_testmod module.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We are about to use un/load_bpf_testmod functions in couple tests
and it's better to print output to stdout, so it's aligned with
tests ASSERT macros output, which use stdout as well.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving test_progs helpers to testing_helpers object so they can be
used from test_verifier in following changes.
Also adding missing ifndef header guard to testing_helpers.h header.
Using stderr instead of env.stderr because un/load_bpf_testmod helpers
will be used outside test_progs. Also at the point of calling them
in test_progs the std files are not hijacked yet and stderr is the
same as env.stderr.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move all kfunc exports into separate bpf_testmod_kfunc.h header file
and include it in tests that need it.
We will move all test kfuncs into bpf_testmod in following change,
so it's convenient to have declarations in single place.
The bpf_testmod_kfunc.h is included by both bpf_testmod and bpf
programs that use test kfuncs.
As suggested by David, the bpf_testmod_kfunc.h includes vmlinux.h
and bpf/bpf_helpers.h for bpf programs build, so the declarations
have proper __ksym attribute and we can resolve all the structs.
Note in kfunc_call_test_subprog.c we can no longer use the sk_state
define from bpf_tcp_helpers.h (because it clashed with vmlinux.h)
and we need to address __sk_common.skc_state field directly.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm patch [1] enabled cross-function optimization for func arguments
(ArgumentPromotion) at -O2 level. And this caused s390 sock_fields
test failure ([2]). The failure is gone right now as patch [1] was
reverted in [3]. But it is possible that patch [3] will be reverted
again and then the test failure in [2] will show up again. So it is
desirable to fix the failure regardless.
The following is an analysis why sock_field test fails with
llvm patch [1].
The main problem is in
static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
{
__u32 *word = (__u32 *)&sk->dst_port;
return word[0] == bpf_htons(0xcafe);
}
static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
{
__u16 *half = (__u16 *)&sk->dst_port;
return half[0] == bpf_htons(0xcafe);
}
...
int read_sk_dst_port(struct __sk_buff *skb)
{
...
sk = skb->sk;
...
if (!sk_dst_port__load_word(sk))
RET_LOG();
if (!sk_dst_port__load_half(sk))
RET_LOG();
...
}
Through some cross-function optimization by ArgumentPromotion
optimization, the compiler does:
static __noinline bool sk_dst_port__load_word(__u32 word_val)
{
return word_val == bpf_htons(0xcafe);
}
static __noinline bool sk_dst_port__load_half(__u16 half_val)
{
return half_val == bpf_htons(0xcafe);
}
...
int read_sk_dst_port(struct __sk_buff *skb)
{
...
sk = skb->sk;
...
__u32 *word = (__u32 *)&sk->dst_port;
__u32 word_val = word[0];
...
if (!sk_dst_port__load_word(word_val))
RET_LOG();
__u16 half_val = word_val >> 16;
if (!sk_dst_port__load_half(half_val))
RET_LOG();
...
}
In current uapi bpf.h, we have
struct bpf_sock {
...
__be16 dst_port; /* network byte order */
__u16 :16; /* zero padding */
...
};
But the old kernel (e.g., 5.6) we have
struct bpf_sock {
...
__u32 dst_port; /* network byte order */
...
};
So for backward compatability reason, 4-byte load of
dst_port is converted to 2-byte load internally.
Specifically, 'word_val = word[0]' is replaced by 2-byte load
by the verifier and this caused the trouble for later
sk_dst_port__load_half() where half_val becomes 0.
Typical usr program won't have such a code pattern tiggering
the above bug, so let us fix the test failure with source
code change. Adding an empty asm volatile statement seems
enough to prevent undesired transformation.
[1] https://reviews.llvm.org/D148269
[2] https://lore.kernel.org/bpf/e7f2c5e8-a50c-198d-8f95-388165f1e4fd@meta.com/
[3] https://reviews.llvm.org/rG141be5c062ecf22bd287afffd310e8ac4711444a
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230516214945.1013578-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Change netcnt to demand at least 10K packets, as we frequently see some
stray packet arriving during the test in BPF CI. It seems more important
to make sure we haven't lost any packet than enforcing exact number of
packets.
Cc: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230515204833.2832000-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZGKqEAAKCRDbK58LschI
g6LYAQDp1jAszCOkmJ8VUA0ZyC5NAFDv+7y9Nd1toYWYX1btzAEAkf8+5qBJ1qmI
P5M0hjMTbH4MID9Aql10ZbMHheyOBAo=
=NUQM
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-05-16
We've added 57 non-merge commits during the last 19 day(s) which contain
a total of 63 files changed, 3293 insertions(+), 690 deletions(-).
The main changes are:
1) Add precision propagation to verifier for subprogs and callbacks,
from Andrii Nakryiko.
2) Improve BPF's {g,s}setsockopt() handling with wrong option lengths,
from Stanislav Fomichev.
3) Utilize pahole v1.25 for the kernel's BTF generation to filter out
inconsistent function prototypes, from Alan Maguire.
4) Various dyn-pointer verifier improvements to relax restrictions,
from Daniel Rosenberg.
5) Add a new bpf_task_under_cgroup() kfunc for designated task,
from Feng Zhou.
6) Unblock tests for arm64 BPF CI after ftrace supporting direct call,
from Florent Revest.
7) Add XDP hint kfunc metadata for RX hash/timestamp for igc,
from Jesper Dangaard Brouer.
8) Add several new dyn-pointer kfuncs to ease their usability,
from Joanne Koong.
9) Add in-depth LRU internals description and dot function graph,
from Joe Stringer.
10) Fix KCSAN report on bpf_lru_list when accessing node->ref,
from Martin KaFai Lau.
11) Only dump unprivileged_bpf_disabled log warning upon write,
from Kui-Feng Lee.
12) Extend test_progs to directly passing allow/denylist file,
from Stephen Veiss.
13) Fix BPF trampoline memleak upon failure attaching to fentry,
from Yafang Shao.
14) Fix emitting struct bpf_tcp_sock type in vmlinux BTF,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (57 commits)
bpf: Fix memleak due to fentry attach failure
bpf: Remove bpf trampoline selector
bpf, arm64: Support struct arguments in the BPF trampoline
bpftool: JIT limited misreported as negative value on aarch64
bpf: fix calculation of subseq_idx during precision backtracking
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
bpf: Document EFAULT changes for sockopt
selftests/bpf: Correctly handle optlen > 4096
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
libbpf: fix offsetof() and container_of() to work with CO-RE
bpf: Address KCSAN report on bpf_lru_list
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
selftests/bpf: Accept mem from dynptr in helper funcs
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Check overflow in optional buffer
selftests/bpf: Test allowing NULL buffer in dynptr slice
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Add testcase for bpf_task_under_cgroup
bpf: Add bpf_task_under_cgroup() kfunc
...
====================
Link: https://lore.kernel.org/r/20230515225603.27027-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit ba38961a06 ("um: Enable FORTIFY_SOURCE"), it's possible
to run the FORTIFY tests under UML. Enable CONFIG_FORTIFY_SOURCE when
running with --alltests to gain additional coverage, and by default under
UML.
Signed-off-by: Kees Cook <keescook@chromium.org>
The qemu argument -enable-kvm is duplicated because the qemu_args bash
variable in kvm-test-1-run.sh already provides it. This commit therefore
removes the ppc64-specific copy in functions.sh.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
This extends the BPF trampoline JIT to support attachment to functions
that take small structures (up to 128bit) as argument. This is trivially
achieved by saving/restoring a number of "argument registers" rather
than a number of arguments.
The AAPCS64 section 6.8.2 describes the parameter passing ABI.
"Composite types" (like C structs) below 16 bytes (as enforced by the
BPF verifier) are provided as part of the 8 argument registers as
explained in the section C.12.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20230511140507.514888-1-revest@chromium.org
When building sign-file, the call to get the CFLAGS for libcrypto is
missing white-space between `pkg-config` and `--cflags`:
$(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null)
Removing the redirection of stderr, we see:
$ make -C tools/testing/selftests/bpf sign-file
make: Entering directory '[...]/tools/testing/selftests/bpf'
make: pkg-config--cflags: No such file or directory
SIGN-FILE sign-file
make: Leaving directory '[...]/tools/testing/selftests/bpf'
Add the missing space.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/bpf/20230426215032.415792-1-jeremy@azazel.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZGETgQAKCRDfioYZHlFs
Zy+FAQDDwPDprMrALvuWz3rYPROPH0h6X2zLYH5JFq29cqjO9wD/RVlrXFFkGaG+
3n7Uip2rZaW3OpC2TOaqBaDxTkXo0ww=
=yFDG
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link fixes from Dan Williams:
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
* tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()
Even though it's not relevant in selftests, the people
might still copy-paste from them. So let's take care
of optlen > 4096 cases explicitly.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Instead of assuming EFAULT, let's assume the BPF program's
output is ignored.
Remove "getsockopt: deny arbitrary ctx->retval" because it
was actually testing optlen. We have separate set of tests
for retval.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add test to make sure that the localbypass option is on by default.
Add test to change vxlan localbypass to nolocalbypass and check
that packets are delivered to userspace.
Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with commit:
95433f7263 ("srcu: Begin offloading srcu_struct fields to srcu_update")
...it is no longer possible to do:
static DEFINE_SRCU(x)
Switch to DEFINE_STATIC_SRCU(x) to fix:
tools/testing/cxl/test/mock.c:22:1: error: duplicate ‘static’
22 | static DEFINE_SRCU(cxl_mock_srcu);
| ^~~~~~
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168392709546.1135523.10424917245934547117.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use ping -r to test the kernel behaviour with raw and ping sockets
having the SO_DONTROUTE option.
Since ipv4_ping_novrf() is called with different values of
net.ipv4.ping_group_range, then it tests both raw and ping sockets
(ping uses ping sockets if its user ID belongs to ping_group_range
and raw sockets otherwise).
With both socket types, sending packets to a neighbour (on link) host,
should work. When the host is behind a router, sending should fail.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nettest --client-dontroute to test the kernel behaviour with UDP
sockets having the SO_DONTROUTE option. Sending packets to a neighbour
(on link) host, should work. When the host is behind a router, sending
should fail.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nettest --{client,server}-dontroute to test the kernel behaviour
with TCP sockets having the SO_DONTROUTE option. Sending packets to a
neighbour (on link) host, should work. When the host is behind a
router, sending should fail.
Client and server sockets are tested independently, so that we can
cover different TCP kernel paths.
SO_DONTROUTE also affects the syncookies path. So ipv4_tcp_dontroute()
is made to work with or without syncookies, to cover both paths.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add --client-dontroute and --server-dontroute options to nettest. They
allow to set the SO_DONTROUTE option to the client and server sockets
respectively. This will be used by the following patches to test
the SO_DONTROUTE kernel behaviour with TCP and UDP.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
several routers (rt-1, rt-2) and hosts.
When the IPv6 addresses of rt-{1,2} routers are configured, the Deduplicate
Address Detection (DAD) kicks in when enabled in the Linux distros running
the selftests. DAD is used to check whether an IPv6 address is already
assigned in a network. Such a mechanism consists of sending an ICMPv6 Echo
Request and waiting for a reply.
As the DAD process could take too long to complete, it may cause the
failing of some tests carried out by the srv6_end_dt4_l3vpn_test script.
To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
since we configure the virtual network manually and do not need any address
deduplication mechanism at all.
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The BUSTED-BOOST and TREE03 scenarios specify a mythical tree.use_softirq
module parameter, which means a failure to get full test coverage. This
commit therefore corrects the name to rcutree.use_softirq.
Fixes: e2b949d543 ("rcutorture: Make TREE03 use real-time tree.use_softirq setting")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
According to Mirsad the gpio-sim.sh test appears to FAIL in a wrong way
due to missing initialisation of shell variables:
4.2. Bias settings work correctly
cat: /sys/devices/platform/gpio-sim.0/gpiochip18/sim_gpio0/value: No such file or directory
./gpio-sim.sh: line 393: test: =: unary operator expected
bias setting does not work
GPIO gpio-sim test FAIL
After this change the test passed:
4.2. Bias settings work correctly
GPIO gpio-sim test PASS
His testing environment is AlmaLinux 8.7 on Lenovo desktop box with
the latest Linux kernel based on v6.2:
Linux 6.2.0-mglru-kmlk-andy-09238-gd2980d8d8265 x86_64
Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmRbUnQACgkQ1V2XiooU
IOQNVw//eoCiid3J4TuNdzHaBHXUlZvln1n5Z6K5fIz5ytrY1T7oKC8Y9StkzzWR
29ToFLOJt1iRAcgxsghWRIwUzNwpuUdqgd9cUZMHMQxT0BJItp6FXUql2+1LkF/I
b/gnnb90zyE7lBS/VSRyOiqMiJlP+Som22d7Nn5k2KfTYEdXKwfzjsWAu3W3Sb0s
Lv/MA9DE42qcwiZubmFmDtOtAunPJFZm3HgkcAVeXoNkBDrSfkvxLIMYG6VfFNhQ
AkKMyzX293wpwVxfOuQfJr4QVlxAgOQUko+FqajoWMBtfA3yldZjJ8RC7c9Af9uI
ciOP11vHBCG84KrTabC5kdqOcvadreDiM/oIvk57ztQhCr3e+po+vIz6Cv4p9I30
m5GXfgbtMRl6hM2S5lrRc5fNRkYJHE4aNvesFTGaLpK3LogusH1E9mH5jdjnbU42
TwkIe250qJJelNn9ZxS5Jt0BgyogNfeiA9lOXmaQBpYmwIahrjDf0g8z2I85nPhF
PDukjSXsxi4uHwpSF5wFrlqkAPiEX4vC95uSUTVbbBgZrHhNJdGgu4FJSHRM4mVo
6Awxk2O2bvcDpXEfTBdD4EJF71bZ/aH2i8ddU1oupIl88O01TWTtkO4SYHqMX+tC
fPnpGzBN8KJvHgeFxO4p0v1oelv2RtiITv1gi7YJOnq/+Vd2yy0=
=HmfP
-----END PGP SIGNATURE-----
Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: use /proc for pid checking
netfilter: conntrack: fix possible bug_on with enable_hooks=1
netfilter: nf_tables: always release netdev hooks from notifier
====================
Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
"ip link set dev "$devbond1" nomaster"
This line code in bond-eth-type-change.sh is unnecessary.
Because $devbond1 was not added to any master device.
Signed-off-by: Liang Li <liali@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When run the test in netns, it's not easy to get the tc stats via
tc_rule_handle_stats_get(). With the new netns parameter, we can get
stats from specific netns like
num=$(tc_rule_handle_stats_get "dev eth0 ingress" 101 ".packets" "-n ns")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure flowtable interacts correctly with ingress and egress
chains, i.e. those get handled before and after flow table respectively.
Adds three more tests:
1. repeat flowtable test, but with 'ip dscp set cs3' done in
inet forward chain.
Expect that some packets have been mangled (before flowtable offload
became effective) while some pass without mangling (after offload
succeeds).
2. repeat flowtable test, but with 'ip dscp set cs3' done in
veth0:ingress.
Expect that all packets pass with cs3 dscp field.
3. same as 2, but use veth1:egress. Expect the same outcome.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When running nft_flowtable.sh in VM on a busy server we've found that
the time of the netcat file transfers vary wildly.
Therefore replace hardcoded 3 second sleep with the loop checking for
a change in the file sizes. Once no change in detected we test the results.
Nice side effect is that we shave 1 second sleep in the fast case
(hard-coded 3 second sleep vs two 1 second sleeps).
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -x option. For the
purposes of hash calculation of the list of processes this option is
inessential.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
pid existence check.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e6 ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Writing `subprocess.Popen[str]` requires python 3.9+.
kunit.py has an assertion that the python version is 3.7+, so we should
try to stay backwards compatible.
This conflicts a bit with commit 1da2e6220e ("kunit: tool: fix
pre-existing `mypy --strict` errors and update run_checks.py"), since
mypy complains like so
> kunit_kernel.py:95: error: Missing type parameters for generic type "Popen" [type-arg]
Note: `mypy --strict --python-version 3.7` does not work.
We could annotate each file with comments like
`# mypy: disable-error-code="type-arg"
but then we might still get nudged to break back-compat in other files.
This patch adds a `mypy.ini` file since it seems like the only way to
disable specific error codes for all our files.
Note: run_checks.py doesn't need to specify `--config_file mypy.ini`,
but I think being explicit is better, particularly since most kernel
devs won't be familiar with how mypy works.
Fixes: 695e260308 ("kunit: tool: add subscripts for type annotations where appropriate")
Reported-by: SeongJae Park <sj@kernel.org>
Link: https://lore.kernel.org/linux-kselftest/20230501171520.138753-1-sj@kernel.org
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Tested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
=7G78
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
This ensures that buffers retrieved from dynptr_data are allowed to be
passed in to helpers that take mem, like bpf_strncmp
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-6-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_dynptr_slice(_rw) no longer requires a buffer for verification. If the
buffer is needed, but not present, the function will return NULL.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20230506013134.2492210-3-drosen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_progs:
Tests new kfunc bpf_task_under_cgroup().
The bpf program saves the new task's pid within a given cgroup to
the remote_pid, which is convenient for the user-mode program to
verify the test correctness.
The user-mode program creates its own mount namespace, and mounts the
cgroupsv2 hierarchy in there, call the fork syscall, then check if
remote_pid and local_pid are unequal.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230506031545.35991-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that precision propagation is supported fully in the presence of
subprogs, there is no need to work around iter test. Revert original
workaround.
This reverts be7dbd275d ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests").
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a bunch of tests validating verifier's precision backpropagation
logic in the presence of subprog calls and/or callback-calling
helpers/kfuncs.
We validate the following conditions:
- subprog_result_precise: static subprog r0 result precision handling;
- global_subprog_result_precise: global subprog r0 precision
shortcutting, similar to BPF helper handling;
- callback_result_precise: similarly r0 marking precise for
callback-calling helpers;
- parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global:
propagation of precision for callee-saved registers bypassing
static/global subprogs;
- parent_callee_saved_reg_precise_with_callback: same as above, but in
the presence of callback-calling helper;
- parent_stack_slot_precise, parent_stack_slot_precise_global:
similar to above, but instead propagating precision of stack slot
(spilled SCALAR reg);
- parent_stack_slot_precise_with_callback: same as above, but in the
presence of callback-calling helper;
- subprog_arg_precise: propagation of precision of static subprog's
input argument back to caller;
- subprog_spill_into_parent_stack_slot_precise: negative test
validating that verifier currently can't support backtracking of stack
access with non-r10 register, we validate that we fallback to
forcing precision for all SCALARs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When precision backtracking bails out due to some unsupported sequence
of instructions (e.g., stack access through register other than r10), we
need to mark all SCALAR registers as precise to be safe. Currently,
though, we mark SCALARs precise only starting from the state we detected
unsupported condition, which could be one of the parent states of the
actual current state. This will leave some registers potentially not
marked as precise, even though they should. So make sure we start
marking scalars as precise from current state (env->cur_state).
Further, we don't currently detect a situation when we end up with some
stack slots marked as needing precision, but we ran out of available
states to find the instructions that populate those stack slots. This is
akin the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be
handled similarly by falling back to marking all SCALARs precise. Add
this check when we run out of states.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Teach __mark_chain_precision logic to maintain register/stack masks
across all active frames when going from child state to parent state.
Currently this should be mostly no-op, as precision backtracking usually
bails out when encountering subprog entry/exit.
It's not very apparent from the diff due to increased indentation, but
the logic remains the same, except everything is done on specific `fr`
frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
where frame index is passed explicitly, instead of using current frame
number.
We also adjust logging to emit affected frame number. And we also add
better logging of human-readable register and stack slot masks, similar
to previous patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add helper to format register and stack masks in more human-readable
format. Adjust logging a bit during backtrack propagation and especially
during forcing precision fallback logic to make it clearer what's going
on (with log_level=2, of course), and also start reporting affected
frame depth. This is in preparation for having more than one active
frame later when precision propagation between subprog calls is added.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sometimes during debugging it's important that BPF program is loaded
with BPF_F_TEST_STATE_FREQ flag set to force verifier to do frequent
state checkpointing. Teach veristat to do this when -t ("test state")
flag is specified.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFLsxAAKCRDdBJ7gKXxA
jl8yAQCqjstPsOULf9QN0z4bGAUhY+Wj4ERz1jbKSIuhFCJWiQEAgQvgRXObKjmi
OtUB0Ek4CMDCQzbyIQ1Bhp3kxi6+Jgs=
=AbyC
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- Some DAMON cleanups from Kefeng Wang
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
[ Andrew called these "final", but I suspect we'll have a series fixing
up the fact that the last commit in the dmapools series in the
previous pull seems to have unintentionally just reverted all the
other commits in the same series.. - Linus ]
* tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: hwpoison: coredump: support recovery from dump_user_range()
mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
mm/ksm: move disabling KSM from s390/gmap code to KSM code
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0
mm/damon/paddr: fix missing folio_sz update in damon_pa_young()
mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate()
mm/damon/paddr: minor refactor of damon_pa_pageout()
1. Don't hard-code pkg-config
2. Remove distro-specific default for CFLAGS
3. Use pkg-config for LDLIBS
Fixes: a50a88f026 ("selftests: netfilter: fix a build error on openSUSE")
Suggested-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1
will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting
MADV_MERGEABLE would.
Link: https://lkml.kernel.org/r/20230422205420.30372-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Roesch <shr@devkernel.io>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Improve test selection logic when using -a/-b/-d/-t options.
The list of tests to include or exclude can now be read from a file,
specified as @<filename>.
The file contains one name (or wildcard pattern) per line, and
comments beginning with # are ignored.
These options can be passed multiple times to read more than one file.
Signed-off-by: Stephen Veiss <sveiss@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230427225333.3506052-3-sveiss@meta.com
Split the logic to insert new tests into test filter sets out from
parse_test_list.
Fix the subtest insertion logic to reuse an existing top-level test
filter, which prevents the creation of duplicate top-level test filters
each with a single subtest.
Signed-off-by: Stephen Veiss <sveiss@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230427225333.3506052-2-sveiss@meta.com
* More phys_to_virt conversions
* Improvement of AP management for VSIE (nested virtualization)
ARM64:
* Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
* New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features
being moved to VMMs rather than be implemented in the kernel.
* Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one.
This last part allows the NV timer code to be implemented on
top.
* A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
* The usual selftest fixes and improvements.
KVM x86 changes for 6.4:
* Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
and by giving the guest control of CR0.WP when EPT is enabled on VMX
(VMX-only because SVM doesn't support per-bit controls)
* Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
as a bool
* Move AMD_PSFD to cpufeatures.h and purge KVM's definition
* Avoid unnecessary writes+flushes when the guest is only adding new PTEs
* Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
when emulating invalidations
* Clean up the range-based flushing APIs
* Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
changed SPTE" overhead associated with writing the entire entry
* Track the number of "tail" entries in a pte_list_desc to avoid having
to walk (potentially) all descriptors during insertion and deletion,
which gets quite expensive if the guest is spamming fork()
* Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
* Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, similar to CPUID features
* Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
* Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
x86 AMD:
* Add support for virtual NMIs
* Fixes for edge cases related to virtual interrupts
x86 Intel:
* Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
* Fix a bug in emulation of ENCLS in compatibility mode
* Allow emulation of NOP and PAUSE for L2
* AMX selftests improvements
* Misc cleanups
MIPS:
* Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
Generic:
* Drop unnecessary casts from "void *" throughout kvm_main.c
* Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
* Fix goof introduced by the conversion to rST
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
=S2Hq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"s390:
- More phys_to_virt conversions
- Improvement of AP management for VSIE (nested virtualization)
ARM64:
- Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
- New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features being
moved to VMMs rather than be implemented in the kernel.
- Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one. This
last part allows the NV timer code to be implemented on top.
- A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
- The usual selftest fixes and improvements.
x86:
- Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
enabled, and by giving the guest control of CR0.WP when EPT is
enabled on VMX (VMX-only because SVM doesn't support per-bit
controls)
- Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
return as a bool
- Move AMD_PSFD to cpufeatures.h and purge KVM's definition
- Avoid unnecessary writes+flushes when the guest is only adding new
PTEs
- Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
optimizations when emulating invalidations
- Clean up the range-based flushing APIs
- Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
single A/D bit using a LOCK AND instead of XCHG, and skip all of
the "handle changed SPTE" overhead associated with writing the
entire entry
- Track the number of "tail" entries in a pte_list_desc to avoid
having to walk (potentially) all descriptors during insertion and
deletion, which gets quite expensive if the guest is spamming
fork()
- Disallow virtualizing legacy LBRs if architectural LBRs are
available, the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably
PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
- Overhaul the vmx_pmu_caps selftest to better validate
PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- AMD SVM:
- Add support for virtual NMIs
- Fixes for edge cases related to virtual interrupts
- Intel AMX:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
XTILE_DATA is not being reported due to userspace not opting in
via prctl()
- Fix a bug in emulation of ENCLS in compatibility mode
- Allow emulation of NOP and PAUSE for L2
- AMX selftests improvements
- Misc cleanups
MIPS:
- Constify MIPS's internal callbacks (a leftover from the hardware
enabling rework that landed in 6.3)
Generic:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
struct size by 8 bytes on 64-bit kernels by utilizing a padding
hole
Documentation:
- Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
KVM: s390: pci: fix virtual-physical confusion on module unload/load
KVM: s390: vsie: clarifications on setting the APCB
KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
KVM: selftests: Test the PMU event "Instructions retired"
KVM: selftests: Copy full counter values from guest in PMU event filter test
KVM: selftests: Use error codes to signal errors in PMU event filter test
KVM: selftests: Print detailed info in PMU event filter asserts
KVM: selftests: Add helpers for PMC asserts in PMU event filter test
KVM: selftests: Add a common helper for the PMU event filter guest code
KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
KVM: arm64: vhe: Drop extra isb() on guest exit
KVM: arm64: vhe: Synchronise with page table walker on MMU update
KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
KVM: arm64: nvhe: Synchronise with page table walker on TLBI
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
KVM: arm64: vgic: Don't acquire its_lock before config_lock
KVM: selftests: Add test to verify KVM's supported XCR0
...
- Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle
mailbox) to be a facility of the PCI core rather than the CXL core.
This is foundational for upcoming support for PCI device-attestation and
PCIe / CXL link encryption.
- Add support for retrieving and injecting poison for CXL memory
expanders. This enabling uses trace-events to convey CXL media error
records to user tooling. It includes translation of device-local
addresses (DPA) to system physical addresses (SPA) and their
corresponding CXL region.
- Fixes for decoder enumeration that missed v6.3-final
- Miscellaneous fixups
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZE2nNwAKCRDfioYZHlFs
Z5c2AQCTWebok6CD+HN01xnIx+CBWAUQe0QIGR40dT2P6/WGEgEA8wMae0w/FDlc
lQDvSoIyPvy1hGO7Ppb0K2AT6jrQAgU=
=blcC
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link updates from Dan Williams:
"DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's
blessing, and the CXL core continues to mature its media management
capabilities with support for listing and injecting media errors. Some
late fixes that missed v6.3-final are also included:
- Refactor the DOE infrastructure (Data Object Exchange
PCI-config-cycle mailbox) to be a facility of the PCI core rather
than the CXL core.
This is foundational for upcoming support for PCI
device-attestation and PCIe / CXL link encryption.
- Add support for retrieving and injecting poison for CXL memory
expanders.
This enabling uses trace-events to convey CXL media error records
to user tooling. It includes translation of device-local addresses
(DPA) to system physical addresses (SPA) and their corresponding
CXL region.
- Fixes for decoder enumeration that missed v6.3-final
- Miscellaneous fixups"
* tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits)
cxl/test: Add mock test for set_timestamp
cxl/mbox: Update CMD_RC_TABLE
tools/testing/cxl: Require CONFIG_DEBUG_FS
tools/testing/cxl: Add a sysfs attr to test poison inject limits
tools/testing/cxl: Use injected poison for get poison list
tools/testing/cxl: Mock the Clear Poison mailbox command
tools/testing/cxl: Mock the Inject Poison mailbox command
cxl/mem: Add debugfs attributes for poison inject and clear
cxl/memdev: Trace inject and clear poison as cxl_poison events
cxl/memdev: Warn of poison inject or clear to a mapped region
cxl/memdev: Add support for the Clear Poison mailbox command
cxl/memdev: Add support for the Inject Poison mailbox command
tools/testing/cxl: Mock support for Get Poison List
cxl/trace: Add an HPA to cxl_poison trace events
cxl/region: Provide region info to the cxl_poison trace event
cxl/memdev: Add trigger_poison_list sysfs attribute
cxl/trace: Add TRACE support for CXL media-error records
cxl/mbox: Add GET_POISON_LIST mailbox command
cxl/mbox: Initialize the poison state
cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all
...
* cpuset changes including the fix for an incorrect interaction with CPU
hotplug and an optimization.
* Other doc and cosmetic changes.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErfng4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGVVtAQCDycK4VSgc4nsFPG1vh1Oy1A723ciEUwAbKmV/
F1n7xwEA68FiDvE29LpMJJuYP9HnX0A5zRMyNnb52kN9jmgcEQI=
=ALol
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset changes including the fix for an incorrect interaction with
CPU hotplug and an optimization
- Other doc and cosmetic changes
* tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1/cpusets: update libcgroup project link
cgroup/cpuset: Minor updates to test_cpuset_prs.sh
cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
cpuset: Clean up cpuset_node_allowed
cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
* Support for runtime detection of the Svnapot extension.
* Support for Zicboz when clearing pages.
* We've moved to GENERIC_ENTRY.
* Support for !MMU on rv32 systems.
* The linear region is now mapped via huge pages.
* Support for building relocatable kernels.
* Support for the hwprobe interface.
* Various fixes and cleanups throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
g6G3ZP6STSQBmw==
=oxQd
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for runtime detection of the Svnapot extension
- Support for Zicboz when clearing pages
- We've moved to GENERIC_ENTRY
- Support for !MMU on rv32 systems
- The linear region is now mapped via huge pages
- Support for building relocatable kernels
- Support for the hwprobe interface
- Various fixes and cleanups throughout the tree
* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
RISC-V: hwprobe: Explicity check for -1 in vdso init
RISC-V: hwprobe: There can only be one first
riscv: Allow to downgrade paging mode from the command line
dt-bindings: riscv: add sv57 mmu-type
RISC-V: hwprobe: Remove __init on probe_vendor_features()
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
...
- Add support for building the kernel using PC-relative addressing on Power10.
- Allow HV KVM guests on Power10 to use prefixed instructions.
- Unify support for the P2020 CPU (85xx) into a single machine description.
- Always build the 64-bit kernel with 128-bit long double.
- Drop support for several obsolete 2000's era development boards as
identified by Paul Gortmaker.
- A series fixing VFIO on Power since some generic changes.
- Various other small features and fixes.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley,
Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael
Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin,
Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras,
Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher
Boessenkool, Timothy Pearson.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmRLdD8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPrED/93Hwi1/8udXYV6OAFBnLLLDX9RZWNx
sj6W7PC/yY7MGUTIGcayrAURt5xFfQZMBdq9nhhH46Wyd8pSUe1IlpXIEgqeH64Y
5uaSe6u4OZ5dDmiYz8bM+Y4Ixkfq1xMO0Rj27FIRJyU4Pp6gyMQ6/8W+iPU2vYIb
jl4frVO8PpmCbW8euOyT/b9YB2h79+5nLgZT4RvmYJblKQwgzRZWaiCAU0wYf+xT
xYsMQcqJlPstizTnXIr+GC08VrMPQR51kJCurnNUMXoAQY6toEXebveWRNZ3sH39
K0BRQ036NNGPS4GlHVbjgLIdoWr4pUEROZ48Jy9WeiDh6OwO2vb2zrm7ZLtKlGXI
LCQ08T9diPbAAJyVJaKjpsNXhTffuLPRhIOr1o2vqGvY+Fqbw96bPQAyFbxpJtrw
rGTTc5e93fEdZV+eaGR8kfdNM75WM6lgiteaIbUnpirYTh/0j6YEO1Ijc4d9e5ux
aoHN2eXQDjMm9foZf1D6vaCTRyN8LwLa9hcydKCn6+qULUQzoZ2r4cRt6eSD7+fx
Wni1Zg7vD6xx294nVmhg5dWuD4qVIAZj3BZPFHHR2SPqy+UbEJLk/NKgznmrhhgQ
HBcZAEFqIBZkA4e0w6LJKh/1j1rxpZUokgjMotOJq4VRivq82W1P7SioFDY3/6w5
UvGtIgGq4Qsa2Q==
=89WH
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for building the kernel using PC-relative addressing on
Power10.
- Allow HV KVM guests on Power10 to use prefixed instructions.
- Unify support for the P2020 CPU (85xx) into a single machine
description.
- Always build the 64-bit kernel with 128-bit long double.
- Drop support for several obsolete 2000's era development boards as
identified by Paul Gortmaker.
- A series fixing VFIO on Power since some generic changes.
- Various other small features and fixes.
Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel
Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro
Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas
Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali
Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob
Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and
Timothy Pearson.
* tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
powerpc/64s: Disable pcrel code model on Clang
powerpc: Fix merge conflict between pcrel and copy_thread changes
powerpc/configs/powernv: Add IGB=y
powerpc/configs/64s: Drop JFS Filesystem
powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems
powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest
powerpc/configs: Make pseries_le an alias for ppc64le_guest
powerpc/configs: Incorporate generic kvm_guest.config into guest configs
powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs
powerpc/configs/64s: Enable Device Mapper options
powerpc/configs/64s: Enable PSTORE
powerpc/configs/64s: Enable VLAN support
powerpc/configs/64s: Enable BLK_DEV_NVME
powerpc/configs/64s: Drop REISERFS
powerpc/configs/64s: Use SHA512 for module signatures
powerpc/configs/64s: Enable IO_STRICT_DEVMEM
powerpc/configs/64s: Enable SCHEDSTATS
powerpc/configs/64s: Enable DEBUG_VM & other options
powerpc/configs/64s: Enable EMULATED_STATS
powerpc/configs/64s: Enable KUNIT and most tests
...
- User events are finally ready!
After lots of collaboration between various parties, we finally locked
down on a stable interface for user events that can also work with user
space only tracing. This is implemented by telling the kernel (or user
space library, but that part is user space only and not part of this
patch set), where the variable is that the application uses to know if
something is listening to the trace. There's also an interface to tell
the kernel about these events, which will show up in the
/sys/kernel/tracing/events/user_events/ directory, where it can be
enabled. When it's enabled, the kernel will update the variable, to tell
the application to start writing to the kernel.
See https://lwn.net/Articles/927595/
- Cleaned up the direct trampolines code to simplify arm64 addition of
direct trampolines. Direct trampolines use the ftrace interface but
instead of jumping to the ftrace trampoline, applications (mostly BPF)
can register their own trampoline for performance reasons.
- Some updates to the fprobe infrastructure. fprobes are more efficient than
kprobes, as it does not need to save all the registers that kprobes on
ftrace do. More work needs to be done before the fprobes will be exposed
as dynamic events.
- More updates to references to the obsolete path of
/sys/kernel/debug/tracing for the new /sys/kernel/tracing path.
- Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line
by line instead of all at once. There's users in production kernels that
have a large data dump that originally used printk() directly, but the
data dump was larger than what printk() allowed as a single print.
Using seq_buf() to do the printing fixes that.
- Add /sys/kernel/tracing/touched_functions that shows all functions that
was every traced by ftrace or a direct trampoline. This is used for
debugging issues where a traced function could have caused a crash by
a bpf program or live patching.
- Add a "fields" option that is similar to "raw" but outputs the fields of
the events. It's easier to read by humans.
- Some minor fixes and clean ups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZEr36xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quZHAQCzuqnn2S8DsPd3Sy1vKIYaj0uajW5D
Kz1oUJH4F0H7kgEA8XwXkdtfKpOXWc/ZH4LWfL7Orx2wJZJQMV9dVqEPDAE=
=w0Z1
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- User events are finally ready!
After lots of collaboration between various parties, we finally
locked down on a stable interface for user events that can also work
with user space only tracing.
This is implemented by telling the kernel (or user space library, but
that part is user space only and not part of this patch set), where
the variable is that the application uses to know if something is
listening to the trace.
There's also an interface to tell the kernel about these events,
which will show up in the /sys/kernel/tracing/events/user_events/
directory, where it can be enabled.
When it's enabled, the kernel will update the variable, to tell the
application to start writing to the kernel.
See https://lwn.net/Articles/927595/
- Cleaned up the direct trampolines code to simplify arm64 addition of
direct trampolines.
Direct trampolines use the ftrace interface but instead of jumping to
the ftrace trampoline, applications (mostly BPF) can register their
own trampoline for performance reasons.
- Some updates to the fprobe infrastructure. fprobes are more efficient
than kprobes, as it does not need to save all the registers that
kprobes on ftrace do. More work needs to be done before the fprobes
will be exposed as dynamic events.
- More updates to references to the obsolete path of
/sys/kernel/debug/tracing for the new /sys/kernel/tracing path.
- Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer
line by line instead of all at once.
There are users in production kernels that have a large data dump
that originally used printk() directly, but the data dump was larger
than what printk() allowed as a single print.
Using seq_buf() to do the printing fixes that.
- Add /sys/kernel/tracing/touched_functions that shows all functions
that was every traced by ftrace or a direct trampoline. This is used
for debugging issues where a traced function could have caused a
crash by a bpf program or live patching.
- Add a "fields" option that is similar to "raw" but outputs the fields
of the events. It's easier to read by humans.
- Some minor fixes and clean ups.
* tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits)
ring-buffer: Sync IRQ works before buffer destruction
tracing: Add missing spaces in trace_print_hex_seq()
ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
recordmcount: Fix memory leaks in the uwrite function
tracing/user_events: Limit max fault-in attempts
tracing/user_events: Prevent same address and bit per process
tracing/user_events: Ensure bit is cleared on unregister
tracing/user_events: Ensure write index cannot be negative
seq_buf: Add seq_buf_do_printk() helper
tracing: Fix print_fields() for __dyn_loc/__rel_loc
tracing/user_events: Set event filter_type from type
ring-buffer: Clearly check null ptr returned by rb_set_head_page()
tracing: Unbreak user events
tracing/user_events: Use print_format_fields() for trace output
tracing/user_events: Align structs with tabs for readability
tracing/user_events: Limit global user_event count
tracing/user_events: Charge event allocs to cgroups
tracing/user_events: Update documentation for ABI
tracing/user_events: Use write ABI in example
tracing/user_events: Add ABI self-test
...
to ARM's Top Byte Ignore and allows userspace to store metadata in some
bits of pointers without masking it out before use.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
+5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
+YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
=qitk
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
"Add support for the new Linear Address Masking CPU feature.
This is similar to ARM's Top Byte Ignore and allows userspace to store
metadata in some bits of pointers without masking it out before use"
* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
selftests/x86/lam: Add test cases for LAM vs thread creation
selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
selftests/x86/lam: Add inherit test cases for linear-address masking
selftests/x86/lam: Add io_uring test cases for linear-address masking
selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
mm: Expose untagging mask in /proc/$PID/status
x86/mm: Provide arch_prctl() interface for LAM
x86/mm: Reduce untagged_addr() overhead for systems without LAM
x86/uaccess: Provide untagged_addr() and remove tags before address check
mm: Introduce untagged_addr_remote()
x86/mm: Handle LAM on context switch
x86: CPUID and CR3/CR4 flags for Linear Address Masking
x86: Allow atomic MM_CONTEXT flags setting
x86/mm: Rework address range check in get_user() and put_user()
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 03a0b567a0 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is reported that the fexit_sleep never returns in aarch64.
The remaining tests cannot start. Put this test into DENYLIST.aarch64
for now so that other tests can continue to run in the CI.
Acked-by: Manu Bretelle <chantr4@gmail.com>
Reported-by: Manu Bretelle <chantra@meta.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
- sh: Use generic GCC library routines
- sh: sq: Use the bitmap API when applicable
- sh: sq: Fix incorrect element size for allocating bitmap buffer
- sh: pci: Remove unused variable in SH-7786 PCI Express code
- sh: mcount.S: fix build error when PRINTK is not enabled
- sh: remove sh5/sh64 last fragments
- sh: math-emu: fix macro redefined warning
- sh: init: use OF_EARLY_FLATTREE for early init
- sh: nmi_debug: fix return value of __setup handler
- sh: SH2007: drop the bad URL info
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmRIz0UACgkQdCY7N/W1
+RMm1Q/9Hw5xMnxHbryDoBAqgwEOZRH+MUMBnAyMw3shqxO/Cp/nIAacvdNmF4Me
iszDjATleshk8vbTwUE6cFPzKuLM8r4o1JfBvYSEBgkfs5YEEhoa+1TQZ6aYl3zD
v6vcVQnobaV5dUc9yUA3FdG/vuXEj7wctZuqO0QYsC/bE5g/r1fFTEd37Jbo2qwg
6sJ+xL8KEa29Abq9OP0QmeOWvHBuGcCLZNgagA4JxT7U4+jYhg0ddphw+c3yybnP
FX1eFMulB98V/oDPCOlfrYsZAkQGoYPWwY0WI/nVg8ujA3lbRkSu6Fd9ic95/PGG
KVjr6Mol6/+ESy4k/MB46bJzq0un2FPWhZzyfL0RoCbX2zQWBtC/1XbT0PmTsRud
CzcPAMpNPDwUTcoSWdUpOfEAbxjIgGNhQBth9lRMNFhNkk8cwgk1UAN0LjBRm5nq
MteTim3qCyiFkNlngpvSVbIokBKWllKAtPSL3wCi6OgQCNm7XWZxme2z8G5tVkit
Q9bTVD5qMt24pRJsGsVho8wvRsqMmtl5hwMzFVP02WBNxb9csHpQHrhG7MRLN9kt
0BPYU6erCcRl9DQ9HonUaKCmJDJEyxUcXan48TSyGzajFDnURS7AfkreO7NHQIbO
YAaCvqCDwGVygBjUQtHLrBWlORjAD8IoMEJ1sivRzCeHXGlmI6s=
=RGSv
-----END PGP SIGNATURE-----
Merge tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"This is a bit larger than my previous one and mainly consists of
clean-up work in the arch/sh directory by Geert Uytterhoeven and Randy
Dunlap.
Additionally, this fixes a bug in the Storage Queue code that was
discovered while I was reviewing a patch to switch the code to the
bitmap API by Christophe Jaillet.
So this contains both a fix for the original bug in the Storage Queue
code that can be backported later as well as the Christophe's patch to
swich the code to the bitmap API.
Summary:
- Use generic GCC library routines
- sq: Use the bitmap API when applicable
- sq: Fix incorrect element size for allocating bitmap buffer
- pci: Remove unused variable in SH-7786 PCI Express code
- mcount.S: fix build error when PRINTK is not enabled
- remove sh5/sh64 last fragments
- math-emu: fix macro redefined warning
- init: use OF_EARLY_FLATTREE for early init
- nmi_debug: fix return value of __setup handler
- SH2007: drop the bad URL info"
* tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h>
sh: Use generic GCC library routines
sh: sq: Use the bitmap API when applicable
sh: sq: Fix incorrect element size for allocating bitmap buffer
sh: pci: Remove unused variable in SH-7786 PCI Express code
sh: mcount.S: fix build error when PRINTK is not enabled
sh: remove sh5/sh64 last fragments
sh: math-emu: fix macro redefined warning
sh: init: use OF_EARLY_FLATTREE for early init
sh: nmi_debug: fix return value of __setup handler
sh: SH2007: drop the bad URL info
The selftest test_global_funcs/global_func1 failed with the latest clang17.
The reason is due to upstream ArgumentPromotionPass ([1]),
which may manipulate static function parameters and cause inlining
although the funciton is marked as noinline.
The original code:
static __attribute__ ((noinline))
int f0(int var, struct __sk_buff *skb)
{
return skb->len;
}
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return f0(0, skb) + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
After ArgumentPromotionPass, the code is translated to
static __attribute__ ((noinline))
int f0(int var, int skb_len)
{
return skb_len;
}
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return f0(0, skb->len) + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
And later llvm InstCombine phase recognized that f0()
simplify returns the value of the second argument and removed f0()
completely and the final code looks like:
__attribute__ ((noinline))
int f1(struct __sk_buff *skb)
{
...
return skb->len + skb->len;
}
...
SEC("tc")
__failure __msg("combined stack size of 4 calls is 544")
int global_func1(struct __sk_buff *skb)
{
return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4);
}
If f0() is not inlined, the verification will fail with stack size
544 for a particular callchain. With f0() inlined, the maximum
stack size is 512 which is in the limit.
Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass
from hoisting the code to its caller, and this fixed the test failure.
[1] https://reviews.llvm.org/D148269
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230425174744.1758515-1-yhs@fb.com
Now that ftrace supports direct call on arm64, BPF tracing programs work
on that architecture. This fixes the vast majority of BPF selftests
except for:
- multi_kprobe programs which require fprobe, not available on arm64 yet
- tracing_struct which requires trampoline support to access struct args
This patch updates the list of BPF selftests which are known to fail so
the BPF CI can validate the tests which pass now.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230427143207.635263-1-revest@chromium.org
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIVAwUAZEmThqZi849r7WBJAQLKTxAAwLKvk8xCUVPardg2tYxLSaNJAeSgo4L0
CKgB52kXa5R6+L3OApKgkREkj0TotNpNA5Gc/1DlPiRrUXPAj7g+NS2ID8SfXOUd
Iii42DoVZli03kG2xoLgU9Fy7mJ1JdfCC6dhP95y6oDzsZqb87M8sk+2G59KVhXO
KXaVMSU68+AKdXwDCbxhDwR+CH0YpGUqZxURKYycIZQhWPCssBDHorqJLLHzodSx
jk+OKAqTAURjt3Pqqn6BwyOXmjhsomUfJ2z01i/I062+zFTjy+6RBhqqbOPBpJ0w
D34nDwunyhlha11u1dJoP2lpmujJvliTUPM0ddeZTTMbRf58LzpxtVBPSsy389uI
pqC14OdUDEvlp4WX4Xkj7K2m4HpE9hYL1gF2ebnwvyS2f1Sjti1mKSYvs/cJk5nY
nlivD7lmj4Cc0SDasyfqnkP9TUxF+1SNoDAImtku/ajtIGsguveU8kYZtZxKj3WO
A0LZKabKH/jEvJug/aQA0l5+AdP88mGLre+WYc6xh7IxTlsXnYeLpaYOdGZ19WCQ
tjpc+z+nPSszc0wQs2TsJSxQpkzcO+8qS+h9GFhBm5DREfVHR8wrsMrdxot55zvm
+j9sMN8oD7RQwxtG9DUF2wzIyjKe/k9b3qbe/BApC65WsMiXdSvlhJKhvNZQs+w3
1OGeT5LJpqc=
=JhWj
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2023042601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- import a bunch of HID selftests from out-of-tree hid-tools project
(Benjamin Tissoires)
- drastically reducing Bluetooth disconnects on hid-nintendo driven
devices (Daniel J. Ogorchock)
- lazy initialization of battery interfaces in wacom driver (Jason
Gerecke)
- generic support for all Kye tablets (David Yang)
- proper rumble queue overrun handling in hid-nintendo (Daniel J.
Ogorchock)
- support for ADC measurement in logitech-hidpp driver (Bastien Nocera)
- reset GPIO support in i2c-hid (Hans de Goede)
- improved handling of generic "Digitizer" usage (Jason Gerecke)
- support for KEY_CAMERA_FOCUS (Feng Qi)
- quirks for Apple Geyser 3 and Apple Geyser 4 (Alex Henrie)
- assorted functional fixes and device ID additions
* tag 'for-linus-2023042601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (54 commits)
HID: amd_sfh: Fix max supported HID devices
HID: wacom: generic: Set battery quirk only when we see battery data
HID: wacom: Lazy-init batteries
HID: Ignore battery for ELAN touchscreen on ROG Flow X13 GV301RA
HID: asus: explicitly include linux/leds.h
HID: lg-g15: explicitly include linux/leds.h
HID: steelseries: explicitly include linux/leds.h
HID: apple: Set the tilde quirk flag on the Geyser 3
HID: apple: explicitly include linux/leds.h
HID: mcp2221: fix get and get_direction for gpio
HID: mcp2221: fix report layout for gpio get
HID: wacom: Set a default resolution for older tablets
HID: i2c-hid-of: Add reset GPIO support to i2c-hid-of
HID: i2c-hid-of: Allow using i2c-hid-of on non OF platforms
HID: i2c-hid-of: Consistenly use dev local variable in probe()
HID: kye: Fix rdesc for kye tablets
HID: amd_sfh: Support for additional light sensor
HID: amd_sfh: Handle "no sensors" enabled for SFH1.1
HID: amd_sfh: Increase sensor command timeout for SFH1.1
HID: amd_sfh: Correct the stop all command
...
At this time, it's an interesting mixture of changes for both old and
new stuff. Majority of changes are about ASoC (lots of systematic
changes for converting remove callbacks to void, and cleanups), while
we got the fixes and the enhancements of very old PCI cards, too.
Here are some highlights:
ALSA/ASoC Core:
- Continued effort of more ASoC core cleanups
- Minor improvements for XRUN handling in indirect PCM helpers
- Code refactoring of PCM core code
ASoC:
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol
- Hibernation support for CS35L45
- More DT binding conversions
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733
ALSA:
- Lots of works for legacy emu10k1 and ymfpci PCI drivers
- PCM kselftest fixes and enhancements
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmRJBkcOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8S/Q/+If1MEW+XXYushYU6VcWbHevwsRwmUZPtIJzT
Nx4PE4Ia8rX++GbsH5Iqt6tmldbb/vMbwy7TGbn/Q4ju2cO5qGT4/qgWdC2TuUX6
icWRHslJ//TffSd/yh1g6JIKBlcCmQeYcw5KoaLzBE/qO3iRP0IQUc17gkLKYNni
u1XOGrU9zuh3uwz+UQFfUhB8NlKhD3HVYjwrbd3gwcDsE/0G+q76A/wWghfA+RAb
0ruDhIDtJoem6PKQTwC05UgDpmwd7XFAIgcbOu7E7t/lr4YKwQZhQmJI0IexCR9i
aLPqg3Q/6S+WFKpcPcGCHNljqRNp9lUlIXak+NsbCZ7mXKE6tALywAtuB57sZ0sO
QM1YrmUAsi0RaD7foPcT64CAq8IVQ6aLWusXwvcxzzvJuHvJdeiBKiI5gmF0GqMu
ZLpAMGCoKxft4Il2r+BPTbLHe57uHmp1fKMWUK4NfyIUW7jEdKmf7ALSSJmvcqwU
+R0PXikc0lOo1GH9ZQojpVNFwV8XLOd2CWaNfoPl85A0+ngYhTY3ZRQ3qbYWHlU6
zXAu06IUOef5phsn3zerJ1orV729Xdjf+JUbL0uxJvANsX6R93CQWw0tgrUI62EZ
0vhoOp3PPZUKmDKvUo/NtIyuvSGREg3wDug5tiDOb53Qwfr2VIThJa999kNzH76c
lHUfrv4=
=7XGG
-----END PGP SIGNATURE-----
Merge tag 'sound-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"At this time, it's an interesting mixture of changes for both old and
new stuff. Majority of changes are about ASoC (lots of systematic
changes for converting remove callbacks to void, and cleanups), while
we got the fixes and the enhancements of very old PCI cards, too.
Here are some highlights:
ALSA/ASoC Core:
- Continued effort of more ASoC core cleanups
- Minor improvements for XRUN handling in indirect PCM helpers
- Code refactoring of PCM core code
ASoC:
- Continued feature and simplification work on SOF, including
addition of a no-DSP mode for bringup, HDA MLink and extensions to
the IPC4 protocol
- Hibernation support for CS35L45
- More DT binding conversions
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas
R-Car Gen4, Rockchip RK3588 and TI TAS5733
ALSA:
- Lots of works for legacy emu10k1 and ymfpci PCI drivers
- PCM kselftest fixes and enhancements"
* tag 'sound-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (586 commits)
ALSA: emu10k1: use high-level I/O in set_filterQ()
ALSA: emu10k1: use high-level I/O functions also during init
ALSA: emu10k1: fix error handling in snd_audigy_i2c_volume_put()
ALSA: emu10k1: don't stop DSP in _snd_emu10k1_{,audigy_}init_efx()
ALSA: emu10k1: fix SNDRV_EMU10K1_IOCTL_SINGLE_STEP
ALSA: emu10k1: skip Sound Blaster-specific hacks for E-MU cards
ALSA: emu10k1: fixup DSP defines
ALSA: emu10k1: pull in some register definitions from kX-project
ALSA: emu10k1: remove some bogus defines
ALSA: emu10k1: eliminate some unused defines
ALSA: emu10k1: fix lineup of EMU_HANA_* defines
ALSA: emu10k1: comment updates
ALSA: emu10k1: fix snd_emu1010_fpga_read() input masking for rev2 cards
ALSA: emu10k1: remove unused emu->pcm_playback_efx_substream field
ALSA: emu10k1: remove unused `resume` parameter from snd_emu10k1_init()
ALSA: emu10k1: minor optimizations
ALSA: emu10k1: remove remaining cruft from snd_emu10k1_emu1010_init()
ALSA: emu10k1: remove apparently pointless EMU_HANA_OPTION_CARDS reads
ALSA: emu10k1: remove apparently pointless FPGA reads
ALSA: emu10k1: stop doing weird things with HCFG in snd_emu10k1_emu1010_init()
...
To correlate the hardware RX timestamp with something, add tracking of
two software timestamps both clock source CLOCK_TAI (see description in
man clock_gettime(2)).
XDP metadata is extended with xdp_timestamp for capturing when XDP
received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I
could not find a BPF helper for getting CLOCK_REALTIME, which would have
been preferred. In userspace when AF_XDP sees the packet another
software timestamp is recorded via clock_gettime() also clock source
CLOCK_TAI.
Example output shortly after loading igc driver:
poll: 1 (0) skip=1 fail=0 redir=2
xsk_ring_cons__peek: 1
0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
rx_hash: 0x82A96531 with RSS type:0x1
rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
0x12557a8: complete idx=9 addr=9000
The first observation is that the 37 sec difference between RX HW vs XDP
timestamps, which indicate hardware is likely clock source
CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised
with a 37 sec offset.
The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the
userspace wakeup time. On this hardware it was caused by CPU idle sleep
states, which can be reduced by tuning /dev/cpu_dma_latency.
View current requested/allowed latency bound via:
hexdump --format '"%d\n"' /dev/cpu_dma_latency
More explanation of the output and how this can be used to identify
clock drift for the HW clock can be seen here[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182466298.616355.2544377890818617459.stgit@firesoul
Two series:
- Reorganize how the hardware page table objects are managed,
particularly their destruction flow. Increase the selftest
test coverage in this area by creating a more complete mock
iommu driver.
This is preparation to add a replace operation for HWPT binding,
which is done but waiting for the VFIO parts to complete so there
is a user.
- Split the iommufd support for "access" to make it two step - allocate
an access then link it to an IOAS. Update VFIO and have VFIO always
create an access even for the VFIO mdevs that never do DMA.
This is also preperation for the replace VFIO series that will allow
replace to work on access types as well.
Three minor fixes:
- Sykzaller found the selftest code didn't check for overflow when
processing user VAs
- smatch noted a .data item should have been static
- Add a selftest that reproduces a syzkaller bug for batch carry already
fixed in rc
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZEfgkQAKCRCFwuHvBreF
YRw7AQD5BhiTzdTMlMPAasihb2ZITYs3MyiZrxICZuZf3CksLAD9FcO3aYPq01ge
7yHF+LeWHYq1dR6x7YTvcbc3apFFIgA=
=0j6E
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Two series:
- Reorganize how the hardware page table objects are managed,
particularly their destruction flow. Increase the selftest test
coverage in this area by creating a more complete mock iommu
driver.
This is preparation to add a replace operation for HWPT binding,
which is done but waiting for the VFIO parts to complete so there
is a user.
- Split the iommufd support for "access" to make it two step -
allocate an access then link it to an IOAS. Update VFIO and have
VFIO always create an access even for the VFIO mdevs that never do
DMA.
This is also preperation for the replace VFIO series that will
allow replace to work on access types as well.
Three minor fixes:
- Sykzaller found the selftest code didn't check for overflow when
processing user VAs
- smatch noted a .data item should have been static
- Add a selftest that reproduces a syzkaller bug for batch carry
already fixed in rc"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (21 commits)
iommufd/selftest: Cover domain unmap with huge pages and access
iommufd/selftest: Set varaiable mock_iommu_device storage-class-specifier to static
vfio: Check the presence for iommufd callbacks in __vfio_register_dev()
vfio/mdev: Uses the vfio emulated iommufd ops set in the mdev sample drivers
vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
vfio-iommufd: No need to record iommufd_ctx in vfio_device
iommufd: Create access in vfio_iommufd_emulated_bind()
iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
iommufd/selftest: Catch overflow of uptr and length
iommufd/selftest: Add a selftest for iommufd_device_attach() with a hwpt argument
iommufd/selftest: Make selftest create a more complete mock device
iommufd/selftest: Rename the remaining mock device_id's to stdev_id
iommufd/selftest: Rename domain_id to hwpt_id for FIXTURE iommufd_mock_domain
iommufd/selftest: Rename domain_id to stdev_id for FIXTURE iommufd_ioas
iommufd/selftest: Rename the sefltest 'device_id' to 'stdev_id'
iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
iommufd: Move iommufd_device to iommufd_private.h
iommufd: Move ioas related HWPT destruction into iommufd_hw_pagetable_destroy()
iommufd: Consistently manage hwpt_item
iommufd: Add iommufd_lock_obj() around the auto-domains hwpts
...
Add a test case to check for precision marking of safe paths. Ensure
that the verifier will not prematurely prune scalars contributing to
registers needing precision.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Core
----
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances.
- Reduce compound page head access for zero-copy data transfers.
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible.
- Threaded NAPI improvements, adding defer skb free support and unneeded
softirq avoidance.
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking.
- Add lockless accesses annotation to sk_err[_soft].
- Optimize again the skb struct layout.
- Extends the skb drop reasons to make it usable by multiple
subsystems.
- Better const qualifier awareness for socket casts.
BPF
---
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses.
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward.
- Add more precise memory usage reporting for all BPF map types.
- Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params.
- Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton.
- Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities.
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc.
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps.
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps.
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree.
- Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them.
- Add ARM32 USDT support to libbpf.
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations.
Protocols
---------
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address.
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition.
- Add the handshake upcall mechanism, allowing the user-space
to implement generic TLS handshake on kernel's behalf.
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures.
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers.
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction.
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore.
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter
---------
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged.
- Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header. This is needed for BIT TCP
support.
- The iptables 32bit compat interface isn't compiled in by default
anymore.
- Move ip(6)tables builtin icmp matches to the udptcp one.
This has the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used.
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device.
Driver API
----------
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time.
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them.
- Allow the page_pool to directly recycle the pages from safely
localized NAPI.
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization.
- Add YNL support for user headers and struct attrs.
- Add partial YNL specification for devlink.
- Add partial YNL specification for ethtool.
- Add tc-mqprio and tc-taprio support for preemptible traffic classes.
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device.
- Add basic LED support for switch/phy.
- Add NAPI documentation, stop relaying on external links.
- Convert dsa_master_ioctl() to netdev notifier. This is a preparatory
work to make the hardware timestamping layer selectable by user
space.
- Add transceiver support and improve the error messages for CAN-FD
controllers.
New hardware / drivers
----------------------
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers
-------
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors.
- add support for configuring max SDU for each Tx queue.
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only
on shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll.
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates.
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices
(e.g. MAC address from efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRI/mUSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgO0QAJGxpuN67YgYV0BIM+/atWKEEexJYG7B
9MMpU4jMO3EW/pUS5t7VRsBLUybLYVPmqCZoHodObDfnu59jiPOegb6SikJv/ZwJ
Zw62PVk5MvDnQjlu4e6kDcGwkplteN08TlgI+a49BUTedpdFitrxHAYGW8f2fRO6
cK2XSld+ZucMoym5vRwf8yWS1BwdxnslPMxDJ+/8ZbWBZv44qAnG2vMB/kIx7ObC
Vel/4m6MzTwVsLYBsRvcwMVbNNlZ9GuhztlTzEbfGA4ZhTadIAMgb5VTWXB84Ws7
Aic5wTdli+q+x6/2cxhbyeoVuB9HHObYmLBAciGg4GNljP5rnQBY3X3+KVZ/x9TI
HQB7CmhxmAZVrO9pLARFV+ECrMTH2/dy3NyrZ7uYQ3WPOXJi8hJZjOTO/eeEGL7C
eTjdz0dZBWIBK2gON/6s4nExXVQUTEF2ZsPi52jTTClKjfe5pz/ddeFQIWaY1DTm
pInEiWPAvd28JyiFmhFNHsuIBCjX/Zqe2JuMfMBeBibDAC09o/OGdKJYUI15AiRf
F46Pdb7use/puqfrYW44kSAfaPYoBiE+hj1RdeQfen35xD9HVE4vdnLNeuhRlFF9
aQfyIRHYQofkumRDr5f8JEY66cl9NiKQ4IVW1xxQfYDNdC6wQqREPG1md7rJVMrJ
vP7ugFnttneg
=ITVa
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
- Don't advertisze XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
- Overhaul the AMX selftests to improve coverage and cleanup the test
- Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGt50SHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5MskP/2PhSrdgHxCwfpqpdVe/q5OWwFuhn3wG
f5QKMpEBg4wJFeIE3eGJEaDlg776nWtWDNgUmqdjoZ8vyyadkPX9CV2Y2Hq0M7Tw
d0gKPjQrz2BavyDYoPNfs4pfshs4EvDTswBkhdAt8KTZhGZosJOywQIp61V3ePqr
1rDP6C4+CmwTRAK0f7egslyJ2pZXiUcvhITvzx8XhIAQh6nEK4gUZ/l3hLmg38kD
Af23kiLnP8lHUUx4BQtRAnTw0SZXJ8DcKtoFkzEH8mdj4g6EqXpxy48zuyZcqWVi
4XIFr+WECPsV5gdqWN9rMDqIG2ib+2heKDmcdUptcVuvr1ktv0reQybmgVck4CKX
fTAdu86/LBaQmIHwNHaNFPwdUby4QQZ8ajafPC62oc+B6N1lQg8bbCwnvO6KGlGl
FaQTnzaZq7ft4tfQRXOMu1AbLZLK7dIqJHHhxR3MkBkd4MAcZ1bVKkvlJLqsOKNw
TEsreXErY7AsegZK73Rn4IN/CJGBof5bZ2NIchmiN+0UfMsd9zGn66Als6oRNh4E
tRUhFONPIEmydy9UB50qe6b98ElB6R++opZbvkVW2hy8lMy3iJrCvUbOs1nx3wbn
cxvIuTfw/dAFf70S03/zudf7lYHs2wKV1rrIAebyTd4NnvWdVB8OaSHgZswMgVjb
UzzQfnQ+u9so
=BY10
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM selftests, and an AMX/XCR0 bugfix, for 6.4:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
- Overhaul the AMX selftests to improve coverage and cleanup the test
- Misc cleanups
- Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
validate PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- Misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGtd4SHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5Z9kP/i3WZ40hevvQvB/5cEpxxmxYDwCYnnjM
hiQgK5jT4SrMTmVjLgkNdI2PogQoS4CX+GC7lcA9bvse84hjuPvgOflb2B+p2UQi
Ytbr9g/tfKNIpnKIk9mcPcSObN9vm2Kgt7n28rtPrHWj89eQzgc66eijqdpKBLxA
c3crVR8krwYAQK0tmzHq1+H6hB369YbHAHyTTRRI/bNWnqKblnvUbt0NL2aBusa9
rNMaOdRtinLpy2dmuX/b3japRB8QTnlf7zpPIF4cBEhbYXy5woClZpf1D2fCA6Er
XFbEoYawMVd9UeJYbW4z5yErLT83eYoGp4U0eFXWp6fvh8nZlgCGvBKE9g4mmqwj
aSLaTR5eVN2qlw6jXVeg3unCo8Eyl36AwYwve2L6sFmBvZvNV5iz2eQ7rrOe4oE3
dnTUaLQ8I2SVg04MbYmCq5W+frTL/I7kqNpbccL1Z3R5WO4y5gz63mug6NfLIvhR
t45TAIaifxBfcXQsBZM3v2KUK/xQrD3AbJmFKh54L2CKqiGaNWsMLX+6NZ7LZWgf
8rEqsVkkQDgF7z8eXai4TR26nYfSX6g9gDqtOH73L87aJ7PJk5cRoDWQ1sWs1e/l
4HA/L0Bo/3pnKAa0ZWxJOixmzqY49gNQf3dj8gt3jk3y2ijbAivshiSpPBmIxn0u
QLeOf/LGvipl
=m18F
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM x86 PMU changes for 6.4:
- Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
validate PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- Misc cleanups and fixes
- Update the ACPICA code in the kernel to upstream revision 20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table (Jessica
Clarke).
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang).
* Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID()
(Xiongfeng Wang).
* Add missing tables to astable (Pedro Falcato).
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen).
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski).
* Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary).
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore).
* Add support for ClockInput resource (v6.5) (Niyas Sait).
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L).
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L).
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein).
* Make ACPICA code support flexible arrays properly (Kees Cook).
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red).
* Add os specific support for Zephyr RTOS to ACPICA (Najumon).
* Update version to 20230331 (Bob Moore).
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne).
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu).
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen).
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser).
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker).
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko).
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan).
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue).
- Add CXL error types to the error injection code in APEI (Tony Luck).
- Refactor acpi_data_prop_read_single() (Andy Shevchenko).
- Fix two issues in the ACPI SBS driver (Armin Wolf).
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi).
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki).
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede).
- Replace irqdomain.h include with struct declarations in ACPI headers
and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring).
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K).
- Update the pm_profile sysfs attribute documentation (Rafael Wysocki).
- Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRGvLQSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxoV4P/jxWGAdldtgXORR58lKGbSs6lx/0Y+SF
iI7qK88NcbcbWS+a3PqRrisNkjN17rjzajfp28Ue2CXFxzwTViyw6KYELbPJ6N/h
/3prem++jKgf7qiueDJG/AyO8N2+Z+yciubhxdMiK1+c1dZM2ycwSyBzJgYocpXn
fH+YFPhxE7c8Z8doBrTOZjRuU4SIEKCmxo3c5BbCuyVZkbqCRdQMIDCiBJgLTmbo
z4pu9OFhAamB8Cth2QFfRbZWqmuY71Gt54+c4ITPPV2ALlLUYODyHZoSISBJULp3
k0lU/hMCD+i1WRwv+Bb6of7pJPM4Lqp+wOirAtiiibjE9LRxVTNyOUAHLXbx+t2V
PN8JKVJVCLaZO6TRELgFIL4nh4aBdOtr4BuaLnClZho9bG68jEkc8grnOZYhFYtM
66BuJBW30rwwGY4N5VSZGzFFR7l2qaHIOSHdq681bxQ3e6erFEeIc5jQVEOKgCqd
XWdELVkqf3CnCX0lgonj+AgoeCqOpYdrNcWqMsJ+6OyQRoFhLFltDSPeJm9gHGO7
X+qCQru4ZgEDKexWKpGgH9x8AllDKbh/ApyyumXgsQOsRocVdoNaf+yCBlaaDyqu
UYif6hgFYnIxF2Fg1r/POgHDXFobE4iUTHcUU1V2QhuByc4PkN9ljKsHeC2FgVUz
JityWRiMABNv
=O61K
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20230331, fix the ACPI SBS driver and the evaluation of the _PDC
method on Xen dom0 in the ACPI processor driver, update the ACPI
driver for Intel SoCs and clean up code in multiple places.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table
(Jessica Clarke)
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang)
* Add missing macro ACPI_FUNCTION_TRACE() for
acpi_ns_repair_HID() (Xiongfeng Wang)
* Add missing tables to astable (Pedro Falcato)
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen)
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski)
* Add support for Arm's MPAM ACPI table version 2 (Hesham
Almatary)
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore)
* Add support for ClockInput resource (v6.5) (Niyas Sait)
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L)
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil
V L)
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein)
* Make ACPICA code support flexible arrays properly (Kees Cook)
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red)
* Add os specific support for Zephyr RTOS to ACPICA (Najumon)
* Update version to 20230331 (Bob Moore)
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne)
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu)
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen)
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser)
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker)
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko)
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan)
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue)
- Add CXL error types to the error injection code in APEI (Tony Luck)
- Refactor acpi_data_prop_read_single() (Andy Shevchenko)
- Fix two issues in the ACPI SBS driver (Armin Wolf)
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi)
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki)
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede)
- Replace irqdomain.h include with struct declarations in ACPI
headers and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring)
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K)
- Update the pm_profile sysfs attribute documentation (Rafael
Wysocki)
- Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede)"
* tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPI: LPSS: Add 80862289 ACPI _HID for second PWM controller on Cherry Trail
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
ACPICA: Update version to 20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
...
User processes register an address and bit pair for events. If the same
address and bit pair are registered multiple times in the same process,
it can cause undefined behavior when events are enabled/disabled.
When more than one are used, the bit could be turned off by another
event being disabled, while the original event is still enabled.
Prevent undefined behavior by checking the current mm to see if any
event has already been registered for the address and bit pair. Return
EADDRINUSE back to the user process if it's already being used.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-4-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If an event is enabled and a user process unregisters user_events, the
bit is left set. Fix this by always clearing the bit in the user process
if unregister is successful.
Update abi self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-3-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The write index indicates which event the data is for and accesses a
per-file array. The index is passed by user processes during write()
calls as the first 4 bytes. Ensure that it cannot be negative by
returning -EINVAL to prevent out of bounds accesses.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-2-beaub@linux.microsoft.com
Fixes: 7f5a08c79d ("user_events: Add minimal support for trace_event into ftrace")
Reported-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relcations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they fail
to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up
in the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context
of different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource is
installed, i.e. timekeeping is still tick based and the tick period
advances from there.
The early enablement of sched_clock() broke this alignement as the time
accumulated by sched_clock() is taken into account when timekeeping is
initialized. So the base value now(CLOCK_MONOTONIC) is not longer a
multiple of tick periods, which breaks applications which relied on
that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements
- Cure the concurrent writer race for idle and IO sleeptime statistics
The statitic values which are exposed via /proc/stat are updated from
the CPU local idle exit and remotely by cpufreq, but that happens
without any form of serialization. As a consequence sleeptimes can be
accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
- Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race with
idle exit updates. As a consequence the readout may result in random
and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing the
remote runqueues nr_iowait counter. The latter is impossible to fix,
so the only way to deal with that is to document it properly and to
remove the assertion in the selftest which triggers occasionally due
to that.
- Restructure struct tick_sched for better cache layout
- Some small cleanups and a better cache layout for struct tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU timers
For unknown reason the introduction of the timer_wait_running() callback
missed to fixup posix CPU timers, which went unnoticed for almost four
years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels, it
turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems
there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers
out from hard interrupt context to task work, which is handled before
returning to user space or to a VM. The expiry mechanism moves the
expired timers to a stack local list head with sighand lock held. Once
sighand is dropped the task can be preempted and a task which wants to
delete a timer will spin-wait until the expiry task is scheduled back
in. In the worst case this will end up in a livelock when the preempting
task and the expiry task are pinned on the same CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry task
hold it accross the expiry function and let the deleting task which
waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra mutex_lock()/unlock()
pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents the
livelock and cures the problem for RT and !RT systems.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb
EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0
0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs
IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+
00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y
OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H
5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY
Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40
6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2
tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx
9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB
nIJuHl8mxSetag==
=SVnj
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relocations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they
fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up in
the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context of
different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource
is installed, i.e. timekeeping is still tick based and the tick
period advances from there.
The early enablement of sched_clock() broke this alignement as the
time accumulated by sched_clock() is taken into account when
timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
not longer a multiple of tick periods, which breaks applications
which relied on that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements:
* Cure the concurrent writer race for idle and IO sleeptime
statistics
The statitic values which are exposed via /proc/stat are updated
from the CPU local idle exit and remotely by cpufreq, but that
happens without any form of serialization. As a consequence
sleeptimes can be accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
* Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race
with idle exit updates. As a consequence the readout may result
in random and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing
the remote runqueues nr_iowait counter. The latter is impossible
to fix, so the only way to deal with that is to document it
properly and to remove the assertion in the selftest which
triggers occasionally due to that.
* Restructure struct tick_sched for better cache layout
* Some small cleanups and a better cache layout for struct
tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU
timers
For unknown reason the introduction of the timer_wait_running()
callback missed to fixup posix CPU timers, which went unnoticed for
almost four years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels,
it turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
systems there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
timers out from hard interrupt context to task work, which is handled
before returning to user space or to a VM. The expiry mechanism moves
the expired timers to a stack local list head with sighand lock held.
Once sighand is dropped the task can be preempted and a task which
wants to delete a timer will spin-wait until the expiry task is
scheduled back in. In the worst case this will end up in a livelock
when the preempting task and the expiry task are pinned on the same
CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which
uses a per CPU timer-base expiry lock which is held by the expiry
code and the task waiting for the timer function to complete blocks
on that lock.
This does not work in the same way for posix CPU timers as there is
no timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry
lock can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry
task hold it accross the expiry function and let the deleting task
which waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra
mutex_lock()/unlock() pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents
the livelock and cures the problem for RT and !RT systems
* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Implement the missing timer_wait_running callback
selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
selftests/proc: Remove idle time monotonicity assertions
MAINTAINERS: Remove stale email address
timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
timers/nohz: Add a comment about broken iowait counter update race
timers/nohz: Protect idle/iowait sleep time under seqcount
timers/nohz: Only ever update sleeptime from idle exit
timers/nohz: Restructure and reshuffle struct tick_sched
tick/common: Align tick period with the HZ tick.
selftests/timers/posix_timers: Test delivery of signals across threads
posix-timers: Prefer delivery of signals to the current thread
vdso: Improve cmd_vdso_check to check all dynamic relocations
Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes which
utilize syscall user dispatch correctly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGgIETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYockhEACWVd/KOBlQIdUMpM3jfSWsm+VZrITg
sKN2WCKaz8MS5RA7xTAfZIEqMzkI0V+GPoj+8eK70W39XFU/PlSQo8LUFahSxVHF
RVyz4zFKeR2XZpDa8J3ytoOvngiAnpOUflssvfA0+f3gq/B48jgLmj8XsrkmkL2T
6txRpusYNlzVTBoza0+1uEmxBTNhRxvURXa6OR/l24Kbh2udyNd6dlAoRHBV0iOW
qn7ILgoYIr/74ChCbrr8yZe2rZ+BqqlS1fsjDWkuUqq9AgzeuOjGJnZtMKG6WbGg
/NBj0Ewe7gsgZwBo7t4MbKNF7bXRkLczp8BX/l9xOTe+mpZ+LyNIHvOM3/TD6O1A
NFJNwTAGAnhU5Uoba9HzaKYZZnanqgLxuszXznJDU3zKV5pCNMNzlKxjPT73Jzsl
T1WTCyhSydluSuhOHLU4awC38pqVEQwichx98c9agIBPo7kxkb5RcTVq223wOSeI
h8otkecJ6U+gmjNDHnRtNBzykEIjVFjgiSBYGTr+/6ek2Myf0O/RMr13oe9OZG5R
jaKyjcDIADbYRow1rXfEs7Bq42K8rIkbVZvEEK/auYRUFngAoQ3l090i9wj6ViXf
7CqAjCC1K1BBxbqQwf0YLuDXCzUaXxcWfvNGEGEGs/NYDuu291QntGSFSxsJgsym
HXvO4NzHOHi13A==
=AS+6
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/ptrace update from Thomas Gleixner:
"Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes
which utilize syscall user dispatch correctly"
* tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest, ptrace: Add selftest for syscall user dispatch config api
ptrace: Provide set/get interface for syscall user dispatch
syscall_user_dispatch: Untag selector address before access_ok()
syscall_user_dispatch: Split up set_syscall_user_dispatch()
iter_pass_iter_ptr_to_subprog subtest is relying on actual array size
being passed as subprog parameter. This combined with recent fixes to
precision tracking in conditional jumps ([0]) is now causing verifier to
backtrack all the way to the point where sum() and fill() subprogs are
called, at which point precision backtrack bails out and forces all the
states to have precise SCALAR registers. This in turn causes each
possible value of i within fill() and sum() subprogs to cause
a different non-equivalent state, preventing iterator code to converge.
For now, change the test to assume fixed size of passed in array. Once
BPF verifier supports precision tracking across subprogram calls, these
changes will be reverted as unnecessary.
[0] 71b547f561 ("bpf: Fix incorrect verifier pruning due to missing register precision taints")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230424235128.1941726-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As reported by Kumar in [0], the shared ownership implementation for BPF
programs has some race conditions which need to be addressed before it
can safely be used. This patch does so in a minimal way instead of
ripping out shared ownership entirely, as proper fixes for the issues
raised will follow ASAP, at which point this patch's commit can be
reverted to re-enable shared ownership.
The patch removes the ability to call bpf_refcount_acquire_impl from BPF
programs. Programs can only bump refcount and obtain a new owning
reference using this kfunc, so removing the ability to call it
effectively disables shared ownership.
Instead of changing success / failure expectations for
bpf_refcount-related selftests, this patch just disables them from
running for now.
[0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230424204321.2680232-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZEEuiAAKCRCRxhvAZXjc
otFNAP4iesF3tqrzb1JLf/vKg8PT98re0Iec5tGYyfDSCjfGYAEAraeFOSry16sy
VUn6JBjDwrkAbUeraz6zauS+rXwstQk=
=xYmX
-----END PGP SIGNATURE-----
Merge tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3 selftest fix from Christian Brauner:
"This is a single fix to the clone3() selftstests.
It fell through the sefltest tree cracks a few times so I'll provide
it here. It has low urgency but we should still correctly report the
number of tests"
* tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: fix number of tests in ksft_set_plan
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRFYFsACgkQCwJExA0N
Qxw+PxAA1KHnHool3QbzZouFgLgTS2N/hxsOIoWKeUl6guUPX0XYu67FEIyt7p5k
a1eFLjt+q4URW/heHKYdffP+Up6xhN5yVP8xJEcbn6GD13lz1clI9RAjObiPOehc
KOV90PeAEfzosEGRIp97g4Gzu8NUMZqN7BsKBdzYJ4rEftlcjaILBVp4OfSuCyAi
UbYBdRjK4eIOwGXuHVfhNqzH1HRSbzcoSRTywj5qW0Qhpe6KnZBRuZESXYBsxzGb
G0nd4+OttjZyplI/xQYwaU0XGAI6roG5G4nAT5YGHLp5g8rTaHetTi+i3iK4iEru
wEL0NgywkA0ujAge97RldOjtU97vvSFk7FwxdS9lxaMW/Ut2sN72I2ThI8dBvVRZ
fcw8t8mmT1gUv3SCq+s1X13vz22IedXLOfvOY2o/fLk2zxOw5e8FirAz/aFeOf3K
++hK+IQvDmeMMv08bz0ORzdRQcjdwQNQ3klnfdrUVFN9yK+iAllOJ/nrXHLNIXu4
c3ITlAMldcAf2W+LRWzvqqKyT4H8MCXL3L0bBc1M1reRu9nM89AZedO8MHCB0R9Q
2ic0rOxIwZzPJuk0qPDxEVmN7Rpyx85I96YOwRemJTEfdkB/ZX+BfOU0KzinOVHC
3qrHuIw/SyRTlUEDAr53gJ5WHbdjhKAmrd1/FuplyoOSX0w6VVA=
=COQn
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
* tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: add tests for using current KUnit test field
kunit: tool: Add support for SH under QEMU
kunit: tool: Add support for overriding the QEMU serial port
.gitignore: Unignore .kunitconfig
list: test: Test the klist structure
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
kunit: Use gfp in kunit_alloc_resource() kernel-doc
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
kunit: tool: remove unused imports and variables
kunit: tool: add subscripts for type annotations where appropriate
kunit: fix bug of extra newline characters in debugfs logs
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug in debugfs logs of parameterized tests
kunit: tool: Add support for m68k under QEMU
linux-kselftest-next-6.4-rc1
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRFWfMACgkQCwJExA0N
QxxXPg/9Fjo+jLEt6/CraOQeeyQsmzQkPiXutqrjUoUwaMGpOoon+En39g1eDAiZ
5iJE8lu47ldUJNp7Pn2THfcLXJ4Le+12s0imWuXaDf6VdFUNsyd9AZMfziMmeUGa
6GXLF+FUzT7uXbIuIrPvZmyX3QFL8WEcywmgFRlAyfLFcg37uUgiKSl7ATrgWCEw
jlPELO2p2+t+EB0/n1VMoXup6tGD6tpuuNH50rDeRMTV+cW7wKTvJyFXbMvGThcx
YfjzofYm+drX5gka/XQYynZehTNcbASjkvYYEqm3piwWCyfqt4Y1aOAA8fS9h+86
jKGF/pxz1Zcl7vgZW5WixKaJ+cMf8gfCMRsny6h2x2pmqKwqSJtg+jk8XqNkJQnh
JKwRxosLjEQMyIRPMH9PUjBQD46VuC2nvu4SY5apxkbHH2iG8SKG/DNIpSFXB2m0
7BuziwKZe9uw671vaZU2b0r5eeCxETtuFwlHWN9Rl954g0zeueyUuBqg0ibpQois
Jp2SvwVR+oXRLHoqDMrCEDk0WEGJ9WD/mMxW4iHGMYRwoXb1RCKzFWEoYw3dFAxw
N/knx2IeEFshiKIAqaGnpER1chLe6ChX/5rDXkj1YLDwHEJdowHhE9/GxqGbhEUi
zOzZIAGMVTOZp6g/TIutGYfpyxUAXj19eRoAbA/KkLmv4d+s1nY=
=992w
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
* tag 'linux-kselftest-next-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests/resctrl: Fix incorrect error return on test complete
selftests/resctrl: Remove duplicate codes that clear each test result file
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Use correct exit code when tests fail
kselftest/arm64: Convert za-fork to use kselftest.h
kselftest: Support nolibc
tools/nolibc/stdio: Implement vprintf()
selftests/resctrl: Correct get_llc_perf() param in function comment
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
...
o MAINTAINERS files additions and changes.
o Fix hotplug warning in nohz code.
o Tick dependency changes by Zqiang.
o Lazy-RCU shrinker fixes by Zqiang.
o rcu-tasks stall reporting improvements by Neeraj.
o Initial changes for renaming of k[v]free_rcu() to its new k[v]free_rcu_mightsleep()
name for robustness.
o Documentation Updates:
o Significant changes to srcu_struct size.
o Deadlock detection for srcu_read_lock() vs synchronize_srcu() from Boqun.
o rcutorture and rcu-related tool, which are targeted for v6.4 from Boqun's tree.
o Other misc changes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmQuBnIACgkQqA4nf2o4
5hACVRAAoXu7/gfh5Pjw9O4E4pCdPJKsZZVYrcrVGrq6NAxRn6M1SgurAdC5grj2
96x0waoGaiO82V0H5iJMcKdAVu67x9R8WaQ1JoxN75Efn8h9W4TguB87TV1gk0xS
eZ18b/CyEaM5mNb80DFFF4FLohy5737p/kNTMqXQdUyR1BsDl16iRMgjiBiFhNUx
yPo8Y2kC2U2OTbldZgaE7s9bQO3xxEcifx93sGWsAex/gx54FYNisiwSlCOSgOE+
XkYo/OKk8Xvr82tLVX8XQVEPCMJ+rxea8T5zSs8/alvsPq7gA8wW3y6fsoa3vUU/
+Gd+W+Q/OsONIDtp8rQAY1qsD0ScDpaR8052RSH0zTa7pj8HsQgE5PjZ+cJW0SEi
cKN+Oe8+ETqKald+xZ6PDf58O212VLrru3RpQWrOQcJ7fmKmfT4REK0RcbLgg4qT
CBgOo6eg+ub4pxq2y11LZJBNTv1/S7xAEzFE0kArew64KB2gyVud0VJRZVAJnEfe
93QQVDFrwK2bhgWQZ6J6IbTvGeQW0L93IibuaU6jhZPR283VtUIIvM7vrOylN7Fq
4jsae0T7YGYfKUhgTpm7rCnm8A/D3Ni8MY0sKYYgDSyKmZUsnpI5wpx1xke4lwwV
ErrY46RCFa+k8wscc6iWfB4cGXyyFHyu+wtyg0KpFn5JAzcfz4A=
=Rgbj
-----END PGP SIGNATURE-----
Merge tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
o Add support for loongarch.
o Fix stack-protector issues.
o Support additional integral types and signal-related macros.
o Add support for stdin, stdout, and stderr.
o Add getuid() and geteuid().
o Allow S_I* macros to be overridden by program.
o Defer to linux/fcntl.h and linux/stat.h to avoid duplicate
definitions.
o Many improvements to the self tests.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmQsjLUTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jKuOD/9Y18ncS7+K7D9m89h5zwqBoPeNdP65
fC/kSgBeLreqO9hR7fsDegPk/pC8ZcMZj7iiLMV8dQIan0N74Hcd4mnL+Nf9LdkP
KpwMfN/DG7h+p69Ug4d6qQqSr7jwEGa1NE280NguNtugs+kanSayVoSh0lhWPp7K
5t9s4oITOQNa69EgluB25b1qbaY36DibsBbt7vO2b/Rsmbw13UdgXQxKBQTgEOmQ
yfNqjSYatRrl7pQQiQQDwV8xzd84jNLgNKsTDtG18kUWqYBhleMDQxt54betRFZ2
O3SNylt7pJKVFsgjAB90TcOl4SRmzXJ0+jNLIuNYJkqlQaPjQy29N4nBBUp2ciBf
sSkL4OBfaegJlnBWZ4LwyV+ZTPLGQ5PZ78EhfqMpQzVbEibL5Es/pPAVuUxLVK4B
U03+sh4IAr2aK4aXcxXgDWYr/HmKP6DWrlp/oH679sW1e2mUpnjghOfuGMYwLifo
d+a/M2VH8++KOV/0aZf/Z4p1Jbexn6/jz4pW7DTfiARduUtlaN5F69oAr85q5vrS
T1dZ84ZjmxrWJUYEJjfs5DM+fvPzsOyho8ae9TRlxnJJtW6UOZ0TTUo+mUDfgqWQ
OU6XteK6zTQmE5H9aVOVwO/jl0uXWzmHAoJTAynpPgEkP01I+t6Vx8e1PAK/oPbT
fJdUIEFiMo1qMg==
=BS2x
-----END PGP SIGNATURE-----
Merge tag 'nolibc.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Add support for loongarch
- Fix stack-protector issues
- Support additional integral types and signal-related macros
- Add support for stdin, stdout, and stderr
- Add getuid() and geteuid()
- Allow S_I* macros to be overridden by program
- Defer to linux/fcntl.h and linux/stat.h to avoid duplicate
definitions
- Many improvements to the selftests
* tag 'nolibc.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (22 commits)
tools/nolibc: x86_64: add stackprotector support
tools/nolibc: i386: add stackprotector support
tools/nolibc: tests: add test for -fstack-protector
tools/nolibc: tests: fold in no-stack-protector cflags
tools/nolibc: add support for stack protector
tools/nolibc: tests: constify test_names
tools/nolibc: add helpers for wait() signal exits
tools/nolibc: add definitions for standard fds
selftests/nolibc: Adjust indentation for Makefile
selftests/nolibc: Add support for LoongArch
tools/nolibc: Add support for LoongArch
tools/nolibc: Add statx() and make stat() rely on statx() if necessary
tools/nolibc: Include linux/fcntl.h and remove duplicate code
tools/nolibc: check for S_I* macros before defining them
selftests/nolibc: skip the chroot_root and link_dir tests when not privileged
tools/nolibc: add getuid() and geteuid()
tools/nolibc: add tests for the integer limits in stdint.h
tools/nolibc: enlarge column width of tests
tools/nolibc: add integer types and integer limit macros
tools/nolibc: add stdint.h
...
This update adds tests for nested locking and also adds support for
testing raw spinlocks in PREEMPT_RT kernels.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmQs8kETHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jImaEACFX7CPZyRUG32Yo6wdzxRHuZPid6cR
Si5GyRiTJzKuS9aDgl6jMYRvFXSXE9Xx1TVX0ad6fkNW40IMAkXprmUkQwN3ZtSb
K/pOLyOSFkm/XDrfDinPU46kh+DgSrAZtB3jhELa5doRxr9lWWSnwV4HoBx64T3/
84LEyIi47OSVxucaUWfimDUyBbNl4Oq95hdpD3hwxyxq5nsv2Q+oLWy2syXeegOz
3ru4Aswg40cwjYT9tjnrfZKZeteby2q55JYUDvP3kPfu/utyMyafUOda0DhHFdRB
dT1EISkY/zyqf3orTfghLpYJEplDNkSKhVtyn2dQcRHhoUJ9e/8xnRclqVo4tkqv
QWUZHJFar08P6iNBh9Z/YiM8D4kpeQNVCmR29h094BlQMbTLYbcZUjJ3YeE5nsz+
Bid7Ln6aBvGb3Ui6EWq7FVfcGzrPms3MUXw6nQLh6HaQg0F2g73MKS9Wd75OjEc/
cKPxkqzC35pM87eEf0xBlJzudZYxkYhP8Rt0bCGt/tq/pZAulCyOgnET2mcBv7Z0
94uEIGVvswVPB9/VKyqf7mHVrk/uJeygGKD1++4pzGumdhfsaM1dl3g6DkrSgK1j
A/kAApkhha8Zacj3oAAQuBPi8JuIqUFQvfbA8Os6d/8PXfTRaaMnV9DRS7wcohkP
7haDPwX8pHj+Gg==
=QAhX
-----END PGP SIGNATURE-----
Merge tag 'locktorture.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull locktorture updates from Paul McKenney:
"This adds tests for nested locking and also adds support for testing
raw spinlocks in PREEMPT_RT kernels"
* tag 'locktorture.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
locktorture: Add raw_spinlock* torture tests for PREEMPT_RT kernels
locktorture: With nested locks, occasionally skip main lock
locktorture: Add nested locking to rtmutex torture tests
locktorture: Add nested locking to mutex torture tests
locktorture: Add nested_[un]lock() hooks and nlocks parameter
Support the command testing in a unit-test fashion.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230423221231.6357-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge ACPICA material for 6.4-rc1:
- Delete bogus node_array array of pointers from AEST table (Jessica
Clarke).
- Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang).
- Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID()
(Xiongfeng Wang).
- Add missing tables to astable (Pedro Falcato).
- Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen).
- Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski).
- Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary).
- Update all copyrights/signons in ACPICA to 2023 (Bob Moore).
- Add support for ClockInput resource (v6.5) (Niyas Sait).
- Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L).
- Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L).
- Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein).
- Make ACPICA code support flexible arrays properly (Kees Cook).
- Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red).
- Add os specific support for Zephyr RTOS to ACPICA (Najumon).
- Update version to 20230331 (Bob Moore).
* acpica: (32 commits)
ACPICA: Update version to 20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
ACPICA: acpi_dmar_andd: Replace 1-element array with flexible array
ACPICA: acpi_pci_routing_table: Replace fixed-size array with flex array member
ACPICA: struct acpi_resource_dma: Replace 1-element array with flexible array
ACPICA: Introduce ACPI_FLEX_ARRAY
ACPICA: struct acpi_nfit_interleave: Replace 1-element array with flexible array
ACPICA: actbl2: Replace 1-element arrays with flexible arrays
ACPICA: actbl1: Replace 1-element arrays with flexible arrays
ACPICA: struct acpi_resource_vendor: Replace 1-element array with flexible array
ACPICA: Avoid undefined behavior: load of misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within null pointer
ACPICA: Avoid undefined behavior: applying zero offset to null pointer
...
The bulk of the commits here are for the conversion of drivers to use
void remove callbacks but there's a reasonable amount of other stuff
going on, the pace of development with the SOF code continues to be high
and there's a bunch of new drivers too:
- More core cleanups from Morimto-san.
- Update drivers to have remove() callbacks returning void, mostly
mechanical with some substantial changes.
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol.
- Hibernation support for CS35L45.
- More DT binding conversions.
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmRGdEsACgkQJNaLcl1U
h9BNVAf+Ijupg3dhAl6847v1PRYXkYK6YjAayhNd0xRoRePKnnv1zkrsXSBzxZUM
8KHpQDUJyfQbPnE2JRmr1WfHSoNDl/NXdJl+lefPBgol5bzxRji68hDPGjA0645o
R6vzrqLNw8mh3jwptXfEpQfKH8tIvwOeMeLkncDvsm0ZCUR2GhYnjB1g82ih0ssx
lvh36PdCRF0e/ruHxkiVn9b/riID65oTRkN6IxJqoPnqJZVyCiqmiJcfWePpaPir
4R9Dyk+REos/aCLdne1g6H21Tgi0td+blv6empqwdEXG41VSdRMTrOZb1ZISKmpF
ggPbKsk9BjJFBCewllHXJ0YEcBp9/g==
=TVxt
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v6.4
The bulk of the commits here are for the conversion of drivers to use
void remove callbacks but there's a reasonable amount of other stuff
going on, the pace of development with the SOF code continues to be high
and there's a bunch of new drivers too:
- More core cleanups from Morimto-san.
- Update drivers to have remove() callbacks returning void, mostly
mechanical with some substantial changes.
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol.
- Hibernation support for CS35L45.
- More DT binding conversions.
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733.
The cxl_mem driver uses debugfs to support poison inject and clear.
Add debugfs to the list of required symbols so that cxl_test can
emulate those poison operations.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/4f3aab57fbf1cc3ccde2eb887c5d90566c8d0e90.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices may report a maximum number of addresses that a device
allows to be poisoned using poison injection. When cxl_test creates
mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
all mocked memdevs.
Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
that users can set a custom device injection limit. Fail, and return
-EBUSY, if the mock poison list is not empty.
/sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
A simple usage model is to set the attribute before running a test in
order to emulate a device's poison handling.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/0f25b2862b90013545450222d2199e435c6cc11a.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prior to poison inject support, the mock of 'Get Poison List'
returned a poison list containing a single mocked error record.
Following the addition of poison inject and clear support to the
mock driver, use the mock_poison_list[], rather than faking an
error record. Mock_poison_list[] list tracks the actual poison
inject and clear requests issued by userspace.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/0f4242c81821f4982b02cb1009c22783ef66b2f1.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mock the clear of poison by deleting the device:address entry from
the mock_poison_list[]. Behave like a real CXL device and do not fail
if the address is not in the poison list, but offer a dev_dbg()
message.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/ecf19743c6572e60971bbd078f67d520cf5bca5d.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mock the injection of poison by storing the device:address entries in
mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
device and 128 total entries for the cxl_test mock driver.
Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/f6b7f03541eaa8c2260d3eafadd04afe3f0d7962.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make mock memdevs support the Get Poison List mailbox command.
Return a fake poison error record when the get poison list command
is issued.
This supports testing the kernel tracing and cxl list capabilities
for media errors.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/14d661ce3e3a32b7d8e76b8ecc5eb88343b3d09c.1681838292.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The QFQ qdisc class has parameter bounds that are not being
checked for correctness.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZELn8wAKCRDbK58LschI
g1khAQC1nmXPuKjM4EAfFK8Ysb3KoF8ADmpE97n+/HEDydCagwD/bX0+NABR75Nh
ueGcoU1TcfcbshDzrH0s+C95owZDZw4=
=BeZM
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-04-21
We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).
The main changes are:
1) Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward,
from Florian Westphal.
2) Fix race between btf_put and btf_idr walk which caused a deadlock,
from Alexei Starovoitov.
3) Second big batch to migrate test_verifier unit tests into test_progs
for ease of readability and debugging, from Eduard Zingerman.
4) Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree, from Dave Marchevsky.
5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
selftests into libbpf-provided bpf_helpers.h header and improve
kfunc handling, from Andrii Nakryiko.
6) Support 64-bit pointers to kfuncs needed for archs like s390x,
from Ilya Leoshkevich.
7) Support BPF progs under getsockopt with a NULL optval,
from Stanislav Fomichev.
8) Improve verifier u32 scalar equality checking in order to enable
LLVM transformations which earlier had to be disabled specifically
for BPF backend, from Yonghong Song.
9) Extend bpftool's struct_ops object loading to support links,
from Kui-Feng Lee.
10) Add xsk selftest follow-up fixes for hugepage allocated umem,
from Magnus Karlsson.
11) Support BPF redirects from tc BPF to ifb devices,
from Daniel Borkmann.
12) Add BPF support for integer type when accessing variable length
arrays, from Feng Zhou.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
selftests/bpf: verifier/value_ptr_arith converted to inline assembly
selftests/bpf: verifier/value_illegal_alu converted to inline assembly
selftests/bpf: verifier/unpriv converted to inline assembly
selftests/bpf: verifier/subreg converted to inline assembly
selftests/bpf: verifier/spin_lock converted to inline assembly
selftests/bpf: verifier/sock converted to inline assembly
selftests/bpf: verifier/search_pruning converted to inline assembly
selftests/bpf: verifier/runtime_jit converted to inline assembly
selftests/bpf: verifier/regalloc converted to inline assembly
selftests/bpf: verifier/ref_tracking converted to inline assembly
selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
selftests/bpf: verifier/map_in_map converted to inline assembly
selftests/bpf: verifier/lwt converted to inline assembly
selftests/bpf: verifier/loops1 converted to inline assembly
selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
selftests/bpf: verifier/direct_packet_access converted to inline assembly
selftests/bpf: verifier/d_path converted to inline assembly
selftests/bpf: verifier/ctx converted to inline assembly
selftests/bpf: verifier/btf_ctx_access converted to inline assembly
selftests/bpf: verifier/bpf_get_stack converted to inline assembly
...
====================
Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
if sch_fq is configured with "initial quantum" having values greater than
INT_MAX, the first assignment of "credit" does signed integer overflow to
a very negative value.
In this situation, the syzkaller script provided by Cristoph triggers the
CPU soft-lockup warning even with few sockets. It's not an infinite loop,
but "credit" wasn't probably meant to be minus 2Gb for each new flow.
Capping "initial quantum" to INT_MAX proved to fix the issue.
v2: validation of "initial quantum" is done in fq_policy, instead of open
coding in fq_change() _ suggested by Jakub Kicinski
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/377
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7b3a3c7e36d03068707a021760a194a8eb5ad41a.1682002300.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds three new tests to the selftests for KSM. These tests use the
new prctl API's to enable and disable KSM.
1) add new prctl flags to prctl header file in tools dir
This adds the new prctl flags to the include file prct.h in the
tools directory. This makes sure they are available for testing.
2) add KSM prctl merge test to ksm_tests
This adds the -t option to the ksm_tests program. The -t flag
allows to specify if it should use madvise or prctl ksm merging.
3) add two functions for debugging merge outcome for ksm_tests
This adds two functions to report the metrics in /proc/self/ksm_stat
and /sys/kernel/debug/mm/ksm. The debug output is enabled with the
-d option.
4) add KSM prctl test to ksm_functional_tests
This adds a test to the ksm_functional_test that verifies that the
prctl system call to enable / disable KSM works.
5) add KSM fork test to ksm_functional_test
Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
by the child process.
Link: https://lkml.kernel.org/r/20230418051342.1919757-4-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test suite (with 10 more sub-tests) to cover RO pinning against
fork() over uffd-wp. It covers both:
(1) Early CoW test in fork() when page pinned,
(2) page unshare due to RO longterm pin.
They are:
Testing wp-fork-pin on anon... done
Testing wp-fork-pin on shmem... done
Testing wp-fork-pin on shmem-private... done
Testing wp-fork-pin on hugetlb... done
Testing wp-fork-pin on hugetlb-private... done
Testing wp-fork-pin-with-event on anon... done
Testing wp-fork-pin-with-event on shmem... done
Testing wp-fork-pin-with-event on shmem-private... done
Testing wp-fork-pin-with-event on hugetlb... done
Testing wp-fork-pin-with-event on hugetlb-private... done
CONFIG_GUP_TEST needed or they'll be skipped.
Testing wp-fork-pin on anon... skipped [reason: Possibly CONFIG_GUP_TEST missing or unprivileged]
Note that the major test goal is on private memory, but no hurt to also run
all of them over shared because shared memory should work the same.
Link: https://lkml.kernel.org/r/20230417195317.898696-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The macro and facility can be reused in other tests too. Make it general.
Link: https://lkml.kernel.org/r/20230417195317.898696-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Extend it to all types of mem, meanwhile add one parallel test when
EVENT_FORK is enabled, where uffd-wp bits should be persisted rather than
dropped.
Since at it, rename the test to "wp-fork" to better show what it means.
Making the new test called "wp-fork-with-event".
Before:
Testing pagemap on anon... done
After:
Testing wp-fork on anon... done
Testing wp-fork on shmem... done
Testing wp-fork on shmem-private... done
Testing wp-fork on hugetlb... done
Testing wp-fork on hugetlb-private... done
Testing wp-fork-with-event on anon... done
Testing wp-fork-with-event on shmem... done
Testing wp-fork-with-event on shmem-private... done
Testing wp-fork-with-event on hugetlb... done
Testing wp-fork-with-event on hugetlb-private... done
Link: https://lkml.kernel.org/r/20230417195317.898696-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namely:
"-f": add a wildcard filter for tests to run
"-l": list tests rather than running any
"-h": help msg
Link: https://lkml.kernel.org/r/20230417195317.898696-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Test verifier/value_ptr_arith automatically converted to use inline assembly.
Test cases "sanitation: alu with different scalars 2" and
"sanitation: alu with different scalars 3" are updated to
avoid -ENOENT as return value, as __retval() annotation
only supports numeric literals.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test verifier/unpriv semi-automatically converted to use inline assembly.
The verifier/unpriv.c had to be split in two parts:
- the bulk of the tests is in the progs/verifier_unpriv.c;
- the single test that needs `struct bpf_perf_event_data`
definition is in the progs/verifier_unpriv_perf.c.
The tests above can't be in a single file because:
- first requires inclusion of the filter.h header
(to get access to BPF_ST_MEM macro, inline assembler does
not support this isntruction);
- the second requires vmlinux.h, which contains definitions
conflicting with filter.h.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test verifier/loops1 automatically converted to use inline assembly.
There are a few modifications for the converted tests.
"tracepoint" programs do not support test execution, change program
type to "xdp" (which supports test execution) for the following tests
that have __retval tags:
- bounded loop, count to 4
- bonded loop containing forward jump
Also, remove the __retval tag for test:
- bounded loop, count from positive unknown to 4
As it's return value is a random number.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In order to express test cases that use bpf_tail_call() intrinsic it
is necessary to have several programs to be loaded at a time.
This commit adds __auxiliary annotation to the set of annotations
supported by test_loader.c. Programs marked as auxiliary are always
loaded but are not treated as a separate test.
For example:
void dummy_prog1(void);
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 4);
__uint(key_size, sizeof(int));
__array(values, void (void));
} prog_map SEC(".maps") = {
.values = {
[0] = (void *) &dummy_prog1,
},
};
SEC("tc")
__auxiliary
__naked void dummy_prog1(void) {
asm volatile ("r0 = 42; exit;");
}
SEC("tc")
__description("reference tracking: check reference or tail call")
__success __retval(0)
__naked void check_reference_or_tail_call(void)
{
asm volatile (
"r2 = %[prog_map] ll;"
"r3 = 0;"
"call %[bpf_tail_call];"
"r0 = 0;"
"exit;"
:: __imm(bpf_tail_call),
: __clobber_all);
}
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend prog_tests with two test cases:
# ./test_progs --allow=verifier_netfilter_retcode
#278/1 verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK
#278/2 verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK
#278/3 verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK
#278/4 verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK
#278 verifier_netfilter_retcode:OK
This checks that only accept and drop (0,1) are permitted.
NF_QUEUE could be implemented later if we can guarantee that attachment
of such programs can be rejected if they get attached to a pf/hook that
doesn't support async reinjection.
NF_STOLEN could be implemented via trusted helpers that can guarantee
that the skb will eventually be free'd.
v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov.
v5: also check ctx->{state,skb} can be dereferenced (Alexei).
# ./test_progs --allow=verifier_netfilter_ctx
#281/1 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/2 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/3 verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK
#281/4 verifier_netfilter_ctx/netfilter invalid context, write:OK
#281/5 verifier_netfilter_ctx/netfilter valid context read and invalid write:OK
#281/6 verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK
#281/7 verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK
#281 verifier_netfilter_ctx:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
This checks:
1/2: partial reads of ctx->{skb,state} are rejected
3. read access past sizeof(ctx) is rejected
4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected
5. ctx->state content cannot be altered
6. ctx->state and ctx->skb can be dereferenced
7. ... same program fails for unpriv (CAP_NET_ADMIN needed).
Link: https://lore.kernel.org/bpf/20230419021152.sjq4gttphzzy6b5f@dhcp-172-26-102-232.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20230420201655.77kkgi3dh7fesoll@MacBook-Pro-6.local/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-8-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Put the flag MAP_HUGE_2MB in the correct flags argument instead of the
wrong offset argument.
Fixes: 2ddade3229 ("selftests/xsk: Fix munmap for hugepage allocated umem")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com
When calculating the address of the refcount_t struct within a local
kptr, bpf_refcount_acquire_impl should add refcount_off bytes to the
address of the local kptr. Due to some missing parens, the function is
incorrectly adding sizeof(refcount_t) * refcount_off bytes. This patch
fixes the calculation.
Due to the incorrect calculation, bpf_refcount_acquire_impl was trying
to refcount_inc some memory well past the end of local kptrs, resulting
in kasan and refcount complaints, as reported in [0]. In that thread,
Florian and Eduard discovered that bpf selftests written in the new
style - with __success and an expected __retval, specifically - were
not actually being run. As a result, selftests added in bpf_refcount
series weren't really exercising this behavior, and thus didn't unearth
the bug.
With this fixed behavior it's safe to revert commit 7c4b96c000
("selftests/bpf: disable program test run for progs/refcounted_kptr.c"),
this patch does so.
[0] https://lore.kernel.org/bpf/ZEEp+j22imoN6rn9@strlen.de/
Fixes: 7c50b1cb76 ("bpf: Add bpf_refcount_acquire kfunc")
Reported-by: Florian Westphal <fw@strlen.de>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230421074431.3548349-1-davemarchevsky@fb.com
* kvm-arm64/smccc-filtering:
: .
: SMCCC call filtering and forwarding to userspace, courtesy of
: Oliver Upton. From the cover letter:
:
: "The Arm SMCCC is rather prescriptive in regards to the allocation of
: SMCCC function ID ranges. Many of the hypercall ranges have an
: associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some
: room for vendor-specific implementations.
:
: The ever-expanding SMCCC surface leaves a lot of work within KVM for
: providing new features. Furthermore, KVM implements its own
: vendor-specific ABI, with little room for other implementations (like
: Hyper-V, for example). Rather than cramming it all into the kernel we
: should provide a way for userspace to handle hypercalls."
: .
KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC"
KVM: arm64: Test that SMC64 arch calls are reserved
KVM: arm64: Prevent userspace from handling SMC64 arch range
KVM: arm64: Expose SMC/HVC width to userspace
KVM: selftests: Add test for SMCCC filter
KVM: selftests: Add a helper for SMCCC calls with SMC instruction
KVM: arm64: Let errors from SMCCC emulation to reach userspace
KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version
KVM: arm64: Introduce support for userspace SMCCC filtering
KVM: arm64: Add support for KVM_EXIT_HYPERCALL
KVM: arm64: Use a maple tree to represent the SMCCC filter
KVM: arm64: Refactor hvc filtering to support different actions
KVM: arm64: Start handling SMCs from EL1
KVM: arm64: Rename SMC/HVC call handler to reflect reality
KVM: arm64: Add vm fd device attribute accessors
KVM: arm64: Add a helper to check if a VM has ran once
KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/timer-vm-offsets: (21 commits)
: .
: This series aims at satisfying multiple goals:
:
: - allow a VMM to atomically restore a timer offset for a whole VM
: instead of updating the offset each time a vcpu get its counter
: written
:
: - allow a VMM to save/restore the physical timer context, something
: that we cannot do at the moment due to the lack of offsetting
:
: - provide a framework that is suitable for NV support, where we get
: both global and per timer, per vcpu offsetting, and manage
: interrupts in a less braindead way.
:
: Conflict resolution involves using the new per-vcpu config lock instead
: of the home-grown timer lock.
: .
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: selftests: Augment existing timer test to handle variable offset
KVM: arm64: selftests: Deal with spurious timer interrupts
KVM: arm64: selftests: Add physical timer registers to the sysreg list
KVM: arm64: nv: timers: Support hyp timer emulation
KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co
KVM: arm64: timers: Abstract the number of valid timers per vcpu
KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling
KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor
KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data
KVM: arm64: timers: Abstract per-timer IRQ access
KVM: arm64: timers: Rationalise per-vcpu timer init
KVM: arm64: timers: Allow save/restoring of the physical timer
KVM: arm64: timers: Allow userspace to set the global counter offset
KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM
KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
arm64: Add HAS_ECV_CNTPOFF capability
arm64: Add CNTPOFF_EL2 register definition
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add test cases for bridge neighbor suppression, testing both per-port
and per-{Port, VLAN} neighbor suppression with both ARP and NS packets.
Example truncated output:
# ./test_bridge_neigh_suppress.sh
[...]
Tests passed: 148
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC Merge layer (IEEE 802.3-2018 clause 99) does all the heavy
lifting for Frame Preemption (IEEE 802.1Q-2018 clause 6.7.2), a TSN
feature for minimizing latency.
Preemptible traffic is different on the wire from normal traffic in
incompatible ways. If we send a preemptible packet and the link partner
doesn't support preemption, it will drop it as an error frame and we
will never know. The MAC Merge layer has a control plane of its own,
which can be manipulated (using ethtool) in order to negotiate this
capability with the link partner (through LLDP).
Actually the TLV format for LLDP solves this problem only partly,
because both partners only advertise:
- if they support preemption (RX and TX)
- if they have enabled preemption (TX)
so we cannot tell the link partner what to do - we cannot force it to
enable reception of our preemptible packets.
That is fully solved by the verification feature, where the local device
generates some small probe frames which look like preemptible frames
with no useful content, and the link partner is obliged to respond to
them if it supports the standard. If the verification times out, we know
that preemption isn't active in our TX direction on the link.
Having clarified the definition, this selftest exercises the manual
(ethtool) configuration path of 2 link partners (with and without
verification), and the LLDP code path, using the openlldp project.
The test also verifies the TX activity of the MAC Merge layer by
sending traffic through a traffic class configured as preemptible
(using mqprio). There isn't a good way to make this really portable
(user space cannot find out how many traffic classes there are for
a device), but I chose num_tc 4 here, that should work reasonably well.
I also know that some devices (stmmac) only permit TXQ0 to be
preemptible, so this is why PREEMPTIBLE_PRIO was strategically chosen
as 0. Even if other hardware is more configurable, this test should
cover the baseline.
This is not really a "forwarding" selftest, but I put it near the other
"ethtool" selftests.
$ ./ethtool_mm.sh eno0 swp0
TEST: Manual configuration with verification: eno0 to swp0 [ OK ]
TEST: Manual configuration with verification: swp0 to eno0 [ OK ]
TEST: Manual configuration without verification: eno0 to swp0 [ OK ]
TEST: Manual configuration without verification: swp0 to eno0 [ OK ]
TEST: Manual configuration with failed verification: eno0 to swp0 [ OK ]
TEST: Manual configuration with failed verification: swp0 to eno0 [ OK ]
TEST: LLDP [ OK ]
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Counters for the MAC Merge layer and preemptible MAC have standardized
so far on using structured ethtool stats as opposed to the driver
specific names and meanings.
Benefit from that rare opportunity and introduce a helper to lib.sh for
querying standardized counters, in the hope that these will take off for
other uses as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlxsw selftests often invoke a bail_on_lldpad() helper to make sure LLDPAD
is not running, to prevent conflicts between the QoS configuration applied
through TC or DCB command line tool, and the DCB configuration that LLDPAD
might apply. This helper might be useful to others. Move the function to
lib.sh, and parameterize to make reusable in other contexts.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver-specific wrappers of these selftests invoke bail_on_lldpad to
make sure that LLDPAD doesn't trample the configuration. The function
bail_on_lldpad is going to move to lib.sh in the next patch. With that, it
won't be visible for the wrappers before sourcing the framework script. And
after sourcing it, it is too late: the selftest will have run by then.
One option might be to source NUM_NETIFS=0 lib.sh from the wrapper, but
even if that worked (it might, it might not), that seems cumbersome. lib.sh
is doing fair amount of stuff, and even if it works today, it does not look
particularly solid as a solution.
Instead, introduce a hook, sch_tbf_pre_hook(), that when available, gets
invoked. Move the bail to the hook.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two test cases:
- "valid read map access into a read-only array 1" and
- "valid read map access into a read-only array 2"
Expect that map_array_ro map is filled with mock data. This logic was
not taken into acount during initial test conversion.
This commit modifies prog_tests/verifier.c entry point for this test
to fill the map.
Fixes: a3c830ae02 ("selftests/bpf: verifier/array_access.c converted to inline assembly")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a test case is annotated with __retval tag the test_loader engine
would use libbpf's bpf_prog_test_run_opts() to do a test run of the
program and compare retvals.
This commit allows to perform arbitrary actions on bpf object right
before test loader invokes bpf_prog_test_run_opts(). This could be
used to setup some state for program execution, e.g. fill some maps.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Westphal found a bug in and suggested a fix for test_loader.c
processing of __retval tag. Because of this bug the function
test_loader.c:do_prog_test_run() never executed and all __retval test
tags were ignored.
If this bug is fixed a number of test cases from
progs/verifier_array_access.c fail with retval not matching the
expected value. This test was recently converted to use test_loader.c
and inline assembly in [1]. When doing the conversion I missed the
important detail of test_verifier.c operation: when it creates
fixup_map_array_ro, fixup_map_array_wo and fixup_map_array_small it
populates these maps with a dummy record.
Disabling the __retval checks for the affected verifier_array_access
in this commit to avoid false-postivies in any potential bisects.
The issue is addressed in the next patch.
I verified that the __retval tags are now respected by changing
expected return values for all tests annotated with __retval, and
checking that these tests started to fail.
[1] https://lore.kernel.org/bpf/20230325025524.144043-1-eddyz87@gmail.com/
Fixes: 19a8e06f5f ("selftests/bpf: Tests execution support for test_loader.c")
Reported-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/f4c4aee644425842ee6aa8edf1da68f0a8260e7c.camel@gmail.com/T/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Westphal found a bug in test_loader.c processing of __retval
tag. Because of this bug the function test_loader.c:do_prog_test_run()
never executed and all __retval test tags were ignored. This hid an
issue with progs/refcounted_kptr.c tests.
When __retval tag bug is fixed and refcounted_kptr.c tests are run
kernel reports various issues and eventually hangs. Shortest reproducer
is the following command run a few times:
$ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
Commenting out __retval tags for these tests until this issue is resolved.
Reported-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/f4c4aee644425842ee6aa8edf1da68f0a8260e7c.camel@gmail.com/T/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add prog test for accessing integer type of variable array in tracing
program.
In addition, hook load_balance function to access sd->span[0], only
to confirm whether the load is successful. Because there is no direct
way to trigger load_balance call.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20230420032735.27760-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reducing the time taken by dscr_sysfs_test.c allows restoring the
default timeout, which was removed in
commit 850507f30c ("selftests/powerpc: Turn off timeout setting for
benchmarks, dscr, signal, tm") because that test took too long.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-8-bgray@linux.ibm.com
This test case is extremely slow, taking around a minute compared to
most of the other DSCR tests taking a second at most. Perf shows most
time is spent by the kernel switching to each CPU it reads in
/sys/devices/system/cpu. This switching is an unavoidable consequnce
of reading all the .../cpuN/dscr values.
Remove the outer iteration loop from this test case, reducing the reads
from 1600 to 16. This still updates the DSCR 16 times and verifies on
every CPU each time, so I do not expect the lower coverage to be
meaningful. The speedup is significant: back down to ~1 second like the
other tests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-7-bgray@linux.ibm.com
The tests currently have a single writer thread updating the system
DSCR with a 1/1000 chance looped only 100 times. So only around one in
10 runs actually do anything.
* Add multiple threads to the dscr_explicit_random_test case.
* Use a barrier to make all the threads start work as simultaneously as
possible.
* Use a rwlock and make all threads have a reasonable chance to write to
the DSCR on each iteration.
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP is used to prevent
writers from starving while all the other threads keep reading.
Logging the reads/writes shows a decent mix across the whole test.
* Allow all threads a chance to write.
* Make the chance of writing more likely.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-6-bgray@linux.ibm.com
Add new cases to the relevant tests that use explicitly synchronized
threads to test the behaviour across context switches with less
randomness. By locking the participants to the same CPU we guarantee a
context switch occurs each time they make progress, which is a likely
failure point if the kernel is not tracking the thread local DSCR
correctly.
The random case is left in to keep exercising potential edge cases.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-5-bgray@linux.ibm.com
All current users of bind_to_cpu() don't care _which_ CPU they get, just
that they are bound to a single free one. So alter the interface to
1. Accept a BIND_CPU_ANY value that tells it to automatically
pick a CPU
2. Return the picked CPU
And convert all these users to bind_to_cpu(BIND_CPU_ANY).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-4-bgray@linux.ibm.com
This function will be useful in the DSCR test patches later in this
series, so promote it to be shared by all powerpc selftests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-3-bgray@linux.ibm.com
Correct a couple of typos while working on other improvements to the
DSCR tests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-2-bgray@linux.ibm.com
This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definitions to fix selftests build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
Fix the unmapping of hugepage allocated umems so that they are
properly unmapped. The new test referred to in the fixes label,
introduced a test that allocated a umem that is not a multiple of a 2M
hugepage size. This is fine for mmap() that rounds the size up the
nearest multiple of 2M. But munmap() requires the size to be a
multiple of the hugepage size in order for it to unmap the region. The
current behaviour of not properly unmapping the umem, was discovered
when further additions of tests that require hugepages (unaligned mode
tests only) started failing as the system was running out of
hugepages.
Fixes: c0801598e5 ("selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230418143617.27762-1-magnus.karlsson@gmail.com
sysctl memfd_noexec is pid-namespaced, non-reservable, and inherent to the
child process.
Move the inherence test from init ns to child ns, so init ns can keep the
default value.
Link: https://lkml.kernel.org/r/20230414022801.2545257-1-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@google.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303312259.441e35db-yujie.liu@intel.com
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The va_high_addr_switch selftest is used to test mmap across 128TB
boundary. It divides the selftest cases into two main categories on the
basis of size. One set is used to create mappings that are multiples of
PAGE_SIZE while the other creates mappings that are multiples of
HUGETLB_SIZE.
In order to run the hugetlb testcases the binary must be appended with
"--run-hugetlb" but the file that used to run the test only invokes the
binary, thereby completely skipping the hugetlb testcases. Hence, the
required statement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-6-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Arm64 has a default hugepage size of 512MB when CONFIG_ARM64_64K_PAGES=y
is enabled. While testing on arm64 platforms having up to 4PB of virtual
address space, a minimum of 6 hugepages were required for all test cases
to pass. Support for this requirement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-5-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The in code comments for the selftest were made on the basis of 128TB
switch, an architecture feature specific to PowerPc and x86 platforms.
Keeping in mind the support added for arm64 platforms which implements a
256TB switch, a more generic explanation has been provided.
Link: https://lkml.kernel.org/r/20230323105243.2807166-4-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As the initial selftest only took into consideration PowperPC and x86
architectures, on adding support for arm64, a platform independent naming
convention is chosen.
Link: https://lkml.kernel.org/r/20230323105243.2807166-3-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Implement support for arm64 on va".
The va_128TBswitch selftest is designed and implemented for PowerPC and
x86 architectures which support a 128TB switch, up to 256TB of virtual
address space and hugepage sizes of 16MB and 2MB respectively. Arm64
platforms on the other hand support a 256Tb switch, up to 4PB of virtual
address space and a default hugepage size of 512MB when 64k pagesize is
enabled.
These architectural differences require introducing support for arm64
platforms, after which a more generic naming convention is suggested. The
in code comments are amended to provide a more platform independent
explanation of the working of the code and nr_hugepages are configured as
required. Finally, the file running the testcase is modified in order to
prevent skipping of hugetlb testcases of va_high_addr_switch.
This patch (of 5):
Arm64 platforms have the ability to support 64kb pagesize, 512MB default
hugepage size and up to 4PB of virtual address space. The address switch
occurs at 256TB as opposed to 128TB. Hence, the necessary support has
been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-1-chaitanyas.prakash@arm.com
Link: https://lkml.kernel.org/r/20230323105243.2807166-2-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This new test tests against the returned ioctls from UFFDIO_REGISTER,
where put into uffdio_register.ioctls.
This also tests the expected failure cases of UFFDIO_REGISTER, aka:
- Register with empty mode should fail with -EINVAL
- Register minor without page cache (anon) should fail with -EINVAL
Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The userfaultfd stress test never tested private shmem, which I think was
overlooked long due. Add it so it matches with uffd unit test and it'll
cover all memory supported with the three memory types.
Meanwhile, rename the memory types a bit. Considering shared mem is the
major use case for both shmem / hugetlbfs, changing from:
anon, hugetlb, hugetlb_shared, shmem
To (with shmem-private added):
anon, hugetlb, hugetlb-private, shmem, shmem-private
Add the shmem-private to run_vmtests.sh too.
Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With the new uffd unit test covering the /dev/userfaultfd path and syscall
path of uffd initializations, we can safely drop the devnode test in the
old stress test.
One thing is to avoid duplication of running the stress test twice which is
an overkill to only test the /dev/ interface in run_vmtests.sh.
The other benefit is now all uffd tests (that uses userfaultfd_open) can
run automatically as long as any type of interface is enabled (either
syscall or dev), so it's more likely to succeed rather than fail due to
unprivilege.
With this patch lands, we can drop all the "mem_type:XXX" handlings too.
Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allow skip a unit test properly due to no privilege (e.g. sigbus and
events tests).
[colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"]
Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com
Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Userfaultfd minor+wp mode was very recently added. The test will fail on
the old kernels at ioctl(UFFDIO_CONTINUE) which is misterious.
Unfortunately there's no feature bit to detect for this support.
Add a hack to leverage WP_UNPOPULATED to detect whether that feature
existed, since WP_UNPOPULATED was merged right after minor+wp.
Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Simplifies it a bit along the way, e.g., drop the never used offset field
(which was always the 1st page so offset=0).
Introduce uffd_register_with_ioctls() out of uffd_register() to detect
uffdio_register.ioctls got returned. Check that automatically when testing
UFFDIO_ZEROPAGE on different types of memory (and kernel).
Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the two tests into the unit test, and convert it into 20 standalone
tests:
- events test on all 5 mem types, with wp on/off
- signal test on all 5 mem types, with wp on/off
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
It'll also remove a lot of global references along the way,
e.g. test_uffdio_wp will be replaced with the wp value passed over.
Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This moves the minor test to the new unit test.
Rewrite the content check with char* opeartions to avoid fiddling with
my_bcmp().
Drop global vars test_uffdio_minor and test_collapse, just assume test them
always in common code for now.
OTOH make this single test into five tests:
- minor test on [shmem, hugetlb] with wp=false
- minor test on [shmem, hugetlb] with wp=true
- minor test + collapse on shmem only
One thing to mention that we used to test COLLAPSE+WP but that doesn't
sound right at all. It's possible it's silently broken but unnoticed
because COLLAPSE is not part of the default test suite.
Make the MADV_COLLAPSE test fail-able (by skip it when failing), because
it's not guaranteed to success anyway.
Drop a bunch of useless code after the move, because the unit test always
use aligned num of pages and has nothing to do with n_cpus.
Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move it over and make it split into two tests, one for pagemap and one for
the new WP_UNPOPULATED (to be a separate one).
The thp pagemap test wasn't really working (with MADV_HUGEPAGE). Let's
just drop it (since it never really worked anyway..) and leave that for
later.
Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a framework to be prepared to move unit tests from uffd-stress.c into
uffd-unit-tests.c. The goal is to allow detection of uffd features for
each test, and also loop over specified types of memory that a test
support.
Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mostly to detect hugetlb allocation errors and skip hugetlb tests when
pages are not allocated.
Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make the handler optionally apply WP bit when resolving page faults for
either missing or minor page faults. This moves towards removing global
test_uffdio_wp outside of the common code.
Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding more fields into the struct.
Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hpage_size was wrongly used. Sometimes it means hugetlb default size,
sometimes it was used as thp size.
Remove the global variable and use the right one at each place.
Link: https://lkml.kernel.org/r/20230412164333.328596-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Drop it by creating the memfd dynamically in the tests.
Link: https://lkml.kernel.org/r/20230412164331.328584-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add one simple test for UFFDIO_API. With that, I also added a bunch of
small but handy helpers along the way.
Link: https://lkml.kernel.org/r/20230412164257.328375-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Provide two helpers to open an uffd handle. Drop the error checks around
SKIPs because it's inside an errexit() anyway, which IMHO doesn't really
help much if the test will not continue.
Link: https://lkml.kernel.org/r/20230412164254.328335-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add two helpers to register/unregister to an uffd. Use them to drop
duplicate codes.
This patch also drops assert_expected_ioctls_present() and
get_expected_ioctls(). Reasons:
- It'll need a lot of effort to pass test_type==HUGETLB into it from
the upper, so it's the simplest way to get rid of another global var
- The ioctls returned in UFFDIO_REGISTER is hardly useful at all,
because any app can already detect kernel support on any ioctl via its
corresponding UFFD_FEATURE_*. The check here is for sanity mostly but
it's probably destined no user app will even use it.
- It's not friendly to one future goal of uffd to run on old
kernels, the problem is get_expected_ioctls() compiles against
UFFD_API_RANGE_IOCTLS, which is a value that can change depending on
where the test is compiled, rather than reflecting what the kernel
underneath has. It means it'll report false negatives on old kernels
so it's against our will.
So let's make our lives easier.
[peterx@redhat.com; tools/testing/selftests/mm/hugepage-mremap.c: add headers]
Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
Link: https://lkml.kernel.org/r/20230412164247.328293-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In many ways it's weird and unwanted to keep all the tests in the same
userfaultfd.c at least when still in the current way.
For example, it doesn't make much sense to run the stress test for each
method we can create an userfaultfd handle (either via syscall or /dev/
node). It's a waste of time running this twice for the whole stress as
the stress paths are the same, only the open path is different.
It's also just weird to need to manually specify different types of memory
to run all unit tests for the userfaultfd interface. We should be able to
just run a single program and that should go through all functional uffd
tests without running the stress test at all. The stress test was more
for torturing and finding race conditions. We don't want to wait for
stress to finish just to regress test a functional test.
When we start to pile up more things on top of the same file and same
functions, things start to go a bit chaos and the code is just harder to
maintain too with tons of global variables.
This patch creates a new test uffd-unit-tests to keep userfaultfd unit
tests in the future, currently empty.
Meanwhile rename the old userfaultfd.c test to uffd-stress.c.
Link: https://lkml.kernel.org/r/20230412164244.328270-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move common utility functions into uffd-common.[ch] files from the
original userfaultfd.c. This prepares for a split of userfaultfd.c into
two tests: one to only cover the old but powerful stress test, the other
one covers all the functional tests.
This movement is kind of a brute-force effort for now, with light
touch-ups but nothing should really change. There's chances to optimize
more, but let's leave that for later.
Link: https://lkml.kernel.org/r/20230412164241.328259-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The idea was trying to flip this var in the alarm handler from time to
time to test -EEXIST of UFFDIO_ZEROPAGE, but firstly it's only used in the
zeropage test so probably only used once, meanwhile we passed
"retry==false" so it'll never got tested anyway.
Drop both sides so we always test UFFDIO_ZEROPAGE retries if has_zeropage
is set (!hugetlb).
One more thing to do is doing UFFDIO_REGISTER for the alias buffer too,
because otherwise the test won't even pass! We were just lucky that this
test never really got ran at all.
Link: https://lkml.kernel.org/r/20230412164238.328238-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make the check as simple as "test_type == TEST_HUGETLB" because that's the
only mem that doesn't support ZEROPAGE.
Link: https://lkml.kernel.org/r/20230412164234.328168-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We've got the macros in uffd-stress.c, move it over and use it in
vm_util.h.
Link: https://lkml.kernel.org/r/20230412164227.328145-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There're already 3 same definitions of the three functions. Move it into
vm_util.[ch].
Link: https://lkml.kernel.org/r/20230412164223.328134-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We do have plenty of files that want to link against vm_util.c. Just make
it simple by linking it always.
Link: https://lkml.kernel.org/r/20230412164220.328123-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
TEST_GEN_PROGS and TEST_GEN_FILES are used randomly in the mm/Makefile to
specify programs that need to build. Logically all these binaries should
all fall into TEST_GEN_PROGS.
Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference
all the tests easily later.
[peterx@redhat.com: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS]
Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
Link: https://lkml.kernel.org/r/20230412164218.328104-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There're two util headers under mm/ kselftest. Merge one with another.
It turns out util.h is the easy one to move.
When merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary
wrappers to page_size() / page_shift(), meanwhile rename them to psize()
and pshift() so as to not conflict with some existing definitions in some
test files that includes vm_util.h.
Link: https://lkml.kernel.org/r/20230412164120.327731-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dump a summary after running whatever test specified. Useful for human
runners to identify any kind of failures (besides exit code).
Link: https://lkml.kernel.org/r/20230412164117.327720-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Split / Refactor userfault test", v2.
This patchset splits userfaultfd.c into two tests:
- uffd-stress: the "vanilla", old and powerful stress test
- uffd-unit-tests: all the unit tests will be moved here
This is on my todo list for a long time but I never did it for real. The
uffd test is growing into a small and cute monster. I start to notice it's
going harder to maintain such a test and make it useful.
A few issues I found when looking at userfaultfd test:
- We have a bunch of unit tests in userfaultfd.c, but they always need to
be run only after a stress type. No way to not do it.
- We can only run an unit test for one memory type only, if we want to
do a quick smoke test to check regressions, there's no good way. The
best to come currently is "bash ./run_vmtests.sh -t userfaultfd" thanks
to the most recent changes to run_vmtests.sh on tagging. Still, that
needs to run the stress tests always and hard to see what's wrong.
- It's hard to add a new unit test to userfaultfd.c, we don't really know
what's happening, not until we mostly read the whole file.
- We did a bunch of useless tests, e.g. we run twice the whole suite of
stress test just to verify both syscall and /dev/userfaultfd. They're
all using userfaultfd_new() to create the handle, everything should
really be the same underneath. One simple unit test should cover that!
- We have tens of global variables in one file but shared with all the
tests. Some of them are not suitable to be a global var from
maintainance pov. It enforces every unit test to consider how these
vars affects the stress test and vice versa, but that's logically not
necessary.
- Userfaultfd test is not friendly to old kernels. Mostly it only works
on the latest kernel tree. It's preferrable to be run on all kernels
and properly report what's missing.
I'll stop here, I feel like I can still list some..
This patchset should resolve all issues above, and actually we can do even
more on top. I stopped doing that until I found I already got 29 patches
and 2000+ LOC changes. That's already a patchset terrible enough so we
should move in small steps.
After the whole set applied, "./run_vmtests.sh -t userfaultfd" looks like
this:
===8<===
vm.nr_hugepages = 1024
-------------------------
running ./uffd-unit-tests
-------------------------
Testing UFFDIO_API (with syscall)... done
Testing UFFDIO_API (with /dev/userfaultfd)... done
Testing register-ioctls on anon... done
Testing register-ioctls on shmem... done
Testing register-ioctls on shmem-private... done
Testing register-ioctls on hugetlb... done
Testing register-ioctls on hugetlb-private... done
Testing zeropage on anon... done
Testing zeropage on shmem... done
Testing zeropage on shmem-private... done
Testing zeropage on hugetlb... done
Testing zeropage on hugetlb-private... done
Testing pagemap on anon... done
Testing wp-unpopulated on anon... done
Testing minor on shmem... done
Testing minor on hugetlb... done
Testing minor-wp on shmem... done
Testing minor-wp on hugetlb... done
Testing minor-collapse on shmem... done
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
[PASS]
--------------------------------
running ./uffd-stress anon 20 16
--------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
[...]
[PASS]
------------------------------------
running ./uffd-stress hugetlb 128 32
------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
[...]
[PASS]
--------------------------------------------
running ./uffd-stress hugetlb-private 128 32
--------------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
[...]
[PASS]
---------------------------------
running ./uffd-stress shmem 20 16
---------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
[...]
[PASS]
-----------------------------------------
running ./uffd-stress shmem-private 20 16
-----------------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
[...]
[PASS]
SUMMARY: PASS=6 SKIP=0 FAIL=0
===8<===
The output may be different if we miss some features (e.g., hugetlb not
allocated, old kernel, less privilege of uffd handle), but they should show
up with good reasons. E.g., I tried to run the unit test on my Fedora
kernel and it gives me:
===8<===
UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
zeropage on anon... done
zeropage on shmem... done
zeropage on shmem-private... done
zeropage-hugetlb on hugetlb... done
zeropage-hugetlb on hugetlb-private... done
pagemap on anon... pagemap on anon... pagemap on anon... done
wp-unpopulated on anon... skipped [reason: feature missing]
minor on shmem... done
minor on hugetlb... done
minor-wp on shmem... skipped [reason: feature missing]
minor-wp on hugetlb... skipped [reason: feature missing]
minor-collapse on shmem... done
sigbus on anon... skipped [reason: possible lack of priviledge]
sigbus on shmem... skipped [reason: possible lack of priviledge]
sigbus on shmem-private... skipped [reason: possible lack of priviledge]
sigbus on hugetlb... skipped [reason: possible lack of priviledge]
sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
sigbus-wp on anon... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
events on anon... skipped [reason: possible lack of priviledge]
events on shmem... skipped [reason: possible lack of priviledge]
events on shmem-private... skipped [reason: possible lack of priviledge]
events on hugetlb... skipped [reason: possible lack of priviledge]
events on hugetlb-private... skipped [reason: possible lack of priviledge]
events-wp on anon... skipped [reason: possible lack of priviledge]
events-wp on shmem... skipped [reason: possible lack of priviledge]
events-wp on shmem-private... skipped [reason: possible lack of priviledge]
events-wp on hugetlb... skipped [reason: possible lack of priviledge]
events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
===8<===
Patch layout:
- Revert "userfaultfd: don't fail on unrecognized features"
Something I found when I got the UFFDIO_API test below. Axel, I still
propose to revert it as a whole, but feel free to continue the discussion
from the original patch thread.
- selftests/mm: Update .gitignore with two missing tests
- selftests/mm: Dump a summary in run_vmtests.sh
- selftests/mm: Merge util.h into vm_util.h
- selftests/mm: Use TEST_GEN_PROGS where proper
- selftests/mm: Link vm_util.c always
- selftests/mm: Merge default_huge_page_size() into one
- selftests/mm: Use PM_* macros in vm_utils.h
- selftests/mm: Reuse pagemap_get_entry() in vm_util.h
- selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
- selftests/mm: Drop test_uffdio_zeropage_eexist
Until here, all cleanups here and there. I wanted to keep going, but I
found that maybe it'll take a few more days to split the test. Hence I
did a split starting from the next one, so we have a working thing first.
- selftests/mm: Create uffd-common.[ch]
- selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests
This did the major brute force split of common codes into
uffd-common.[ch]. That'll be the so far common base for stress and unit
tests. Then a new unit test is created.
- selftests/mm: uffd_[un]register()
- selftests/mm: uffd_open_{dev|sys}()
- selftests/mm: UFFDIO_API test
This patch hides here to start writting the 1st unit test with
UFFDIO_API, also detection of userfaultfd privileges.
- selftests/mm: Drop global mem_fd in uffd tests
- selftests/mm: Drop global hpage_size in uffd tests
- selftests/mm: Rename uffd_stats to uffd_args
- selftests/mm: Let uffd_handle_page_fault() takes wp parameter
- selftests/mm: Allow allocate_area() to fail properly
Some further cleanup that I noticed otherwise hard to move the tests.
- selftests/mm: Add framework for uffd-unit-test
The major patch provides the framework for most of the rest unit tests.
- selftests/mm: Move uffd pagemap test to unit test
- selftests/mm: Move uffd minor test to unit test
- selftests/mm: Move uffd sig/events tests into uffd unit tests
- selftests/mm: Move zeropage test into uffd unit tests
Move unit tests and suite them into the new file.
- selftests/mm: Workaround no way to detect uffd-minor + wp
- selftests/mm: Allow uffd test to skip properly with no privilege
- selftests/mm: Drop sys/dev test in uffd-stress test
- selftests/mm: Add shmem-private test to uffd-stress
A bunch of changes to do better on error reportings, and add
shmem-private to the stress test which was long missing.
- selftests/mm: Add uffdio register ioctls test
One more patch to test uffdio_register.ioctls.
This patch (of 30):
Update .gitignore with two missing tests.
Link: https://lkml.kernel.org/r/20230412163922.327282-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230412164114.327709-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add some tests that trigger (pte|pmd)_mkdirty on VMAs without write
permissions. If an architecture implementation is wrong, we might
accidentally set the PTE/PMD writable and allow for write access in a VMA
without write permissions.
The tests include reproducers for the two issues recently discovered
and worked-around in core-MM for now:
(1) commit 624a2c94f5 ("Partly revert "mm/thp: carry over dirty
bit when thp splits on pmd"")
(2) commit 96a9c287e2 ("mm/migrate: fix wrongly apply write bit
after mkdirty on sparc64")
In addition, some other tests that reveal further issues.
All tests pass under x86_64:
./mkdirty
# [INFO] detected THP size: 2048 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
ok 6 SIGSEGV generated, page not modified
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
But some fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 3 out of 6 tests failed
# Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0
Reverting both above commits makes all tests fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
not ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
not ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
not ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 6 out of 6 tests failed
# Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0
The tests are useful to detect other problematic archs, to verify new
arch fixes, and to stop such issues from reappearing in the future.
For now, we don't add any hugetlb tests.
Link: https://lkml.kernel.org/r/20230411142512.438404-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: (pte|pmd)_mkdirty() should not unconditionally allow for
write access".
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and perform one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affectes (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding mm-tress, but
I can rebase if there are any problems.
This patch (of 6):
The COW selftest can deal with THP not being configured. So move error
handling of read_pmd_pagesize() into the callers such that we can reuse it
in the COW selftest.
Link: https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com [1]
Link: https://lkml.kernel.org/r/20230411142512.438404-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test case to check whether the number of maple_alloc structures is
actually equal to mas->alloc->total.
Link: https://lkml.kernel.org/r/20230411041005.26205-2-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
can also be fetched from clock_gettime() API.
Improve the test coverage while verifying the monotonicity of
CLOCK_BOOTTIME accross both interfaces.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-9-frederic@kernel.org
Due to broken iowait task counting design (cf: comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide
the guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.
Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-8-frederic@kernel.org
Add a selftest to ensure subreg equality if source register
upper 32bit is 0. Without previous patch, the test will
fail verification.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230417222139.360607-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Most of the code had an issue according to ShellCheck.
That's mainly due to the fact it incorrectly believes most of the code
was unreachable because it's invoked by variable name, see how the
"tests" array is used.
Once SC2317 has been ignored, three small warnings were still visible:
- SC2155: Declare and assign separately to avoid masking return values.
- SC2046: Quote this to prevent word splitting: can be ignored because
"ip netns pids" can display more than one pid.
- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
This probably didn't fix any actual issues but it might help spotting
new interesting warnings reported by ShellCheck as just before,
ShellCheck was reporting issues for most lines making it a bit useless.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
mptcp_connect tool was printing some duplicated entries when showing how
to use it: -j -l -r
While at it, I also:
- moved the very few entries that were not sorted,
- added -R that was missing since
commit 8a4b910d00 ("mptcp: selftests: add rcvbuf set option"),
- removed the -u parameter that has been removed in
commit f730b65c9d ("selftests: mptcp: try to set mptcp ulp mode in different sk states").
No need to backport this, it is just an internal tool used by our
selftests. The help menu is mainly useful for MPTCP kernel devs.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The upcall socket interface can be exercised now to make sure that
future feature adjustments to the field can maintain backwards
compatibility.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a basic set of fields to print in a 'dpflow' format. This will be
used by future commits to check for flow fields after parsing, as
well as verifying the flow fields pushed into the kernel from
userspace.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes an associated test to generate netns and connect
interfaces, with the option to include packet tracing.
This will be used in the future when flow support is added
for additional test cases.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've managed to improve the UX for kptrs significantly over the last 9
months. All of the prior main use cases, struct bpf_cpumask *, struct
task_struct *, and struct cgroup *, have all been updated to be
synchronized mainly using RCU. In other words, their KF_ACQUIRE kfunc
calls are all KF_RCU, and the pointers themselves are MEM_RCU and can be
accessed in an RCU read region in BPF.
In a follow-on change, we'll be removing the KF_KPTR_GET kfunc flag.
This patch prepares for that by removing the
bpf_kfunc_call_test_kptr_get() kfunc, and all associated selftests.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230416084928.326135-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Validate that the following new ptrace requests work as expected
* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
returns the contents of task->syscall_dispatch
* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
sets the contents of task->syscall_dispatch
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230407171834.3558-5-gregory.price@memverge.com
Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
a signal to all running threads. This effectively tests that the kernel
doesn't prefer any one thread (or subset of threads) for signal delivery.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230316123028.2890338-2-elver@google.com
Test refcounted local kptr functionality added in previous patches in
the series.
Usecases which pass verification:
* Add refcounted local kptr to both tree and list. Then, read and -
possibly, depending on test variant - delete from tree, then list.
* Also test doing read-and-maybe-delete in opposite order
* Stash a refcounted local kptr in a map_value, then add it to a
rbtree. Read from both, possibly deleting after tree read.
* Add refcounted local kptr to both tree and list. Then, try reading and
deleting twice from one of the collections.
* bpf_refcount_acquire of just-added non-owning ref should work, as
should bpf_refcount_acquire of owning ref just out of bpf_obj_new
Usecases which fail verification:
* The simple successful bpf_refcount_acquire cases from above should
both fail to verify if the newly-acquired owning ref is not dropped
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-10-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch modifies bpf_rbtree_remove to account for possible failure
due to the input rb_node already not being in any collection.
The function can now return NULL, and does when the aforementioned
scenario occurs. As before, on successful removal an owning reference to
the removed node is returned.
Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL |
KF_ACQUIRE - provides the desired verifier semantics:
* retval must be checked for NULL before use
* if NULL, retval's ref_obj_id is released
* retval is a "maybe acquired" owning ref, not a non-owning ref,
so it will live past end of critical section (bpf_spin_unlock), and
thus can be checked for NULL after the end of the CS
BPF programs must add checks
============================
This does change bpf_rbtree_remove's verifier behavior. BPF program
writers will need to add NULL checks to their programs, but the
resulting UX looks natural:
bpf_spin_lock(&glock);
n = bpf_rbtree_first(&ghead);
if (!n) { /* ... */}
res = bpf_rbtree_remove(&ghead, &n->node);
bpf_spin_unlock(&glock);
if (!res) /* Newly-added check after this patch */
return 1;
n = container_of(res, /* ... */);
/* Do something else with n */
bpf_obj_drop(n);
return 0;
The "if (!res)" check above is the only addition necessary for the above
program to pass verification after this patch.
bpf_rbtree_remove no longer clobbers non-owning refs
====================================================
An issue arises when bpf_rbtree_remove fails, though. Consider this
example:
struct node_data {
long key;
struct bpf_list_node l;
struct bpf_rb_node r;
struct bpf_refcount ref;
};
long failed_sum;
void bpf_prog()
{
struct node_data *n = bpf_obj_new(/* ... */);
struct bpf_rb_node *res;
n->key = 10;
bpf_spin_lock(&glock);
bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */
res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
if (!res)
failed_sum += n->key; /* not possible */
bpf_spin_unlock(&glock);
/* if (res) { do something useful and drop } ... */
}
The bpf_rbtree_remove in this example will always fail. Similarly to
bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference
invalidation point. The verifier clobbers all non-owning refs after a
bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail
verification, and in fact there's no good way to get information about
the node which failed to add after the invalidation. This patch removes
non-owning reference invalidation from bpf_rbtree_remove to allow the
above usecase to pass verification. The logic for why this is now
possible is as follows:
Before this series, bpf_rbtree_add couldn't fail and thus assumed that
its input, a non-owning reference, was in the tree. But it's easy to
construct an example where two non-owning references pointing to the same
underlying memory are acquired and passed to rbtree_remove one after
another (see rbtree_api_release_aliasing in
selftests/bpf/progs/rbtree_fail.c).
So it was necessary to clobber non-owning refs to prevent this
case and, more generally, to enforce "non-owning ref is definitely
in some collection" invariant. This series removes that invariant and
the failure / runtime checking added in this patch provide a clean way
to deal with the aliasing issue - just fail to remove.
Because the aliasing issue prevented by clobbering non-owning refs is no
longer an issue, this patch removes the invalidate_non_owning_refs
call from verifier handling of bpf_rbtree_remove. Note that
bpf_spin_unlock - the other caller of invalidate_non_owning_refs -
clobbers non-owning refs for a different reason, so its clobbering
behavior remains unchanged.
No BPF program changes are necessary for programs to remain valid as a
result of this clobbering change. A valid program before this patch
passed verification with its non-owning refs having shorter (or equal)
lifetimes due to more aggressive clobbering.
Also, update existing tests to check bpf_rbtree_remove retval for NULL
where necessary, and move rbtree_api_release_aliasing from
progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass
verification.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The linked_list tests use macros and function pointers to reduce code
duplication. Earlier in the series, bpf_list_push_{front,back} were
modified to be macros, expanding to invoke actual kfuncs
bpf_list_push_{front,back}_impl. Due to this change, a code snippet
like:
void (*p)(void *, void *) = (void *)&bpf_list_##op;
p(hexpr, nexpr);
meant to do bpf_list_push_{front,back}(hexpr, nexpr), will no longer
work as it's no longer valid to do &bpf_list_push_{front,back} since
they're no longer functions.
This patch fixes issues of this type, along with two other minor changes
- one improvement and one fix - both related to the node argument to
list_push_{front,back}.
* The fix: migration of list_push tests away from (void *, void *)
func ptr uncovered that some tests were incorrectly passing pointer
to node, not pointer to struct bpf_list_node within the node. This
patch fixes such issues (CHECK(..., f) -> CHECK(..., &f->node))
* The improvement: In linked_list tests, the struct foo type has two
list_node fields: node and node2, at byte offsets 0 and 40 within
the struct, respectively. Currently node is used in ~all tests
involving struct foo and lists. The verifier needs to do some work
to account for the offset of bpf_list_node within the node type, so
using node2 instead of node exercises that logic more in the tests.
This patch migrates linked_list tests to use node2 instead of node.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Consider this code snippet:
struct node {
long key;
bpf_list_node l;
bpf_rb_node r;
bpf_refcount ref;
}
int some_bpf_prog(void *ctx)
{
struct node *n = bpf_obj_new(/*...*/), *m;
bpf_spin_lock(&glock);
bpf_rbtree_add(&some_tree, &n->r, /* ... */);
m = bpf_refcount_acquire(n);
bpf_rbtree_add(&other_tree, &m->r, /* ... */);
bpf_spin_unlock(&glock);
/* ... */
}
After bpf_refcount_acquire, n and m point to the same underlying memory,
and that node's bpf_rb_node field is being used by the some_tree insert,
so overwriting it as a result of the second insert is an error. In order
to properly support refcounted nodes, the rbtree and list insert
functions must be allowed to fail. This patch adds such support.
The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to
return an int indicating success/failure, with 0 -> success, nonzero ->
failure.
bpf_obj_drop on failure
=======================
Currently the only reason an insert can fail is the example above: the
bpf_{list,rb}_node is already in use. When such a failure occurs, the
insert kfuncs will bpf_obj_drop the input node. This allows the insert
operations to logically fail without changing their verifier owning ref
behavior, namely the unconditional release_reference of the input
owning ref.
With insert that always succeeds, ownership of the node is always passed
to the collection, since the node always ends up in the collection.
With a possibly-failed insert w/ bpf_obj_drop, ownership of the node
is always passed either to the collection (success), or to bpf_obj_drop
(failure). Regardless, it's correct to continue unconditionally
releasing the input owning ref, as something is always taking ownership
from the calling program on insert.
Keeping owning ref behavior unchanged results in a nice default UX for
insert functions that can fail. If the program's reaction to a failed
insert is "fine, just get rid of this owning ref for me and let me go
on with my business", then there's no reason to check for failure since
that's default behavior. e.g.:
long important_failures = 0;
int some_bpf_prog(void *ctx)
{
struct node *n, *m, *o; /* all bpf_obj_new'd */
bpf_spin_lock(&glock);
bpf_rbtree_add(&some_tree, &n->node, /* ... */);
bpf_rbtree_add(&some_tree, &m->node, /* ... */);
if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) {
important_failures++;
}
bpf_spin_unlock(&glock);
}
If we instead chose to pass ownership back to the program on failed
insert - by returning NULL on success or an owning ref on failure -
programs would always have to do something with the returned ref on
failure. The most likely action is probably "I'll just get rid of this
owning ref and go about my business", which ideally would look like:
if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */))
bpf_obj_drop(n);
But bpf_obj_drop isn't allowed in a critical section and inserts must
occur within one, so in reality error handling would become a
hard-to-parse mess.
For refcounted nodes, we can replicate the "pass ownership back to
program on failure" logic with this patch's semantics, albeit in an ugly
way:
struct node *n = bpf_obj_new(/* ... */), *m;
bpf_spin_lock(&glock);
m = bpf_refcount_acquire(n);
if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) {
/* Do something with m */
}
bpf_spin_unlock(&glock);
bpf_obj_drop(m);
bpf_refcount_acquire is used to simulate "return owning ref on failure".
This should be an uncommon occurrence, though.
Addition of two verifier-fixup'd args to collection inserts
===========================================================
The actual bpf_obj_drop kfunc is
bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop
macro populating the second arg with 0 and the verifier later filling in
the arg during insn fixup.
Because bpf_rbtree_add and bpf_list_push_{front,back} now might do
bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be
passed to bpf_obj_drop_impl.
Similarly, because the 'node' param to those insert functions is the
bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a
pointer to the beginning of the node, the insert functions need to be
able to find the beginning of the node struct. A second
verifier-populated param is necessary: the offset of {list,rb}_node within the
node type.
These two new params allow the insert kfuncs to correctly call
__bpf_obj_drop_impl:
beginning_of_node = bpf_rb_node_ptr - offset
if (already_inserted)
__bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record);
Similarly to other kfuncs with "hidden" verifier-populated params, the
insert functions are renamed with _impl prefix and a macro is provided
for common usage. For example, bpf_rbtree_add kfunc is now
bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets
"hidden" args to 0.
Due to the two new args BPF progs will need to be recompiled to work
with the new _impl kfuncs.
This patch also rewrites the "hidden argument" explanation to more
directly say why the BPF program writer doesn't need to populate the
arguments with anything meaningful.
How does this new logic affect non-owning references?
=====================================================
Currently, non-owning refs are valid until the end of the critical
section in which they're created. We can make this guarantee because, if
a non-owning ref exists, the referent was added to some collection. The
collection will drop() its nodes when it goes away, but it can't go away
while our program is accessing it, so that's not a problem. If the
referent is removed from the collection in the same CS that it was added
in, it can't be bpf_obj_drop'd until after CS end. Those are the only
two ways to free the referent's memory and neither can happen until
after the non-owning ref's lifetime ends.
On first glance, having these collection insert functions potentially
bpf_obj_drop their input seems like it breaks the "can't be
bpf_obj_drop'd until after CS end" line of reasoning. But we care about
the memory not being _freed_ until end of CS end, and a previous patch
in the series modified bpf_obj_drop such that it doesn't free refcounted
nodes until refcount == 0. So the statement can be more accurately
rewritten as "can't be free'd until after CS end".
We can prove that this rewritten statement holds for any non-owning
reference produced by collection insert functions:
* If the input to the insert function is _not_ refcounted
* We have an owning reference to the input, and can conclude it isn't
in any collection
* Inserting a node in a collection turns owning refs into
non-owning, and since our input type isn't refcounted, there's no
way to obtain additional owning refs to the same underlying
memory
* Because our node isn't in any collection, the insert operation
cannot fail, so bpf_obj_drop will not execute
* If bpf_obj_drop is guaranteed not to execute, there's no risk of
memory being free'd
* Otherwise, the input to the insert function is refcounted
* If the insert operation fails due to the node's list_head or rb_root
already being in some collection, there was some previous successful
insert which passed refcount to the collection
* We have an owning reference to the input, it must have been
acquired via bpf_refcount_acquire, which bumped the refcount
* refcount must be >= 2 since there's a valid owning reference and the
node is already in a collection
* Insert triggering bpf_obj_drop will decr refcount to >= 1, never
resulting in a free
So although we may do bpf_obj_drop during the critical section, this
will never result in memory being free'd, and no changes to non-owning
ref logic are needed in this patch.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, BPF programs can interact with the lifetime of refcounted
local kptrs in the following ways:
bpf_obj_new - Initialize refcount to 1 as part of new object creation
bpf_obj_drop - Decrement refcount and free object if it's 0
collection add - Pass ownership to the collection. No change to
refcount but collection is responsible for
bpf_obj_dropping it
In order to be able to add a refcounted local kptr to multiple
collections we need to be able to increment the refcount and acquire a
new owning reference. This patch adds a kfunc, bpf_refcount_acquire,
implementing such an operation.
bpf_refcount_acquire takes a refcounted local kptr and returns a new
owning reference to the same underlying memory as the input. The input
can be either owning or non-owning. To reinforce why this is safe,
consider the following code snippets:
struct node *n = bpf_obj_new(typeof(*n)); // A
struct node *m = bpf_refcount_acquire(n); // B
In the above snippet, n will be alive with refcount=1 after (A), and
since nothing changes that state before (B), it's obviously safe. If
n is instead added to some rbtree, we can still safely refcount_acquire
it:
struct node *n = bpf_obj_new(typeof(*n));
struct node *m;
bpf_spin_lock(&glock);
bpf_rbtree_add(&groot, &n->node, less); // A
m = bpf_refcount_acquire(n); // B
bpf_spin_unlock(&glock);
In the above snippet, after (A) n is a non-owning reference, and after
(B) m is an owning reference pointing to the same memory as n. Although
n has no ownership of that memory's lifetime, it's guaranteed to be
alive until the end of the critical section, and n would be clobbered if
we were past the end of the critical section, so it's safe to bump
refcount.
Implementation details:
* From verifier's perspective, bpf_refcount_acquire handling is similar
to bpf_obj_new and bpf_obj_drop. Like the former, it returns a new
owning reference matching input type, although like the latter, type
can be inferred from concrete kptr input. Verifier changes in
{check,fixup}_kfunc_call and check_kfunc_args are largely copied from
aforementioned functions' verifier changes.
* An exception to the above is the new KF_ARG_PTR_TO_REFCOUNTED_KPTR
arg, indicated by new "__refcounted_kptr" kfunc arg suffix. This is
necessary in order to handle both owning and non-owning input without
adding special-casing to "__alloc" arg handling. Also a convenient
place to confirm that input type has bpf_refcount field.
* The implemented kfunc is actually bpf_refcount_acquire_impl, with
'hidden' second arg that the verifier sets to the type's struct_meta
in fixup_kfunc_call.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add testing for the event "Instructions retired" (0xc0) in the PMU
event filter on both Intel and AMD to ensure that the event doesn't
count when it is disallowed. Unlike most of the other events, the
event "Instructions retired" will be incremented by KVM when an
instruction is emulated. Test that this case is being properly handled
and that KVM doesn't increment the counter when that event is
disallowed.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230307141400.1486314-6-aaronlewis@google.com
Link: https://lore.kernel.org/r/20230407233254.957013-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use a single struct to track all PMC event counts in the PMU filter test,
and copy the full struct to/from the guest when running and measuring each
guest workload. Using a common struct avoids naming conflicts, e.g. the
loads/stores testcase has claimed "perf_counter", and eliminates the
unnecessary truncation of the counter values when they are propagated from
the guest MSRs to the host structs.
Zero the struct before running the guest workload to ensure that the test
doesn't get a false pass due to consuming data from a previous run.
Link: https://lore.kernel.org/r/20230407233254.957013-6-seanjc@google.com
Reviewed by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use '0' to signal success and '-errno' to signal failure in the PMU event
filter test so that the values are slightly less magical/arbitrary. Using
'0' in the error paths is especially confusing as understanding it's an
error value requires following the breadcrumbs to the host code that
ultimately consumes the value.
Arguably there should also be a #define for "success", but 0/-errno is a
common enough pattern that defining another macro on top would likely do
more harm than good.
Link: https://lore.kernel.org/r/20230407233254.957013-5-seanjc@google.com
Reviewed by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Provide the actual vs. expected count in the PMU event filter test's
asserts instead of relying on pr_info() to provide the context, e.g. so
that all information needed to triage a failure is readily available even
if the environment in which the test is run captures only the assert
itself.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
[sean: rewrite changelog]
Link: https://lore.kernel.org/r/20230407233254.957013-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add helper macros to consolidate the asserts that a PMC is/isn't counting
(branch) instructions retired. This will make it easier to add additional
asserts related to counting instructions later on.
No functional changes intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
[sean: add "INSTRUCTIONS", massage changelog]
Link: https://lore.kernel.org/r/20230407233254.957013-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Split out the common parts of the Intel and AMD guest code in the PMU
event filter test into a helper function. This is in preparation for
adding additional counters to the test.
No functional changes intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230407233254.957013-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
Fixes: 42e3b093eb ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.intel.com/
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Some distros ship with older vm_sockets.h that doesn't have VMADDR_CID_LOCAL
which causes selftests build to fail:
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c:261:18: error: ‘VMADDR_CID_LOCAL’ undeclared (first use in this function); did you mean ‘VMADDR_CID_HOST’?
261 | addr->svm_cid = VMADDR_CID_LOCAL;
| ^~~~~~~~~~~~~~~~
| VMADDR_CID_HOST
Workaround this issue by defining it on demand.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhSiwAKCRDbK58LschI
g8cbAQCH4xrquOeDmYyGXFQGchHZAIj++tKg8ABU4+hYeJtrlwEA6D4W6wjoSZRk
mLSptZ9qro8yZA86BvyPvlBT1h9ELQA=
=StAc
-----END PGP SIGNATURE-----
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-04-13
We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).
The main changes are:
1) Rework BPF verifier log behavior and implement it as a rotating log
by default with the option to retain old-style fixed log behavior,
from Andrii Nakryiko.
2) Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params, from Christian Ehrig.
3) Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton,
from Alexei Starovoitov.
4) Optimize hashmap lookups when key size is multiple of 4,
from Anton Protopopov.
5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps, from David Vernet.
6) Add support for stashing local BPF kptr into a map value via
bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
for new cgroups, from Dave Marchevsky.
7) Fix BTF handling of is_int_ptr to skip modifiers to work around
tracing issues where a program cannot be attached, from Feng Zhou.
8) Migrate a big portion of test_verifier unit tests over to
test_progs -a verifier_* via inline asm to ease {read,debug}ability,
from Eduard Zingerman.
9) Several updates to the instruction-set.rst documentation
which is subject to future IETF standardization
(https://lwn.net/Articles/926882/), from Dave Thaler.
10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
known bits information propagation, from Daniel Borkmann.
11) Add skb bitfield compaction work related to BPF with the overall goal
to make more of the sk_buff bits optional, from Jakub Kicinski.
12) BPF selftest cleanups for build id extraction which stand on its own
from the upcoming integration work of build id into struct file object,
from Jiri Olsa.
13) Add fixes and optimizations for xsk descriptor validation and several
selftest improvements for xsk sockets, from Kal Conley.
14) Add BPF links for struct_ops and enable switching implementations
of BPF TCP cong-ctls under a given name by replacing backing
struct_ops map, from Kui-Feng Lee.
15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
offset stack read as earlier Spectre checks cover this,
from Luis Gerhorst.
16) Fix issues in copy_from_user_nofault() for BPF and other tracers
to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
and Alexei Starovoitov.
17) Add --json-summary option to test_progs in order for CI tooling to
ease parsing of test results, from Manu Bretelle.
18) Batch of improvements and refactoring to prep for upcoming
bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
from Martin KaFai Lau.
19) Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations,
from Quentin Monnet.
20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
the module name from BTF of the target and searching kallsyms of
the correct module, from Viktor Malik.
21) Improve BPF verifier handling of '<const> <cond> <non_const>'
to better detect whether in particular jmp32 branches are taken,
from Yonghong Song.
22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
A built-in cc or one from a kernel module is already able to write
to app_limited, from Yixin Shen.
Conflicts:
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c65 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
0f10f647f4 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
include/net/ip_tunnels.h
bc9d003dc4 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
ac931d4cde ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/
net/bpf/test_run.c
e5995bc7e2 ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
294635a816 ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
====================
Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not all that quiet given spring celebrations, but "current" fixes
are thinning out, which is encouraging. One outstanding regression
in the mlx5 driver when using old FW, not blocking but we're pushing
for a fix.
Current release - new code bugs:
- eth: enetc: workaround for unresponsive pMAC after receiving
express traffic
Previous releases - regressions:
- rtnetlink: restore RTM_NEW/DELLINK notification behavior,
keep the pid/seq fields 0 for backward compatibility
Previous releases - always broken:
- sctp: fix a potential overflow in sctp_ifwdtsn_skip
- mptcp:
- use mptcp_schedule_work instead of open-coding it and make
the worker check stricter, to avoid scheduling work on closed
sockets
- fix NULL pointer dereference on fastopen early fallback
- skbuff: fix memory corruption due to a race between skb coalescing
and releasing clones confusing page_pool reference counting
- bonding: fix neighbor solicitation validation on backup slaves
- bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
- bpf: arm64: fixed a BTI error on returning to patched function
- openvswitch: fix race on port output leading to inf loop
- sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
returning a different errno than expected
- phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
- Bluetooth: fix printing errors if LE Connection times out
- Bluetooth: assorted UaF, deadlock and data race fixes
- eth: macb: fix memory corruption in extended buffer descriptor mode
Misc:
- adjust the XDP Rx flow hash API to also include the protocol layers
over which the hash was computed
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQ4ZrsACgkQMUZtbf5S
IruWUQ/9F+HlnHf3Sv08zGlnV++vLaJ/Ld8C2YNYNxRwuoJvcCyikQ/ZfUKdKAoS
kCf0XqFD2SMl8wHpCQBhK4uXvKBdBMx6L6wEp7dbDciGl/+5yihe9opBBXKekWbB
ULRZcZE7RACri/QsXQhD7Y8p530xnYWQXO8ZMjR3vOAWxplJtBDNDnXi7hqtxQpW
Vzwl1ntvD1msmxhvy0UZrgeesL8R3UckFvqYEqnINeyd8E8HB1dAOg899/ZPUbjA
UoEw5VsSBSr9DE7+Fs6uD8trBxQ1CrdRVJjhRhwivHk8/Ro7dIzjcVV30ei3wucz
0RiNvCqypkeLeRrcVlSk8lR5r9FBGvhDMFbcGM8lHnxSc0WB+Sj+2iup4fpTE8/p
VUIvhhzuBuXU4Sc022pm6BL5DBSKdnJRquFq6XCTwnQM6v7fvzu1yWNXsQom8Nbi
1/ZcFdj27FHwMpU0JPZ4PFxT7Ta830UyulVZuyWA+zEzlN7DvW3O7bGQC72GEuID
Xc58D4kVtywzbntFmUjuhXCD/6vvD5WW5orLpMWg5SH9F14prt3C9OFSpTwTTq+W
uPBEslhnhhCPecTNo2iFPLX3bN67n8KDVUWua1AHaqmcK7QFGs0njJGGLpFdHMll
SuNgrNrtQE9EHH8XL6VbSD2zf35ZfoKVg6qvL3oeLzXkGkZrnls=
=W+J2
-----END PGP SIGNATURE-----
Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, and bluetooth.
Not all that quiet given spring celebrations, but "current" fixes are
thinning out, which is encouraging. One outstanding regression in the
mlx5 driver when using old FW, not blocking but we're pushing for a
fix.
Current release - new code bugs:
- eth: enetc: workaround for unresponsive pMAC after receiving
express traffic
Previous releases - regressions:
- rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
pid/seq fields 0 for backward compatibility
Previous releases - always broken:
- sctp: fix a potential overflow in sctp_ifwdtsn_skip
- mptcp:
- use mptcp_schedule_work instead of open-coding it and make the
worker check stricter, to avoid scheduling work on closed
sockets
- fix NULL pointer dereference on fastopen early fallback
- skbuff: fix memory corruption due to a race between skb coalescing
and releasing clones confusing page_pool reference counting
- bonding: fix neighbor solicitation validation on backup slaves
- bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
- bpf: arm64: fixed a BTI error on returning to patched function
- openvswitch: fix race on port output leading to inf loop
- sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
returning a different errno than expected
- phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
- Bluetooth: fix printing errors if LE Connection times out
- Bluetooth: assorted UaF, deadlock and data race fixes
- eth: macb: fix memory corruption in extended buffer descriptor mode
Misc:
- adjust the XDP Rx flow hash API to also include the protocol layers
over which the hash was computed"
* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
xdp: rss hash types representation
selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
skbuff: Fix a race between coalescing and releasing SKBs
net: macb: fix a memory corruption in extended buffer descriptor mode
selftests: add the missing CONFIG_IP_SCTP in net config
udp6: fix potential access to stale information
selftests: openvswitch: adjust datapath NL message declaration
selftests: mptcp: userspace pm: uniform verify events
mptcp: fix NULL pointer dereference on fastopen early fallback
mptcp: stricter state check in mptcp_worker
mptcp: use mptcp_schedule_work instead of open-coding it
net: enetc: workaround for unresponsive pMAC after receiving express traffic
sctp: fix a potential overflow in sctp_ifwdtsn_skip
net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
rtnetlink: Restore RTM_NEW/DELLINK notification behavior
net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
...
Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.
Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.
Add counters for providing remaining information about failure and
skipped packet events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Before exiting each test function(run_cmt/cat/mbm/mba_test()),
test results("ok","not ok") are printed by ksft_test_result() and then
temporary result files are cleaned by function
cmt/cat/mbm/mba_test_cleanup().
However, before running ksft_test_result(),
function cmt/cat/mbm/mba_test_cleanup()
has been run in each test function as follows:
cmt_resctrl_val()
cat_perf_miss_val()
mba_schemata_change()
mbm_bw_change()
Remove duplicate codes that clear each test result file,
while ensuring cleanup properly even when errors occur in each test.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
After creating a child process with fork() in CAT test, if a signal such
as SIGINT is received, the parent process will be terminated immediately,
and therefore the child process will not be killed and also resctrlfs is
not unmounted.
There is a signal handler registered in CMT/MBM/MBA tests, which kills
child process, unmount resctrlfs, cleanups result files, etc., if a
signal such as SIGINT is received.
Commonize the signal handler registered for CMT/MBM/MBA tests and
reuse it in CAT.
To reuse the signal handler to kill child process use global bm_pid
instead of local bm_pid.
Also, since the MBA/MBA/CMT/CAT are run in order, unregister the signal
handler at the end of each test so that the signal handler cannot be
inherited by other tests.
Reviewed-by: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
After creating a child process with fork() in CAT test, if an error
occurs when parent process runs cat_val() or check_results(), the child
process will not be killed and also resctrlfs is not unmounted. Also if
an error occurs when child process runs cat_val() or check_results(),
the parent process will wait for the pipe message from the child process
which will never be sent by the child process and the parent process
cannot proceed to unmount resctrlfs.
Synchronize the exits between the parent and child. An error could
occur whether in parent process or child process. The parent process
always kills the child process and runs umount_resctrlfs(). The
child process always waits to be killed by the parent process.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When a process has buffered output, a child process created by fork()
will also copy buffered output. When using kselftest framework,
the output (resctrl test result message) will be printed multiple times.
Add fflush() to flush out the buffered output before executing fork().
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since MBA check result is not returned, the MBA test result message
is always output as "ok" regardless of whether the MBA check result is
true or false.
Make output message to be "not ok" if MBA check result is failed.
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There is a comment "Set up shemata with 100% allocation on the first run"
in function mbm_setup(), but there is an increment bug and the condition
"num_of_runs == 0" will never be met and write_schemata() will never be
called to set schemata to 100%. Even if write_schemata() is called in MBM
test, since it is not supported for MBM test it does not set the schemata.
This is currently fine because resctrl_val_parm->mum_resctrlfs is always 1
and umount/mount will be run in each test to set the schemata to 100%.
To support the usage when MBM test does not unmount/remount resctrl
filesystem before the test starts, fix to call write_schemata() and
set schemata properly when the function is called for the first time.
Also, remove static local variable 'num_of_runs' because this is not
needed as there is resctrl_val_param->num_of_runs which should be used
instead like in cat_setup().
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use ksft_finished() after running tests so that resctrl_tests doesn't
return exit code 0 when tests fail.
Consequently, report the MBA and MBM tests as skipped when running on
non-Intel hardware, otherwise resctrl_tests will exit with a failure
code.
Signed-off-by: Peter Newman <peternewman@google.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
when building the kernel, so add it.
Fixes: a61bd7b9fe ("selftests: add a selftest for sctp vrf")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netlink message for creating a new datapath takes an array
of ports for the PID creation. This shouldn't cause much issue
but correct it for future cases where we need to do decode of
datapath information that could include the per-cpu PID map.
Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked can not be enough or too
much. The best is to wait for events with a timeout.
In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.
At the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also because it doesn't take too much time, we
can just add two more 'sleep 0.5' to uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fix time and hope for the best.
Fixes: 6c73008aa3 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
test_prog's prog_tests/verifier_log.c is superseding test_verifier_log
stand-alone test. It cover same checks and adds more, and is also
integrated into test_progs test runner.
Just remove test_verifier_log.c.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230412170655.1866831-1-andrii@kernel.org
skel->links.oncpu is leaked in one case. This causes test perf_branches
fails when it runs after get_stackid_cannot_attach:
./test_progs -t get_stackid_cannot_attach,perf_branches
84 get_stackid_cannot_attach:OK
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:FAIL:output not valid no valid sample from prog
146/1 perf_branches/perf_branches_hw:FAIL
146/2 perf_branches/perf_branches_no_hw:OK
146 perf_branches:FAIL
All error logs:
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:FAIL:output not valid no valid sample from prog
146/1 perf_branches/perf_branches_hw:FAIL
146 perf_branches:FAIL
Summary: 1/1 PASSED, 0 SKIPPED, 1 FAILED
Fix this by adding the missing bpf_link__destroy().
Fixes: 346938e938 ("selftests/bpf: Add get_stackid_cannot_attach")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230412210423.900851-3-song@kernel.org
Currently, perf_event sample period in perf_event_stackmap is set too low
that the test fails randomly. Fix this by using the max sample frequency,
from read_perf_max_sample_freq().
Move read_perf_max_sample_freq() to testing_helpers.c. Replace the CHECK()
with if-printf, as CHECK is not available in testing_helpers.c.
Fixes: 1da4864c2b ("selftests/bpf: Add callchain_stackid")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230412210423.900851-2-song@kernel.org
One of the test assertions uses an uninitialized op_name, which leads
to some headscratching if it fails. Use a string constant instead.
Fixes: b1a7a480a1 ("selftests/bpf: Add fixed vs rotating verifier log tests")
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230413094740.18041-1-lmb@isovalent.com
Add tests for FOU and GUE encapsulation via the bpf_skb_{set,get}_fou_encap
kfuncs, using ipip devices in collect-metadata mode.
These tests make sure that we can successfully set and obtain FOU and GUE
encap parameters using ingress / egress BPF tc-hooks.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Link: https://lore.kernel.org/r/040193566ddbdb0b53eb359f7ac7bbd316f338b5.1680874078.git.cehrig@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that bpf_cgroup_acquire() is KF_RCU | KF_RET_NULL,
bpf_cgroup_kptr_get() is redundant. Let's remove it, and update
selftests to instead use bpf_cgroup_acquire() where appropriate. The
next patch will update the BPF documentation to not mention
bpf_cgroup_kptr_get().
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230411041633.179404-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
struct cgroup is already an RCU-safe type in the verifier. We can
therefore update bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and
subsequently remove bpf_cgroup_kptr_get(). This patch does the first of
these by updating bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and
also updates selftests accordingly.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230411041633.179404-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
This one gets skipped when run by vmtest.sh as we currently need to test
against actual kernel modules (.ko), not built-in to fetch the list
of supported devices.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
The code is taken from [1] to fix a change in v6.3.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Link: https://gitlab.freedesktop.org/libevdev/hid-tools/-/merge_requests/143 [1]
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Cc: Jose Torreguitar <jtguitar@google.com>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Jason Gerecke <killertofu@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
There are a lot of multitouch tests, and the default timeout of 45 seconds
is not big enough. Bump it to 200 seconds.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Cc: наб <nabijaczleweli@nabijaczleweli.xyz>
Cc: Blaž Hrastnik <blaz@mxxn.io>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Candle Sun <candle.sun@unisoc.com>
Cc: Jose Torreguitar <jtguitar@google.com>
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Cc: Roderick Colenbrander <roderick.colenbrander@sony.com>
Cc: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
These tests have been developed in the hid-tools[0] tree for a while.
Now that we have a proper selftests/hid kernel entry and that the tests
are more reliable, it is time to directly include those in the kernel
tree.
I haven't imported all of hid-tools, the python module, but only the
tests related to the kernel. We can rely on pip to fetch the latest
hid-tools release, and then run the tests directly from the tree.
This should now be easier to request tests when something is not behaving
properly in the HID subsystem.
[0] https://gitlab.freedesktop.org/libevdev/hid-tools
Cc: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Having a default binary is simple enough, but this also means that
we need to keep the targets in sync as we are adding them in the Makefile.
So instead of doing that manual work, make vmtest.sh generic enough to
actually be capable of running 'make -C tools/testing/selftests/hid'.
The new image we use has make installed, which the base fedora image
doesn't.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Now that kselftest.h can be used with nolibc convert the za-fork test to
use it. We do still have to open code ksft_print_msg() but that's not the
end of the world. Some of the advantage comes from using printf() which we
could have been using already.
This does change the output when tests are skipped, bringing it in line
with the standard kselftest output by removing the test name - we move
from
ok 0 skipped
to
ok 1 # SKIP fork_test
The old output was not following KTAP format for skips, and the
numbering was not standard or consistent with the reported plan.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Rather than providing headers for inclusion which replace any offered by
the system nolibc is provided in the form of a header which should be added
to the build via the compiler command line. In order to build with nolibc
we need to not include the standard C headers, especially not stdio.h where
the definitions of stdout, stdin and stderr will actively conflict with
nolibc.
Add an include guard which suppresses the inclusion of the standard headers
when building with nolibc, allowing us to build tests using the nolibc
headers. This allows us to avoid open coding of KTAP output for
selftests that need to use nolibc in order to test interfaces that are
controlled by libc.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Adding verifier test for accessing u32 pointer argument in
tracing programs.
The test program loads 1nd argument of bpf_fentry_test9 function
which is u32 pointer and checks that verifier allows that.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230410085908.98493-3-zhoufeng.zf@bytedance.com
Check both architectural rules and KVM's ABI for KVM_GET_SUPPORTED_CPUID
to ensure the supported xfeatures[1] don't violate any of them.
The architectural rules[2] and KVM's contract with userspace ensure for a
given feature, e.g. sse, avx, amx, etc... their associated xfeatures are
either all sets or none of them are set, and any dependencies are enabled
if needed.
[1] EDX:EAX of CPUID.(EAX=0DH,ECX=0)
[2] SDM vol 1, 13.3 ENABLING THE XSAVE FEATURE SET AND XSAVE-ENABLED
FEATURES
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
[sean: expand comments, use a fancy X86_PROPERTY]
Reviewed-by: Aaron Lewis <aaronlewis@google.com>
Tested-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230405004520.421768-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add all known XFEATURE masks to processor.h to make them more broadly
available in KVM selftests. Relocate and clean up the exiting AMX (XTILE)
defines in processor.h, e.g. drop the intermediate define and use BIT_ULL.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Aaron Lewis <aaronlewis@google.com>
Tested-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230405004520.421768-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Take the XFeature mask in __vm_xsave_require_permission() instead of the
bit so that there's no need to define macros for both the bit and the
mask. Asserting that only a single bit is set and retrieving said bit
is easy enough via log2 helpers.
Opportunistically clean up the error message for the
ARCH_REQ_XCOMP_GUEST_PERM sanity check.
Reviewed-by: Aaron Lewis <aaronlewis@google.com>
Tested-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230405004520.421768-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The instructions XGETBV and XSETBV are useful to other tests. Move
them to processor.h to make them more broadly available.
No functional change intended.
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
[sean: reword shortlog]
Reviewed-by: Aaron Lewis <aaronlewis@google.com>
Tested-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230405004520.421768-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add verifier log tests for BPF_BTF_LOAD command, which are very similar,
conceptually, to BPF_PROG_LOAD tests. These are two separate commands
dealing with verbose verifier log, so should be both tested separately.
Test that log_buf==NULL condition *does not* return -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-20-andrii@kernel.org
Add few extra test conditions to validate that it's ok to pass
log_buf==NULL and log_size==0 to BPF_PROG_LOAD command with the intent
to get log_true_size without providing a buffer.
Test that log_buf==NULL condition *does not* return -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-19-andrii@kernel.org
Add additional test cases validating that log_true_size is consistent
between fixed and rotating log modes, and that log_true_size can be
used *exactly* without causing -ENOSPC, while using just 1 byte shorter
log buffer would cause -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-18-andrii@kernel.org
Add selftests validating BPF_LOG_FIXED behavior, which used to be the
only behavior, and now default rotating BPF verifier log, which returns
just up to last N bytes of full verifier log, instead of returning
-ENOSPC.
To stress test correctness of in-kernel verifier log logic, we force it
to truncate program's verifier log to all lengths from 1 all the way to
its full size (about 450 bytes today). This was a useful stress test
while developing the feature.
For both fixed and rotating log modes we expect -ENOSPC if log contents
doesn't fit in user-supplied log buffer.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-7-andrii@kernel.org
Add --log-size to be able to customize log buffer sent to bpf() syscall
for BPF program verification logging.
Add --log-fixed to enforce BPF_LOG_FIXED behavior for BPF verifier log.
This is useful in unlikely event that beginning of truncated verifier
log is more important than the end of it (which with rotating verifier
log behavior is the default now).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-6-andrii@kernel.org
Currently, if user-supplied log buffer to collect BPF verifier log turns
out to be too small to contain full log, bpf() syscall returns -ENOSPC,
fails BPF program verification/load, and preserves first N-1 bytes of
the verifier log (where N is the size of user-supplied buffer).
This is problematic in a bunch of common scenarios, especially when
working with real-world BPF programs that tend to be pretty complex as
far as verification goes and require big log buffers. Typically, it's
when debugging tricky cases at log level 2 (verbose). Also, when BPF program
is successfully validated, log level 2 is the only way to actually see
verifier state progression and all the important details.
Even with log level 1, it's possible to get -ENOSPC even if the final
verifier log fits in log buffer, if there is a code path that's deep
enough to fill up entire log, even if normally it would be reset later
on (there is a logic to chop off successfully validated portions of BPF
verifier log).
In short, it's not always possible to pre-size log buffer. Also, what's
worse, in practice, the end of the log most often is way more important
than the beginning, but verifier stops emitting log as soon as initial
log buffer is filled up.
This patch switches BPF verifier log behavior to effectively behave as
rotating log. That is, if user-supplied log buffer turns out to be too
short, verifier will keep overwriting previously written log,
effectively treating user's log buffer as a ring buffer. -ENOSPC is
still going to be returned at the end, to notify user that log contents
was truncated, but the important last N bytes of the log would be
returned, which might be all that user really needs. This consistent
-ENOSPC behavior, regardless of rotating or fixed log behavior, allows
to prevent backwards compatibility breakage. The only user-visible
change is which portion of verifier log user ends up seeing *if buffer
is too small*. Given contents of verifier log itself is not an ABI,
there is no breakage due to this behavior change. Specialized tools that
rely on specific contents of verifier log in -ENOSPC scenario are
expected to be easily adapted to accommodate old and new behaviors.
Importantly, though, to preserve good user experience and not require
every user-space application to adopt to this new behavior, before
exiting to user-space verifier will rotate log (in place) to make it
start at the very beginning of user buffer as a continuous
zero-terminated string. The contents will be a chopped off N-1 last
bytes of full verifier log, of course.
Given beginning of log is sometimes important as well, we add
BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows
tools like veristat to request first part of verifier log, if necessary.
BPF_LOG_FIXED flag is also a simple and straightforward way to check if
BPF verifier supports rotating behavior.
On the implementation side, conceptually, it's all simple. We maintain
64-bit logical start and end positions. If we need to truncate the log,
start position will be adjusted accordingly to lag end position by
N bytes. We then use those logical positions to calculate their matching
actual positions in user buffer and handle wrap around the end of the
buffer properly. Finally, right before returning from bpf_check(), we
rotate user log buffer contents in-place as necessary, to make log
contents contiguous. See comments in relevant functions for details.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-4-andrii@kernel.org
get_llc_perf() function comment refers to cpu_no parameter that does
not exist.
Correct get_llc_perf() the comment to document llc_perf_miss instead.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
remount_resctrlfs() accepts a boolean value as an argument. Some tests
pass 0/1 and some tests pass true/false.
Make all the callers of remount_resctrlfs() use true/false so that the
parameter usage is consistent across tests.
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
CBM_MASK_PATH is actually the path to resctrl/info.
Change the macro name to correctly indicate what it represents.
[ ij: Tweaked the changelog. ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
initialize_llc_perf() unconditionally returns 0.
initialize_llc_perf() performs only memory initialization, none of
which can fail.
Change the return type from int to void to accurately reflect that its
return value doesn't need to be checked. Remove the error checking from
the only callsite.
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
MBA test case writes schemata but it does not check if the write is
successful or not.
Add the error check and return error properly.
Fixes: 01fee6b4d1 ("selftests/resctrl: Add MBA test")
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
resctrl_val() assumes ->setup() always returns either 0 to continue
tests or < 0 in case of the normal termination of tests after x runs.
The latter overlaps with normal error returns.
Define END_OF_TESTS (=1) to differentiate the normal termination of
tests and return errors as negative values. Alter callers of ->setup()
to handle errors properly.
Fixes: 790bf585b0 ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Fixes: ecdbb911f2 ("selftests/resctrl: Add MBM test")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
resctrl_val() function is called only by MBM, MBA, and CMT tests which
means the else branch is never used.
Both test branches call param->setup().
Remove the unused else branch and place the ->setup() call outside of
the test specific branches reducing code duplication.
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
malloc_and_init_memory() in fill_buf isn't checking if memalign()
successfully allocated memory or not before accessing the memory.
Check the return value of memalign() and return NULL if allocating
aligned memory fails.
Fixes: a2561b12fe ("selftests/resctrl: Add built in benchmark")
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When trying to add a name to the hashmap, an error code of EEXIST is
returned and we continue as names are possibly duplicated in the sys
file.
If the last name in the file is a duplicate, we will continue to the
next iteration of the while loop, and exit the loop with a value of err
set to EEXIST and enter the error label with err set, which causes the
test to fail when it should not.
This change reset err to 0 before continue-ing into the next iteration,
this way, if there is no more data to read from the file we iterate
through, err will be set to 0.
Behaviour prior to this change:
```
test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17
(errno 2)
All error logs:
test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17
(errno 2)
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
```
After this change:
```
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230408022919.54601-1-chantr4@gmail.com
23 are cc:stable and the other 5 address issues which were introduced
during this merge cycle.
20 are for MM and the remainder are for other subsystems.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZDCmIAAKCRDdBJ7gKXxA
jhZuAQDn8ErAotUpLn1Pq6WU1liPenGoraBo/a2ubpOjguSINwD+J7L85vgVmA78
YzoKHObW18yBW7JSzpWZ2zw8q2gLQwQ=
=a1n7
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
There is a spelling mistake in a test assert message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230406080226.122955-1-colin.i.king@gmail.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDCaHgAKCRDbK58LschI
g2CXAP9sgjqCaRhfSSYWbESKWxokJAa1j6v7phNjR9iqHrBjzwEAg6aDHOUqcYpD
Zlp/5JV9HIkc1wTmuIUuI74YAVZBJAU=
=V2jo
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-04-08
We've added 4 non-merge commits during the last 11 day(s) which contain
a total of 5 files changed, 39 insertions(+), 6 deletions(-).
The main changes are:
1) Fix BPF TCP socket iterator to use correct helper for dropping
socket's refcount, that is, sock_gen_put instead of sock_put,
from Martin KaFai Lau.
2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
from Xu Kuohai.
3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.
4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
bpf, arm64: Fixed a BTI error on returning to patched function
LoongArch, bpf: Fix jit to skip speculation barrier opcode
bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
====================
Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The following example forces veristat to loop indefinitely:
$ cat two-ok
file_name,prog_name,verdict,total_states
file-a,a,success,12
file-b,b,success,67
$ cat add-failure
file_name,prog_name,verdict,total_states
file-a,a,success,12
file-b,b,success,67
file-b,c,failure,32
$ veristat -C two-ok add-failure
<does not return>
The loop is caused by handle_comparison_mode() not checking if `base`
variable points to `fallback_stats` prior advancing joined results
using `base`.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com
perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not
reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is
not reliable in some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES
event, which gives more consistent results.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org
This patch add bonding arp validate tests with mode active backup,
monitor arp_ip_target and ns_ip6_target. It also checks mii_status
to make sure all slaves are UP.
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve the testing process for bond options, A new bond topology lib
is added to our testing setup. The current option_prio.sh file will be
renamed to bond_options.sh so that all bonding options can be tested here.
Specifically, for priority testing, we will run all tests using modes
1, 5, and 6. These changes will help us streamline the testing process
and ensure that our bond options are rigorously evaluated.
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running this test makes little sense if the enabled l3_stats are not
actually reported as "used". This can signify a failure of a driver to
install the necessary counters, or simply lack of support for enabling
in-HW counters on a given netdevice. It is generally impossible to tell
from the outside which it is. But more likely than not, if somebody is
running this on veth pairs, they do not intend to actually test that a
certain piece of HW can install in-HW counters for the veth. It is more
likely they are e.g. running the test by mistake.
Therefore detect that the counter has not been actually installed. In that
case, if the netdevice is one end of a veth pair, SKIP. Otherwise FAIL.
Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/a86817961903cca5cb0aebf2b2a06294b8aa7dea.1680704172.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add various tests for code pattern '<const> <cond_op> <non_const>' to
exercise the previous verifier patch.
The following are veristat changed number of processed insns stat
comparing the previous patch vs. this patch:
File Program Insns (A) Insns (B) Insns (DIFF)
----------------------------------------------------- ---------------------------------------------------- --------- --------- -------------
test_seg6_loop.bpf.linked3.o __add_egr_x 12423 12314 -109 (-0.88%)
Only one program is affected with minor change.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164510.1047757-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the verifier does not handle '<const> <cond_op> <non_const>' well.
For example,
...
10: (79) r1 = *(u64 *)(r10 -16) ; R1_w=scalar() R10=fp0
11: (b7) r2 = 0 ; R2_w=0
12: (2d) if r2 > r1 goto pc+2
13: (b7) r0 = 0
14: (95) exit
15: (65) if r1 s> 0x1 goto pc+3
16: (0f) r0 += r1
...
At insn 12, verifier decides both true and false branch are possible, but
actually only false branch is possible.
Currently, the verifier already supports patterns '<non_const> <cond_op> <const>.
Add support for patterns '<const> <cond_op> <non_const>' in a similar way.
Also fix selftest 'verifier_bounds_mix_sign_unsign/bounds checks mixing signed and unsigned, variant 10'
due to this change.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164505.1046801-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add various tests for code pattern '<non-const> NE/EQ <const>' implemented
in the previous verifier patch. Without the verifier patch, these new
tests will fail.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164500.1045715-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify that disabling the guest's vPMU via CPUID also disables LBRs.
KVM has had at least one bug where LBRs would remain enabled even though
the intent was to disable everything PMU related.
Link: https://lore.kernel.org/r/20230311004618.920745-22-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rework the LBR format test to use the bitfield instead of a separate
mask macro, mainly so that adding a nearly-identical PEBS format test
doesn't have to copy-paste-tweak the macro too.
No functional change intended.
Link: https://lore.kernel.org/r/20230311004618.920745-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that KVM disallows changing PERF_CAPABILITIES after KVM_RUN, expand
the host side checks to verify KVM rejects any attempts to change bits
from userspace.
Link: https://lore.kernel.org/r/20230311004618.920745-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Test that the guest can't write 0 to PERF_CAPABILITIES, can't write the
current value, and can't toggle _any_ bits. There is no reason to special
case the LBR format.
Link: https://lore.kernel.org/r/20230311004618.920745-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add negative testing of all immutable bits in PERF_CAPABILITIES, i.e.
single bits that are reserved-0 or are effectively reserved-1 by KVM.
Omit LBR and PEBS format bits from the test as it's easier to test them
manually than it is to add safeguards to the comment path, e.g. toggling
a single bit can yield a format of '0', which is legal as a "disable"
value.
Link: https://lore.kernel.org/r/20230311004618.920745-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that vcpu_set_msr() verifies the expected "read what was wrote"
semantics of all durable MSRs, including PERF_CAPABILITIES, drop the
now-redundant manual checks in the VMX PMU caps test.
Link: https://lore.kernel.org/r/20230311004618.920745-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Assert that KVM provides "read what you wrote" semantics for all "durable"
MSRs (for lack of a better name). The extra coverage is cheap from a
runtime performance perspective, and verifying the behavior in the common
helper avoids gratuitous copy+paste in individual tests.
Note, this affects all tests that set MSRs from userspace!
Link: https://lore.kernel.org/r/20230311004618.920745-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reimplement vcpu_set_msr() as a macro and pretty print the failing MSR
(when possible) and the value if KVM_SET_MSRS fails instead of using the
using the standard KVM_IOCTL_ERROR(). KVM_SET_MSRS is somewhat odd in
that it returns the index of the last successful write, i.e. will be
'0' on failure barring an entirely different KVM bug. And for writing
MSRs, the MSR being written and the value being written are almost always
relevant to the failure, i.e. just saying "failed!" doesn't help debug.
Place the string goo in a separate macro in anticipation of using it to
further expand MSR testing.
Link: https://lore.kernel.org/r/20230311004618.920745-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use a separate sub-test to verify userspace can clear PERF_CAPABILITIES
and restore it to the KVM-supported value, as the testcase isn't unique
to the LBR format.
Link: https://lore.kernel.org/r/20230311004618.920745-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Split the PERF_CAPABILITIES subtests into two parts so that the LBR format
testcases don't execute after KVM_RUN. Similar to the guest CPUID model,
KVM will soon disallow changing PERF_CAPABILITIES after KVM_RUN, at which
point attempting to set the MSR after KVM_RUN will yield false positives
and/or false negatives depending on what the test is trying to do.
Land the LBR format test in a more generic "immutable features" test in
anticipation of expanding its scope to other immutable features.
Link: https://lore.kernel.org/r/20230311004618.920745-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
-eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP pkts
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQu4qgACgkQMUZtbf5S
Irv7fA//elLM+YvGQDPgGs3KDZVnb5vnGTEPosc6mCWsYqR6EBxk6sf89yqk31xg
IYbzOGXqkmi5ozhdjnNaFRGCtb+mBluV3oSPm8pM8d0NcuZta7MPPhduguEfnMS9
FcI98bxmzSXPIRzG/sCrc/tzedhepcAMlN80PtTzkxSUFlxA7z+vniatVymOZQtt
MSWPa9gXl1Keon7DBzGvHlZtOK1ptDjti5cp81zw/bA20wArCEm3Zg99Xz2r9rYp
eAF+KqKoclKieGUbJ7lXQIxWrHrFRznPoMbvW/ofU6JXQFi8KOh0zqJFIi9VnU0D
EdtZxOgLXuLcjvKj8ijKFdIA5OFqMA65pWs2t2foBR9C0DVle8LztGpyZODf0huT
agK9ZgM3av6jLzMe8CtJpz31nsWL1s4f3njM1PRucF/jTso72RWUdAx1fBurcnXm
45MK+uS0aAGch6cFT7mHqUAniGUakR+NPChA7ecn5iMetasinEWRLFxw0eQXEBcM
kSPFVGXlT4u0a56xN2FoTPnXHb+k08035+cd+bRbTlUXKeMCVYg/k7DiJUr21IWL
hHWVOzEnzRpDa5gsQ7apct3bcRZnHO/jlWGjkl/g+AGjwaMXae0zDFjajEazsmJ0
ZKOVsZgIcSCVAdnRLzP2IyKACuiFls6Qc46eARStKRwDjQsEoUU=
=1AWK
-----END PGP SIGNATURE-----
Merge tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
fix to mount_setattr_test build failure.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmQu3wYACgkQCwJExA0N
QxwKhQ//Xnnec3vQgdy4lD0nfhhVOJyRGLtwzcKOY73YJ6BuXjWwasghUVkaw0N8
qakbpY4U2gVfzf+DgZkWsAOkhY013AHJojppOLvJF0tPkeD7oAfBgotjDZisnzqv
5XABe7lSMhueCRjcFZDTLKA5MQJ/ifwA1cVpfWAo6aoKfYDHv36AAF3cWc2C831J
JUGwVnFYJLk6DctpjJOStLzZSfWv07CsFIcX52I5so0zMSfUFkgfPrhDx56rAwUJ
xs2GH/fYZJ5l2SYAoyhFviRsYd0AZWI2tGN4MY4a0XNkdlIgVh0dNPSMxoO3+sKG
hXslQ4sTifKQLazTcdx056n+zE4SXwQ4EadzhXlBv2Gs2AcZiEiG1C3JGG0pkLEQ
cr4BHQIkTRj15X6fUnd3S8AmVCpPzFXaT3Yl5ot1bXinzJzEnT3a/HHrsx9T2d9K
8KEOANAkxYhedgPC/AoR2edSHCAwFXVRak+TXu/DsfT6J86u+NqZpboWDWMWajyr
iUIX6i1mkv89JDg8+w1X8MFH2QdL050LitrDVo2tdcBuqcG9roEk/Au1CdO3j2Sl
OB5JCA8owbofY4ANklXc3c80rJKAExxRMilYSx0jj+4/F8SjVNM40BfVWOoicINU
DnHIjzoMByHN/lY1n+JbeGdyXZVQLNrp+nhtUxZuLApNsLJoPBk=
=oGdA
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"One single fix to mount_setattr_test build failure"
* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests mount: Fix mount_setattr_test builds failed
ACPICA commit 44f1af0664599e87bebc3a1260692baa27b2f264
Similar to "Replace one-element array with flexible-array", replace the
1-element array with a proper flexible array member as defined by C99.
This allows the code to operate without tripping compile-time and run-
time bounds checkers (e.g. via __builtin_object_size(), -fsanitize=bounds,
and/or -fstrict-flex-arrays=3).
The sizeof() uses with struct acpi_nfit_flush_address and struct
acpi_nfit_smbios have been adjusted to drop the open-coded subtraction
of the trailing single element. The result is no binary differences in
.text nor .data sections.
Link: https://github.com/acpica/acpica/commit/44f1af06
Signed-off-by: Bob Moore <robert.moore@intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that staddle the end of the UMEM but not a
page.
This test used to fail without the previous commit ("xsk: Fix unaligned
descriptor validation").
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Link: https://lore.kernel.org/r/20230405235920.7305-3-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
xdp-features supported by veth driver are no more static, but they
depends on veth configuration (e.g. if GRO is enabled/disabled or
TX/RX queue configuration). Take it into account in xdp_redirect
xdp-features selftest for veth driver.
Fixes: fccca038f3 ("veth: take into account device reconfiguration for xdp_features flag")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/bc35455cfbb1d4f7f52536955ded81ad47d8dc54.1680777371.git.lorenzo@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
kernel test robot reported the following warning:
All warnings (new ones prefixed by >>):
tcp_mmap.c: In function 'child_thread':
>> tcp_mmap.c:211:61: warning: 'lu' may be used uninitialized in this function [-Wmaybe-uninitialized]
211 | zc.length = min(chunk_size, FILE_SZ - lu);
We want to read FILE_SZ bytes, so the correct expression
should be (FILE_SZ - total)
Fixes: 5c5945dc69 ("selftests/net: Add SHA256 computation over data sent in tcp_mmap")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304042104.UFIuevBp-lkp@intel.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xiaoyan Li <lixiaoyan@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230405071556.1019623-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 515bddf0ec ("selftests/clone3: test clone3 with CLONE_NEWTIME")
added an additional test, so the number passed to ksft_set_plan needs to
be bumped accordingly.
Also use ksft_finished() to print results and exit. This will catch future
mismatches between ksft_set_plan() and the number of tests being run.
Fixes: 515bddf0ec ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The kernel's default behaviour is to obstruct the allocation of high
virtual address as it handles memory overcommit in a heuristic manner.
Setting the parameter as OVERCOMMIT_ALWAYS, ensures kernel isn't
susceptible to the availability of a platform's physical memory when
denying a memory allocation request.
Link: https://lkml.kernel.org/r/20230323060121.1175830-4-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Although there is a provision for 52 bit VA on arm64 platform, it remains
unutilised and higher addresses are not allocated. In order to
accommodate 4PB [2^52] virtual address space where supported,
NR_CHUNKS_HIGH is changed accordingly.
Array holding addresses is changed from static allocation to dynamic
allocation to accommodate its voluminous nature which otherwise might
overflow the stack.
Link: https://lkml.kernel.org/r/20230323060121.1175830-3-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests: Fix virtual address range for arm64", v2.
When the virtual address range selftest is run on arm64 and x86 platforms,
it is observed that both the low and high VA range iterations are skipped
when the MAP_CHUNK_SIZE is set to 16GB. The MAP_CHUNK_SIZE is changed to
1GB to resolve this issue, following which support for arm64 platform is
added by changing the NR_CHUNKS_HIGH for aarch64 to accommodate up to 4PB
of virtual address space allocation requests. Dynamic memory allocation
of array holding addresses is introduced to prevent overflow of the stack.
Finally, the overcommit_policy is set as OVERCOMMIT_ALWAYS to prevent the
kernel from denying a memory allocation request based on a platform's
physical memory availability.
This patch (of 3):
mmap() fails to allocate 16GB virtual space chunk, skipping both low and
high VA range iterations. Hence, reduce MAP_CHUNK_SIZE to 1GB and update
relevant macros as required.
Link: https://lkml.kernel.org/r/20230323060121.1175830-1-chaitanyas.prakash@arm.com
Link: https://lkml.kernel.org/r/20230323060121.1175830-2-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE
to resolve a missing fault, one can install a write-protected one. This
is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination.
This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features. Existing userfaultfd code supports
using WP and MINOR modes together (i.e. you can register an area with
both enabled), but without this CONTINUE flag the combination is in
practice unusable.
So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as
UFFDIO_COPY_MODE_WP, but for *minor* faults.
Update the selftest to do some very basic exercising of the new flag.
Update Documentation/ to describe how these flags are used (neither the
COPY nor the new CONTINUE versions of this mode flag were described there
before).
[1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/
Link: https://lkml.kernel.org/r/20230314221250.682452-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Enable it by default on the stress test, and add some smoke tests for the
pte markers on anonymous.
Link: https://lkml.kernel.org/r/20230309223711.823547-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.
Before this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
RTNETLINK answers: File exists
changing rps_default_mask in child ns don't affect the main one[ ok ]
cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
[fail] expected 1 found
changing rps_default_mask in child ns don't affect existing devices[ ok ]
After this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
changing rps_default_mask in child ns don't affect the main one[ ok ]
changing rps_default_mask in child ns affects new childns devices[ ok ]
changing rps_default_mask in child ns don't affect existing devices[ ok ]
Fixes: 3a7d84eae0 ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Close the "current_clocksource" file descriptor before returning or exiting
from stable_tsc_check_supported() in vmx_nested_tsc_scaling_test.
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Vipin Sharma <vipinsh@google.com>
Link: https://lore.kernel.org/r/20230405101350.259000-1-gehao@kylinos.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
In some cases the loopback latency might be large enough, causing
the assertion on invocations to be run before ingress prog getting
executed. The assertion would fail and the test would flake.
This can be reliably reproduced by arbitrarily increasing the
loopback latency (thanks to [1]):
tc qdisc add dev lo root handle 1: htb default 12
tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
tc qdisc add dev lo parent 1:12 netem delay 100ms
Fix this by waiting on the receive end, instead of instantly
returning to the assert. The call to read() will wait for the
default SO_RCVTIMEO timeout of 3 seconds provided by
start_server().
[1] https://gist.github.com/kstevens715/4598301
Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
Fixes: 3573f38401 ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/20230405193354.1956209-1-zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.
This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).
This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.
Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.
Fixes: 27e934bec3 ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403120400.31018-1-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This change fixes flakiness in the BIDIRECTIONAL test:
# [is_pkt_valid] expected length [60], got length [90]
not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.
For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.
Fixes: a89052572e ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Link: https://lore.kernel.org/r/20230405082905.6303-1-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add basic support to run SH under QEMU via kunit_tool using the
virtualized r2d platform.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
On some platforms, the console is not the first serial port. To make
this work, the first serial port in QEMU must be set to "null".
Add support for this by adding an optional "serial" parameter, which
defaults to "stdio", and can be overridden by platform-specific
configuration.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add test case to testapp_invalid_desc for valid packets at the end of
the UMEM.
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403145047.33065-3-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Avoid UMEM_SIZE macro in testapp_invalid_desc which is incorrect when
the frame size is not XSK_UMEM__DEFAULT_FRAME_SIZE. Also remove the
macro since it's no longer being used.
Fixes: 909f0e2820 ("selftests: xsk: Add tests for 2K frame size")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403145047.33065-2-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Now that all references to CONFIG_SRCU have been removed, it is time to
remove CONFIG_SRCU itself.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Add a selftest for the SMCCC filter, ensuring basic UAPI constraints
(e.g. reserved ranges, non-overlapping ranges) are upheld. Additionally,
test that the DENIED and FWD_TO_USER work as intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-14-oliver.upton@linux.dev
Build a helper for doing SMCs in selftests by macro-izing the current
HVC implementation and taking the conduit instruction as an argument.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-13-oliver.upton@linux.dev
bpf_testmod.ko sometimes fails to build from a clean checkout:
BTF [M] linux/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.ko
/bin/sh: 1: linux-build//tools/build/resolve_btfids/resolve_btfids: not found
The reason is that RESOLVE_BTFIDS may not yet be built. Fix by adding a
dependency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230403172935.1553022-1-iii@linux.ibm.com
The following selftest patch requires both the bug fixes and the
improvements of the selftest framework.
* iommufd/for-rc:
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd: Fix unpinning of pages when an access is present
iommufd: Check for uptr overflow
Linux 6.3-rc5
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This updates expected return values for invalid buffer test. Now such
values are returned from transport, not from af_vsock.c.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 22df776a9a ("tasks: Extract rcu_users out of union"), the
'refcount_t rcu_users' field was extracted out of a union with the
'struct rcu_head rcu' field. This allows us to safely perform a
refcount_inc_not_zero() on task->rcu_users when acquiring a reference on
a task struct. A prior patch leveraged this by making struct task_struct
an RCU-protected object in the verifier, and by bpf_task_acquire() to
use the task->rcu_users field for synchronization.
Now that we can use RCU to protect tasks, we no longer need
bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get()
is truly completely unnecessary, as we can just use RCU to get the
object. bpf_task_acquire_not_zero() is now equivalent to
bpf_task_acquire().
In addition to these changes, this patch also updates the associated
selftests to no longer use these kfuncs.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
struct task_struct objects are a bit interesting in terms of how their
lifetime is protected by refcounts. task structs have two refcount
fields:
1. refcount_t usage: Protects the memory backing the task struct. When
this refcount drops to 0, the task is immediately freed, without
waiting for an RCU grace period to elapse. This is the field that
most callers in the kernel currently use to ensure that a task
remains valid while it's being referenced, and is what's currently
tracked with bpf_task_acquire() and bpf_task_release().
2. refcount_t rcu_users: A refcount field which, when it drops to 0,
schedules an RCU callback that drops a reference held on the 'usage'
field above (which is acquired when the task is first created). This
field therefore provides a form of RCU protection on the task by
ensuring that at least one 'usage' refcount will be held until an RCU
grace period has elapsed. The qualifier "a form of" is important
here, as a task can remain valid after task->rcu_users has dropped to
0 and the subsequent RCU gp has elapsed.
In terms of BPF, we want to use task->rcu_users to protect tasks that
function as referenced kptrs, and to allow tasks stored as referenced
kptrs in maps to be accessed with RCU protection.
Let's first determine whether we can safely use task->rcu_users to
protect tasks stored in maps. All of the bpf_task* kfuncs can only be
called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program
types. For tracepoint and struct_ops programs, the struct task_struct
passed to a program handler will always be trusted, so it will always be
safe to call bpf_task_acquire() with any task passed to a program.
Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL,
as it is possible that the task has exited by the time the program is
invoked, even if the pointer is still currently valid because the main
kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks
should never be passed as an argument to the any program handlers, so it
should not be relevant.
The second question is whether it's safe to use RCU to access a task
that was acquired with bpf_task_acquire(), and stored in a map. Because
bpf_task_acquire() now uses task->rcu_users, it follows that if the task
is present in the map, that it must have had at least one
task->rcu_users refcount by the time the current RCU cs was started.
Therefore, it's safe to access that task until the end of the current
RCU cs.
With all that said, this patch makes struct task_struct is an
RCU-protected object. In doing so, we also change bpf_task_acquire() to
be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as
necessary. A subsequent patch will remove bpf_task_kptr_get(), and
bpf_task_acquire_not_zero() respectively.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix few potentially unitialized variables uses, found while building
veristat.c in release (-O2) mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Drop linux/compiler.h include, which seems to be needed for ARRAY_SIZE
macro only. Redefine own version of ARRAY_SIZE instead.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For packaging version of the tool is important, so add a simple way to
specify veristat version for upstream mirror at Github.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Dual-license veristat.c to dual GPL-2.0-only or BSD-2-Clause license.
This is needed to mirror it to Github to make it convenient for distro
packagers to package veristat as a separate package.
Veristat grew into a useful tool by itself, and there are already
a bunch of users relying on veristat as generic BPF loading and
verification helper tool. So making it easy to packagers by providing
Github mirror just like we do for bpftool and libbpf is the next step to
get veristat into the hands of users.
Apart from few typo fixes, I'm the sole contributor to veristat.c so
far, so no extra Acks should be needed for relicensing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The fork function in gcc is considered a built in function due to
being used by libgcov when building with gnu extensions.
Rename fork to sched_process_fork to prevent this conflict.
See details:
d1c3882392https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82457
Fixes the following error:
In file included from progs/bench_local_storage_create.c:6:
progs/bench_local_storage_create.c:43:14: error: conflicting types for
built-in function 'fork'; expected 'int(void)'
[-Werror=builtin-declaration-mismatch]
43 | int BPF_PROG(fork, struct task_struct *parent, struct
task_struct *child)
| ^~~~
Fixes: cbe9d93d58 ("selftests/bpf: Add bench for task storage creation")
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230331075848.1642814-1-james.hilliard1@gmail.com
Replacing extract_build_id with read_build_id that parses out
build id directly from elf without using readelf tool.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding read_build_id function that parses out build id from
specified binary.
It will replace extract_build_id and also be used in following
changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving error macros from profiler.inc.h to new err.h header.
It will be used in following changes.
Also adding PTR_ERR macro that will be used in following changes.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a selftest that configures metadata tunnel encapsulation using the TC
"tunnel_key" action: it includes a test case for setting "nofrag" flag.
Example output:
# selftests: net/forwarding: tc_tunnel_key.sh
# TEST: tunnel_key nofrag (skip_hw) [ OK ]
# INFO: Could not test offloaded functionality
ok 1 selftests: net/forwarding: tc_tunnel_key.sh
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
currently, users can skip individual test cases by means of writing
"skip": "yes"
in the scenario file. Extend this functionality, introducing 'dependsOn':
it's optional property like "skip", but the value contains a command (for
example, a probe on iproute2 to check if it supports a specific feature).
If such property is present, tdc executes that command and skips the test
when the return value is non-zero.
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This selftest was introduced recently in the commit cited below. It misses
several check_err() invocations to actually verify that the previous
command succeeded. When these are added, the first one fails, because
besides the addresses added by hand, there can be a link-local address
added by the kernel. Adjust the check to expect at least three addresses
instead of exactly three, and add the missing check_err's.
Furthermore, the explanatory comments assume that the address with no
protocol is $addr2, when in fact it is $addr3. Update the comments.
Fixes: 6a414fd77f ("selftests: rtnetlink: Add an address proto test")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/53a579bc883e1bf2fe490d58427cf22c2d1aa21f.1680102695.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conflicts:
drivers/net/ethernet/mediatek/mtk_ppe.c
3fbe4d8c0e ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
924531326e ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The second argument of the bpf_kptr_xchg() helper function is
ARG_PTR_TO_BTF_ID_OR_NULL. A recent patch fixed a bug whereby the
verifier would fail with an internal error message if a program invoked
the helper with a PTR_TO_BTF_ID | PTR_MAYBE_NULL register. This testcase
adds some testcases to ensure that it fails gracefully moving forward.
Before the fix, these testcases would have failed an error resembling
the following:
; p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
99: (7b) *(u64 *)(r10 -16) = r7 ; frame1: ...
100: (bf) r1 = r10 ; frame1: ...
101: (07) r1 += -16 ; frame1: ...
; p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
102: (85) call bpf_kfunc_call_test_acquire#13908
; frame1: R0_w=ptr_or_null_prog_test_ref_kfunc...
; p = bpf_kptr_xchg(&v->ref_ptr, p);
103: (bf) r1 = r6 ; frame1: ...
104: (bf) r2 = r0
; frame1: R0_w=ptr_or_null_prog_test_ref_kfunc...
105: (85) call bpf_kptr_xchg#194
verifier internal error: invalid PTR_TO_BTF_ID register for type match
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230330145203.80506-2-void@manifault.com
Current release - regressions:
- phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
Current release - new code bugs:
- eth: wangxun: fix vector length of interrupt cause
- vsock/loopback: consistently protect the packet queue with
sk_buff_head.lock
- virtio/vsock: fix header length on skb merging
- wpan: ca8210: fix unsigned mac_len comparison with zero
Previous releases - regressions:
- eth: stmmac: don't reject VLANs when IFF_PROMISC is set
- eth: smsc911x: avoid PHY being resumed when interface is not up
- eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
- eth: bnx2x: use the right build_skb() helper after core rework
- wwan: iosm: fix 7560 modem crash on use on unsupported channel
Previous releases - always broken:
- eth: sfc: don't overwrite offload features at NIC reset
- eth: r8169: fix RTL8168H and RTL8107E rx crc error
- can: j1939: prevent deadlock by moving j1939_sk_errqueue()
- virt: vmxnet3: use GRO callback when UPT is enabled
- virt: xen: don't do grant copy across page boundary
- phy: dp83869: fix default value for tx-/rx-internal-delay
- dsa: ksz8: fix multiple issues with ksz8_fdb_dump
- eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
- eth: mtk_eth_soc: fix flow block refcounting logic
Misc:
- constify fwnode pointers in SFP handling
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQl6Z8ACgkQMUZtbf5S
Irsodw//SxeZ16kZHdSuLsUd1bWWPyANsNG4UzKsS912sD8yErzvy3OAiRvTWXxH
A7t6QCFZeCHOHLXVuaHXqdyrWcC1eJdCaSnJiDffCwS2oeP9d4IBs3pyLJaeQSXJ
0bep4EcFpxPwCpzFCYNbShvKBLi8vRCX2ZjNii76eMAc0bZ/HvY0rCvULsJ3cwOo
cDdsL+lbPSI6suMm5ekm8Hdt7NuSB5jxNFlj2ene4pV7DqHC/a8UErak4fOKbX4h
QguePqawC22ZurzXeWvstgcNlJPoPLpcYyXGxSU8qirhbkn1WmkXR11TNEbYE4UD
YvJlKIyYOxjN36/xBiJ8LpXV+BQBpfZEV/LSaWByhclVS4c2bY0KkSRfZKOBY82K
ejBbMJiGaAyA86QUFkkWxC+zfHiFIywy2nOYCKBsDhW17krBOu3vaFS/jNVC7Ef4
ilk4tUXTYs3ojBUeROnuI1NcR2o0wNeCob/0U/bMMdn51khee70G/muHYTGr/NWA
JAZIT/3bOWr5syarpnvtb/PtLlWcoClN022W4iExgFmzbMlQGyD46g+XhsvoXAYO
wpo/js0/J+kVfuVsTSQT5MmsVjnzs1r9Y+wNJUM6dE35Z4iIXc1NJhmAZ7nJz0+P
ryn8rdnIgCbqzX3DeJC0i9uv0alwLXA/xGCFRV3xflfV/nvxiR0=
=/1Gi
-----END PGP SIGNATURE-----
Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.
Still quite a few bugs from this release. This pull is a bit smaller
because major subtrees went into the previous one. Or maybe people
took spring break off?
Current release - regressions:
- phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
Current release - new code bugs:
- eth: wangxun: fix vector length of interrupt cause
- vsock/loopback: consistently protect the packet queue with
sk_buff_head.lock
- virtio/vsock: fix header length on skb merging
- wpan: ca8210: fix unsigned mac_len comparison with zero
Previous releases - regressions:
- eth: stmmac: don't reject VLANs when IFF_PROMISC is set
- eth: smsc911x: avoid PHY being resumed when interface is not up
- eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
- eth: bnx2x: use the right build_skb() helper after core rework
- wwan: iosm: fix 7560 modem crash on use on unsupported channel
Previous releases - always broken:
- eth: sfc: don't overwrite offload features at NIC reset
- eth: r8169: fix RTL8168H and RTL8107E rx crc error
- can: j1939: prevent deadlock by moving j1939_sk_errqueue()
- virt: vmxnet3: use GRO callback when UPT is enabled
- virt: xen: don't do grant copy across page boundary
- phy: dp83869: fix default value for tx-/rx-internal-delay
- dsa: ksz8: fix multiple issues with ksz8_fdb_dump
- eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
- eth: mtk_eth_soc: fix flow block refcounting logic
Misc:
- constify fwnode pointers in SFP handling"
* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
net: dsa: sync unicast and multicast addresses for VLAN filters too
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
xen/netback: use same error messages for same errors
test/vsock: new skbuff appending test
virtio/vsock: WARN_ONCE() for invalid state of socket
virtio/vsock: fix header length on skb merging
bnxt_en: Add missing 200G link speed reporting
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Fix reporting of test result in ethtool selftest
i40e: fix registers dump after run ethtool adapter self test
bnx2x: use the right build_skb() helper
net: ipa: compute DMA pool size properly
net: wwan: iosm: fixes 7560 modem crash
net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
...
SCHED_CLS seems to be a better option as a default guess for freplace
programs that have __sk_buff as a context type.
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230330190115.3942962-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
All otherwise unspecified aarch64 ID registers should be read as zero so
we cover the whole ID register space in the get-reg-list test but we've
added comments for those that have been named. Add comments for
ID_AA64PFR2_EL1, ID_AA64SMFR0_EL1, ID_AA64ISAR2_EL1, ID_AA64MMFR3_EL1
and ID_AA64MMFR4_EL1 which have been defined since the comments were
added so someone looking for them will see that they are covered.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230210-kvm-arm64-getreg-comments-v1-1-a16c73be5ab4@kernel.org
Bits [51:48] of the pgd address are stored at bits [5:2] of ttbr0_el1.
page_table_test stores its page tables at the far end of IPA space so
was tripping over this when run on a system that supports FEAT_LPA (or
FEAT_LPA2).
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230308110948.1820163-4-ryan.roberts@arm.com
The high bits [51:48] of a physical address should appear at [15:12] in
a 64K pte, not at [51:48] as was previously being programmed. Fix this
with new helper functions that do the conversion correctly. This also
sets us up nicely for adding LPA2 encodings in future.
Fixes: 7a6629ef74 ("kvm: selftests: add virt mem support for aarch64")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230308110948.1820163-3-ryan.roberts@arm.com
access_tracking_perf_test requires CONFIG_IDLE_PAGE_TRACKING. However
this is missing from the config fragment, so add it in so that this test
is no longer skipped.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230308110948.1820163-2-ryan.roberts@arm.com
Make sure the timer test can properly handle a spurious timer
interrupt, something that is far from being unlikely.
This involves checking for the GIC IAR return value (don't bother
handling the interrupt if it was spurious) as well as the timer
control register (don't do anything if the interrupt is masked
or the timer disabled). Take this opportunity to rewrite the
timer handler in a more readable way.
This solves a bunch of failures that creep up on systems that
are slow to retire the interrupt, something that the GIC architecture
makes no guarantee about.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-20-maz@kernel.org
Now that KVM exposes CNTPCT_EL0, CNTP_CTL_EL0 and CNT_CVAL_EL0 to
userspace, add them to the get-reg-list selftest.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-19-maz@kernel.org
dd logs info to stderr by default. This info is pointless in the
selftests and makes legitimate issues harder to spot.
Pass the option to silence the info logs. Actual errors would still be
printed.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230228000709.124727-4-bgray@linux.ibm.com
Make supports passing the 'jobserver' (parallel make support) to child
invocations of make when either
1. The target command uses $(MAKE) directly
2. The command starts with '+'
This context is not passed through expansions that result in $(MAKE), so
the macros used in several places fail to pass on the jobserver context.
Warnings are also raised by the child mentioning this.
Prepend macros lines that invoke $(MAKE) with '+' to allow passing the
jobserver context to these children.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230228000709.124727-3-bgray@linux.ibm.com
The CLEAN macro was added in 337f1e36 to prevent the
Makefile:50: warning: overriding recipe for target 'clean'
../../lib.mk:124: warning: ignoring old recipe for target 'clean'
style warnings. Expand it's use to fix another case of redefining a
target directly.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230228000709.124727-2-bgray@linux.ibm.com
This adds test which checks case when data of newly received skbuff is
appended to the last skbuff in the socket's queue. It looks like simple
test with 'send()' and 'recv()', but internally it triggers logic which
appends one received skbuff to another. Test checks that this feature
works correctly.
This test is actual only for virtio transport.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The two infinite loops in bound check cases added by commit
1a3148fc17 ("selftests/bpf: Check when bounds are not in the 32-bit range")
increased the execution time of test_verifier from about 6 seconds to
about 9 seconds. Rewrite these two infinite loops to finite loops to get
rid of this extra time cost.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20230329011048.1721937-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
SEC("freplace") (i.e., BPF_PROG_TYPE_EXT) programs are not loadable as
is through veristat, as kernel expects actual program's FD during
BPF_PROG_LOAD time, which veristat has no way of knowing.
Unfortunately, freplace programs are a pretty important class of
programs, especially when dealing with XDP chaining solutions, which
rely on EXT programs.
So let's do our best and teach veristat to try to guess the original
program type, based on program's context argument type. And if guessing
process succeeds, we manually override freplace/EXT with guessed program
type using bpf_program__set_type() setter to increase chances of proper
BPF verification.
We rely on BTF and maintain a simple lookup table. This process is
obviously not 100% bulletproof, as valid program might not use context
and thus wouldn't have to specify correct type. Also, __sk_buff is very
ambiguous and is the context type across many different program types.
We pick BPF_PROG_TYPE_CGROUP_SKB for now, which seems to work fine in
practice so far. Similarly, some program types require specifying attach
type, and so we pick one out of possible few variants.
Best effort at its best. But this makes veristat even more widely
applicable.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230327185202.1929145-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If user explicitly overrides programs's type with
bpf_program__set_type() API call, we need to disassociate whatever
SEC_DEF handler libbpf determined initially based on program's SEC()
definition, as it's not goind to be valid anymore and could lead to
crashes and/or confusing failures.
Also, fix up bpf_prog_test_load() helper in selftests/bpf, which is
force-setting program type (even if that's completely unnecessary; this
is quite a legacy piece of code), and thus should expect auto-attach to
not work, yet one of the tests explicitly relies on auto-attach for
testing.
Instead, force-set program type only if it differs from the desired one.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230327185202.1929145-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test whether a TCP CC implemented in BPF is allowed to write
app_limited in struct tcp_sock. This is already allowed for
the built-in TCP CC.
Signed-off-by: Yixin Shen <bobankhshen@gmail.com>
Link: https://lore.kernel.org/r/20230329073558.8136-3-bobankhshen@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch makes the following minor updates to the cpuset partition
testing script test_cpuset_prs.sh.
- Remove online_cpus function call as it will be called anyway on exit
in cleanup.
- Make the enabling of sched/verbose debugfs flag conditional on the
"-v" verbose option and set DELAY_FACTOR to 2 in this case as cpuset
partition operations are likely to be slowed down by enabling that.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add ABI specific self-test to ensure enablements work in various
scenarios such as fork, VM_CLONE, and basic event enable/disable.
Ensure ABI contracts/limits are also being upheld, such as bit limits
and data size limits.
Link: https://lkml.kernel.org/r/20230328235219.203-8-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
ABI has been changed to remote writes, update existing test cases to use
this new ABI to ensure existing functionality continues to work.
Link: https://lkml.kernel.org/r/20230328235219.203-7-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>