Replace is_intel/amd_cpu helpers with this_cpu_* helpers to better
convey the intent of querying vendor of the current cpu.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20230111004445.416840-2-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The assert incorrectly identifies the ioctl being called. Switch it
from KVM_GET_MSRS to KVM_SET_MSRS.
Fixes: 6ebfef83f0 ("KVM: selftest: Add proper helpers for x86-specific save/restore ioctls")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221209201326.2781950-1-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
kvm_vm_elf_load() and elfhdr_get() open one file each, but they
never close the opened file descriptor. If a test repeatedly
creates and destroys a VM with __vm_create(), which
(directly or indirectly) calls those two functions, the test
might end up getting a open failure with EMFILE.
Fix those two functions to close the file descriptor.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221220170921.2499209-2-reijiw@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add testing to show that a pmu event can be filtered with a generalized
match on it's unit mask.
These tests set up test cases to demonstrate various ways of filtering
a pmu event that has multiple unit mask values. It does this by
setting up the filter in KVM with the masked events provided, then
enabling three pmu counters in the guest. The test then verifies that
the pmu counters agree with which counters should be counting and which
counters should be filtered for both a sparse filter list and a dense
filter list.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-8-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Test that masked events are not using invalid bits, and if they are,
ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER.
The only valid bits that can be used for masked events are set when
using KVM_PMU_ENCODE_MASKED_ENTRY() with one exception: If any of the
high bits (35:32) of the event select are set when using Intel, the pmu
event filter will fail.
Also, because validation was not being done prior to the introduction
of masked events, only expect validation to fail when masked events
are used. E.g. in the first test a filter event with all its bits set
is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-7-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Now that the flags field can be non-zero, pass it in when creating a
pmu event filter.
This is needed in preparation for testing masked events.
No functional change intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-6-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
ARM:
* Fix the PMCR_EL0 reset value after the PMU rework
* Correctly handle S2 fault triggered by a S1 page table walk
by not always classifying it as a write, as this breaks on
R/O memslots
* Document why we cannot exit with KVM_EXIT_MMIO when taking
a write fault from a S1 PTW on a R/O memslot
* Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1
before it
* Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
* Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
* Documentation improvements
* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
William reports kernel soft-lockups on some OVS topologies when TC mirred
egress->ingress action is hit by local TCP traffic [1].
The same can also be reproduced with SCTP (thanks Xin for verifying), when
client and server reach themselves through mirred egress to ingress, and
one of the two peers sends a "heartbeat" packet (from within a timer).
Enqueueing to backlog proved to fix this soft lockup; however, as Cong
noticed [2], we should preserve - when possible - the current mirred
behavior that counts as "overlimits" any eventual packet drop subsequent to
the mirred forwarding action [3]. A compromise solution might use the
backlog only when tcf_mirred_act() has a nest level greater than one:
change tcf_mirred_forward() accordingly.
Also, add a kselftest that can reproduce the lockup and verifies TC mirred
ability to account for further packet drops after TC mirred egress->ingress
(when the nest level is 1).
[1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
[3] such behavior is not guaranteed: for example, if RPS or skb RX
timestamping is enabled on the mirred target device, the kernel
can defer receiving the skb and return NET_RX_SUCCESS inside
tcf_mirred_forward().
Reported-by: William Zhao <wizhao@redhat.com>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 72653ae530 ("selftests: net: tcp_mmap:
Use huge pages in send path") I made a change to use hugepages
for the buffer used by the client (tx path)
Today, I understood that the cause for poor zerocopy
performance was that after a mmap() for a 512KB memory
zone, kernel uses a single zeropage, mapped 128 times.
This was really the reason for poor tx path performance
in zero copy mode, because this zero page refcount is
under high pressure, especially when TCP ACK packets
are processed on another cpu.
We need either to force a COW on all the memory range,
or use MAP_POPULATE so that a zero page is not abused.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120181136.3764521-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Turns out splice() is one of the syscalls that's using current maximum
number of arguments (six). This is perfect for testing, so extend
bpf_syscall_macro selftest to also trace splice() syscall, using
BPF_KSYSCALL() macro. This makes sure all the syscall argument register
definitions are correct.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-25-andrii@kernel.org
Update uprobe_autoattach selftest to validate architecture-specific
argument passing through registers. Use new BPF_UPROBE and
BPF_URETPROBE, and construct both BPF-side and user-space side in such
a way that for different architectures we are fetching and checking
different number of arguments, matching architecture-specific limit of
how many registers are available for argument passing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-12-andrii@kernel.org
In commit 537c3f66ea ("selftests/bpf: add generic BPF program tester-loader"),
a new mechanism was added to the BPF selftest framework to allow testsuites to
use macros to define expected failing testcases.
This allows any testsuite which tests verification failure to remove a good
amount of boilerplate code. This patch updates the task_kfunc selftest suite
to use these new macros.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230120021844.3048244-1-void@manifault.com
The test_cmd_destroy_access() should end with a semicolon, so add one.
There is a test_ioctl_destroy(ioas_id) following already, so drop one.
Fixes: 57f0988706 ("iommufd: Add a selftest")
Link: https://lore.kernel.org/r/20230120074204.1368-1-nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To be used for verification of driver implementations. Note that
the skb path is gone from the series, but I'm still keeping the
implementation for any possible future work.
$ xdp_hw_metadata <ifname>
On the other machine:
$ echo -n xdp | nc -u -q1 <target> 9091 # for AF_XDP
$ echo -n skb | nc -u -q1 <target> 9092 # for skb
Sample output:
# xdp
xsk_ring_cons__peek: 1
0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
rx_timestamp_supported: 1
rx_timestamp: 1667850075063948829
0x19f9090: complete idx=8 addr=8000
# skb
found skb hwtstamp = 1668314052.854274681
Decoding:
# xdp
rx_timestamp=1667850075.063948829
$ date -d @1667850075
Mon Nov 7 11:41:15 AM PST 2022
$ date
Mon Nov 7 11:42:05 AM PST 2022
# skb
$ date -d @1668314052
Sat Nov 12 08:34:12 PM PST 2022
$ date
Sat Nov 12 08:37:06 PM PST 2022
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-18-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
- create new netns
- create veth pair (veTX+veRX)
- setup AF_XDP socket for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-12-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Generic check has a different error message, update the selftest.
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-7-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
reclaim_period_ms used to be positive only but the commit 0001725d0f
("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input
validation") incorrectly changed it to non-negative validation.
Change validation to allow only positive input.
Fixes: 0001725d0f ("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation")
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reported-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230111183408.104491-1-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Placing a declaration of evt_reset is pedantically invalid
according to the C standard. While GCC does not really care
and only warns with -Wpedantic, clang ignores the declaration
altogether with an error:
x86_64/xen_shinfo_test.c:965:2: error: expected expression
struct kvm_xen_hvm_attr evt_reset = {
^
x86_64/xen_shinfo_test.c:969:38: error: use of undeclared identifier evt_reset
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
^
Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: a79b53aaaa ("KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET", 2022-12-28)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
First test that we allow overwriting dynptr slots and reinitializing
them in unreferenced case, and disallow overwriting for referenced case.
Include tests to ensure slices obtained from destroyed dynptrs are being
invalidated on their destruction. The destruction needs to be scoped, as
in slices of dynptr A should not be invalidated when dynptr B is
destroyed. Next, test that MEM_UNINIT doesn't allow writing dynptr stack
slots.
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Try creating a dynptr, then overwriting second slot with first slot of
another dynptr. Then, the first slot of first dynptr should also be
invalidated, but without our fix that does not happen. As a consequence,
the unfixed case allows passing first dynptr (as the kernel check only
checks for slot_type and then first_slot == true).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ensure that variable offset is handled correctly, and verifier takes
both fixed and variable part into account. Also ensures that only
constant var_off is allowed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add verifier tests that verify the new pruning behavior for STACK_DYNPTR
slots, and ensure that state equivalence takes into account changes to
the old and current verifier state correctly. Also ensure that the
stacksafe changes are actually enabling pruning in case states are
equivalent from pruning PoV.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The previous commit implemented destroy_if_dynptr_stack_slot. It
destroys the dynptr which given spi belongs to, but still doesn't
invalidate the slices that belong to such a dynptr. While for the case
of referenced dynptr, we don't allow their overwrite and return an error
early, we still allow it and destroy the dynptr for unreferenced dynptr.
To be able to enable precise and scoped invalidation of dynptr slices in
this case, we must be able to associate the source dynptr of slices that
have been obtained using bpf_dynptr_data. When doing destruction, only
slices belonging to the dynptr being destructed should be invalidated,
and nothing else. Currently, dynptr slices belonging to different
dynptrs are indistinguishible.
Hence, allocate a unique id to each dynptr (CONST_PTR_TO_DYNPTR and
those on stack). This will be stored as part of reg->id. Whenever using
bpf_dynptr_data, transfer this unique dynptr id to the returned
PTR_TO_MEM_OR_NULL slice pointer, and store it in a new per-PTR_TO_MEM
dynptr_id register state member.
Finally, after establishing such a relationship between dynptrs and
their slices, implement precise invalidation logic that only invalidates
slices belong to the destroyed dynptr in destroy_if_dynptr_stack_slot.
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, while reads are disallowed for dynptr stack slots, writes are
not. Reads don't work from both direct access and helpers, while writes
do work in both cases, but have the effect of overwriting the slot_type.
While this is fine, handling for a few edge cases is missing. Firstly,
a user can overwrite the stack slots of dynptr partially.
Consider the following layout:
spi: [d][d][?]
2 1 0
First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.
This will essentially either write STACK_MISC for all slot_types or
STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
of zeroes). The end result is that slot is scrubbed.
Now, the layout is:
spi: [d][m][?]
2 1 0
Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
2 1 0
But this time, both spi 2 and spi 1 have first_slot = true.
Now, when passing spi 2 to dynptr helper, it will consider it as
initialized as it does not check whether second slot has first_slot ==
false. And spi 1 should already work as normal.
This effectively replaced size + offset of first dynptr, hence allowing
invalid OOB reads and writes.
Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
STACK_DYNPTR type, mark both first and second slot (regardless of which
slot we touch) as STACK_INVALID. Reads are already prevented.
Second, prevent writing to stack memory from helpers if the range may
contain any STACK_DYNPTR slots. Reads are already prevented.
For helpers, we cannot allow it to destroy dynptrs from the writes as
depending on arguments, helper may take uninit_mem and dynptr both at
the same time. This would mean that helper may write to uninit_mem
before it reads the dynptr, which would be bad.
PTR_TO_MEM: [?????dd]
Depending on the code inside the helper, it may end up overwriting the
dynptr contents first and then read those as the dynptr argument.
Verifier would only simulate destruction when it does byte by byte
access simulation in check_helper_call for meta.access_size, and
fail to catch this case, as it happens after argument checks.
The same would need to be done for any other non-trivial objects created
on the stack in the future, such as bpf_list_head on stack, or
bpf_rb_root on stack.
A common misunderstanding in the current code is that MEM_UNINIT means
writes, but note that writes may also be performed even without
MEM_UNINIT in case of helpers, in that case the code after handling meta
&& meta->raw_mode will complain when it sees STACK_DYNPTR. So that
invalid read case also covers writes to potential STACK_DYNPTR slots.
The only loophole was in case of meta->raw_mode which simulated writes
through instructions which could overwrite them.
A future series sequenced after this will focus on the clean up of
helper access checks and bugs around that.
Fixes: 97e03f5210 ("bpf: Add verifier support for dynptrs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the dynptr function is not checking the variable offset part
of PTR_TO_STACK that it needs to check. The fixed offset is considered
when computing the stack pointer index, but if the variable offset was
not a constant (such that it could not be accumulated in reg->off), we
will end up a discrepency where runtime pointer does not point to the
actual stack slot we mark as STACK_DYNPTR.
It is impossible to precisely track dynptr state when variable offset is
not constant, hence, just like bpf_timer, kptr, bpf_spin_lock, etc.
simply reject the case where reg->var_off is not constant. Then,
consider both reg->off and reg->var_off.value when computing the stack
pointer index.
A new helper dynptr_get_spi is introduced to hide over these details
since the dynptr needs to be located in multiple places outside the
process_dynptr_func checks, hence once we know it's a PTR_TO_STACK, we
need to enforce these checks in all places.
Note that it is disallowed for unprivileged users to have a non-constant
var_off, so this problem should only be possible to trigger from
programs having CAP_PERFMON. However, its effects can vary.
Without the fix, it is possible to replace the contents of the dynptr
arbitrarily by making verifier mark different stack slots than actual
location and then doing writes to the actual stack address of dynptr at
runtime.
Fixes: 97e03f5210 ("bpf: Add verifier support for dynptrs")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This Kselftest fixes update for Linux 6.2-rc5 consists of a single
fix address error seen during unconfigured LLVM builds.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmPKv0MACgkQCwJExA0N
QxyVzg//SDB26SRVRIClssnq66PUlT6/MDBSRIO1fyzsurgvpZuB0As2sMBN8JB9
lfmMcG/bR61FYpF9aUAnObz9HhV9Kil/oEik63sTZkE+3LjWbOyePQqykjtqk6/z
Rk1Y4ZDrwVztR8XiwzlOEuIyeR/tdnkwJlx4WE0VI+NcsAKzI5012guYbr59tnN0
6eKm1ZjCy5wLSyiEVEYgCMgX7mJEvWYiqSQl/kGdffuIu+4QSJYnEXY5HMMnYIiX
HasoIBTnAC1yNdeLEc9rQF5WETzOO5PZdHypkgKwkwAzxPcM2VsCLYwbItgJqTj6
qhoJSrXlwQ8lVrnS+GXR5rAInObhWmN4WSTq4sX0WJWE8gSbYH03ieMAkJRDPmWE
jb/WolrPuSIUSkuGRN6cTBM0I08BeNKKPMh26F1RXf0/eDa51FjaOUqFOJohSi9k
JUprlni/tgLygj+A6GUVP4gjd1K6jzLEFXrbBPFztn1lxiuRkdW+XqpoExDB9xtd
D88K+wvthc5sYYfOxwkDLftDVgjjRSpQ4XiZ2DHxyAQWy3dmRDA1xVjhDW+Qj2aA
8lWRxDAk5Jaqz8E1DFh/pcwHKnsY12Lt0s0iJ4Vq1H7FFcoXiPa4TfzTMOF4xuXa
OZRPzfrbrURd/Lvwrr4uCGbCI+J3Ky/Exz27VDocX+OTLJDMtTY=
=O4Lm
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"Fix an error seen during unconfigured LLVM builds"
* tag 'linux-kselftest-fixes-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Fix error message for unconfigured LLVM builds
Current release - regressions:
- Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
addrconf", fix nsna_ping mode of team
- wifi: mt76: fix bugs in Rx queue handling and DMA mapping
- eth: mlx5:
- add missing mutex_unlock in error reporter
- protect global IPsec ASO with a lock
Current release - new code bugs:
- rxrpc: fix wrong error return in rxrpc_connect_call()
Previous releases - regressions:
- bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
- wifi:
- mac80211: fix crashes on Rx due to incorrect initialization of
rx->link and rx->link_sta
- mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
aggregation handling, crashes
- brcmfmac: fix regression for Broadcom PCIe wifi devices
- rndis_wlan: prevent buffer overflow in rndis_query_oid
- netfilter: conntrack: handle tcp challenge acks during connection
reuse
- sched: avoid grafting on htb_destroy_class_offload when destroying
- virtio-net: correctly enable callback during start_xmit, fix stalls
- tcp: avoid the lookup process failing to get sk in ehash table
- ipa: disable ipa interrupt during suspend
- eth: stmmac: enable all safety features by default
Previous releases - always broken:
- bpf:
- fix pointer-leak due to insufficient speculative store bypass
mitigation (Spectre v4)
- skip task with pid=1 in send_signal_common() to avoid a splat
- fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
PERF_BPF_EVENT_PROG_UNLOAD events
- fix potential deadlock in htab_lock_bucket from same bucket index
but different map_locked index
- bluetooth:
- fix a buffer overflow in mgmt_mesh_add()
- hci_qca: fix driver shutdown on closed serdev
- ISO: fix possible circular locking dependency
- CIS: hci_event: fix invalid wait context
- wifi: brcmfmac: fixes for survey dump handling
- mptcp: explicitly specify sock family at subflow creation time
- netfilter: nft_payload: incorrect arithmetics when fetching VLAN
header bits
- tcp: fix rate_app_limited to default to 1
- l2tp: close all race conditions in l2tp_tunnel_register()
- eth: mlx5: fixes for QoS config and eswitch configuration
- eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
- eth: stmmac: fix invalid call to mdiobus_get_phy()
Misc:
- ethtool: add netlink attr in rss get reply only if the value is
not empty
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmPKxFAACgkQMUZtbf5S
Irtdlg/+OUggv3sKhcpUv39SnmMyiIhnNj9KhG+25Iiy7MJxaoNntCGsW3KXXTGo
JylOGMesociz+hCv4xHl9J61uwmz+qrUPKqqi7hSbEoHlAa3OrubQb+LW4/x0Jp0
bNJqqAYt04C+txhvsuF9odfZbKtvQ7RIU0XEzqEoES4UXTYQoCHEAwfn5twKDNNS
/x9OLSZnctAv1pinKZ8QTLjz0IZwHaBAbWNXkLe/HEu9nGrUndFtA5rJjyzrjw10
ZltTDfV2lr3SWVHsJShnTJ64u+aPBGmJmVzeNw64qRrmnYdFMCpUVoH222IurexO
aVPY9WUOwgUovetB8fmhPF0+n5Aa6pbTb4toQB1oVZ8X0h7WNrdfXZug1QDQOMbC
eGzsNdk6hvOeqBhbIKPLQXzaIxbPyXM+KUUbOxi+V4dahG79vG2BaQsrpFymueVs
cna7pL8dE1S9dR3SEB0KW4nyoWIugukZrzuX0efv1hxovuWn4yNJBt2lp8gQwY6v
yTk93Ou2LYDrm4yXLrHHWYNXU1u68Pq0o14xbx7tOYGan/evqfaaa1lmAvj7b1bq
g19FB4IrwA/1ZBoaOIMV8Ml7u5ww9LAFzJRAClEptOopADN4Gro2jgUYWjmxm+uV
RdlpQ2mI8iEeEH0FOITmdlFy7cbh7TWIkoiXHcCWifgfUE7sxnY=
=F3be
-----END PGP SIGNATURE-----
Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bluetooth, bpf and netfilter.
Current release - regressions:
- Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
addrconf", fix nsna_ping mode of team
- wifi: mt76: fix bugs in Rx queue handling and DMA mapping
- eth: mlx5:
- add missing mutex_unlock in error reporter
- protect global IPsec ASO with a lock
Current release - new code bugs:
- rxrpc: fix wrong error return in rxrpc_connect_call()
Previous releases - regressions:
- bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
- wifi:
- mac80211: fix crashes on Rx due to incorrect initialization of
rx->link and rx->link_sta
- mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
aggregation handling, crashes
- brcmfmac: fix regression for Broadcom PCIe wifi devices
- rndis_wlan: prevent buffer overflow in rndis_query_oid
- netfilter: conntrack: handle tcp challenge acks during connection
reuse
- sched: avoid grafting on htb_destroy_class_offload when destroying
- virtio-net: correctly enable callback during start_xmit, fix stalls
- tcp: avoid the lookup process failing to get sk in ehash table
- ipa: disable ipa interrupt during suspend
- eth: stmmac: enable all safety features by default
Previous releases - always broken:
- bpf:
- fix pointer-leak due to insufficient speculative store bypass
mitigation (Spectre v4)
- skip task with pid=1 in send_signal_common() to avoid a splat
- fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
PERF_BPF_EVENT_PROG_UNLOAD events
- fix potential deadlock in htab_lock_bucket from same bucket
index but different map_locked index
- bluetooth:
- fix a buffer overflow in mgmt_mesh_add()
- hci_qca: fix driver shutdown on closed serdev
- ISO: fix possible circular locking dependency
- CIS: hci_event: fix invalid wait context
- wifi: brcmfmac: fixes for survey dump handling
- mptcp: explicitly specify sock family at subflow creation time
- netfilter: nft_payload: incorrect arithmetics when fetching VLAN
header bits
- tcp: fix rate_app_limited to default to 1
- l2tp: close all race conditions in l2tp_tunnel_register()
- eth: mlx5: fixes for QoS config and eswitch configuration
- eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
- eth: stmmac: fix invalid call to mdiobus_get_phy()
Misc:
- ethtool: add netlink attr in rss get reply only if the value is not
empty"
* tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
Revert "Merge branch 'octeontx2-af-CPT'"
tcp: fix rate_app_limited to default to 1
bnxt: Do not read past the end of test names
net: stmmac: enable all safety features by default
octeontx2-af: add mbox to return CPT_AF_FLT_INT info
octeontx2-af: update cpt lf alloc mailbox
octeontx2-af: restore rxc conf after teardown sequence
octeontx2-af: optimize cpt pf identification
octeontx2-af: modify FLR sequence for CPT
octeontx2-af: add mbox for CPT LF reset
octeontx2-af: recover CPT engine when it gets fault
net: dsa: microchip: ksz9477: port map correction in ALU table entry register
selftests/net: toeplitz: fix race on tpacket_v3 block close
net/ulp: use consistent error code when blocking ULP
octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
tcp: avoid the lookup process failing to get sk in ehash table
Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
MAINTAINERS: add networking entries for Willem
net: sched: gred: prevent races when adding offloads to stats
l2tp: prevent lockdep issue in l2tp_tunnel_register()
...
It looks like a copy-paste error to describe the ZA buffer size using (the
number of P registers * the maximum size of a Z register). This doesn't
have practical impact though as we're always allocating enough space even
for the architectural maximum ZA storage, with SVL equals to 2048 bits.
Switch to use ZA_SIG_REGS_SIZE(SVE_VQ_MAX). setup_za() will need to
initialize two 64MB arraies with this change and can be optimized later (if
someone complain).
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221218092942.1940-2-yuzenghui@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It was introduced in commit b77e995e3b ("kselftest/arm64: Add a test
program to exercise the syscall ABI") but never actually used. Remove it.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221218092942.1940-1-yuzenghui@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a test that generates SSVE and ZA context in a single signal frame to
ensure that nothing is going wrong in that case for any reason.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230117-arm64-test-ssve-za-v1-2-203c00150154@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There's a stray comment in the MTE test Makefile which documents
something that's since been removed, delete it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-6-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The assembly portions of the MTE selftests need to be built with a
toolchain supporting MTE. Since we support GCC versions that lack MTE
support we have logic to suppress build of these tests when using such a
toolchain but that logic is broken for LLVM=1 builds, it uses CC but CC
is only set for LLVM builds in libs.mk which needs to be included after
we have selected which test programs to build.
Since all supported LLVM versions support MTE we can simply assume MTE
support when LLVM is set. This is not a thing of beauty but it does the
job.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-5-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When building with clang the toolchain refuses to link the signals
testcases since the assembly code has a reference to current which has
no initialiser so is placed in the BSS:
/tmp/signals-af2042.o: in function `fake_sigreturn':
<unknown>:51:(.text+0x40): relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against symbol `current' defined in .bss section in /tmp/test_signals-ec1160.o
Since the first statement in main() initialises current we may as well
fix this by moving the initialisation to build time so the variable
doesn't end up in the BSS.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-4-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The signal Makefile rules pass all the dependencies for each executable,
including headers, to the compiler which GCC is happy enough with but
clang rejects:
clang --target=aarch64-none-linux-gnu -fintegrated-as -Wall -O2 -g -I/home/broonie/git/linux/tools/testing/selftests/ -isystem /home/broonie/git/linux/usr/include -D_GNU_SOURCE -std=gnu99 -I. test_signals.c test_signals_utils.c testcases/testcases.c signals.S testcases/fake_sigreturn_bad_magic.c test_signals.h test_signals_utils.h testcases/testcases.h -o testcases/fake_sigreturn_bad_magic
clang: error: cannot specify -o when generating multiple output files
This happens because clang gets confused about what to do with the
header files, failing to identify them as source. This is not amazing
behaviour on clang's part and should ideally be fixed but even if that
happens we'd still need a new clang release so let's instead rework the
Makefile so we use variables for the lists of header and source files,
allowing us to only pass the source files to the compiler and keep clang
happy.
As a bonus the resulting Makefile is a bit easier to read.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-3-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are a number of freestanding static executables used in floating
point testing that have no runtime at all. These all define the main entry
point as:
.globl _start
function _start
_start:
but clang's integrated assembler complains that:
error: symbol '_start' is already defined
due to having both a label and function directive. Remove the label to
allow building with clang.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-2-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The .pushsection directive used to store the strings used with the .puts
macro in the floating point helpers does not provide a section type but
according to the gas documentation this should be mandatory and with the
clang built in as it actually is. Provide one so that we can build these
tests with LLVM=1.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-1-89c69d377727@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ensure that we get signal context for TPIDR2 if and only if SME is present
on the system. Since TPIDR2 is owned by libc we merely validate that the
value is whatever it was set to, this isn't ideal since it's likely to
just be the default of 0 with current systems but it avoids future false
positives.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-tpidr2-sig-v3-4-c77c6c8775f4@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When validating the set of signal context records check that any TPIDR2
record has the correct size, also suppressing warnings due to seeing an
unknown record type.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-tpidr2-sig-v3-3-c77c6c8775f4@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We should have a ZT register frame with an expected size when ZA is enabled
and have no ZT frame when ZA is disabled. Since we don't load any data into
ZT we expect the data to all be zeros since the architecture guarantees it
will be set to 0 as ZA is enabled.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-18-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Following the pattern for the other register sets add a stress test program
for ZT0 which continually loads and verifies patterns in the register in
an effort to discover context switching problems.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-14-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When use tools/testing/selftests/kselftest_install.sh to make the
kselftest-list.txt under tools/testing/selftests/kselftest_install.
Then use tools/testing/selftests/kselftest_install/run_kselftest.sh to run
all the kselftests in kselftest-list.txt, it will be blocked by case
"filesystems/fat: run_fat_tests.sh" with "Warning: file run_fat_tests.sh
is not executable", so grant executable permission to run_fat_tests.sh to
fix this issue.
Link: https://lkml.kernel.org/r/dfdbba6df8a1ab34bb1e81cd8bd7ca3f9ed5c369.1673424747.git.pengfei.xu@intel.com
Fixes: dd7c9be330 ("selftests/filesystems: add a vfat RENAME_EXCHANGE test")
Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A testcase to check that verifier.c:copy_register_state() preserves
register parentage chain and livness information.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230106142214.1040390-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Avoid race between process wakeup and tpacket_v3 block timeout.
The test waits for cfg_timeout_msec for packets to arrive. Packets
arrive in tpacket_v3 rings, which pass packets ("frames") to the
process in batches ("blocks"). The sk waits for req3.tp_retire_blk_tov
msec to release a block.
Set the block timeout lower than the process waiting time, else
the process may find that no block has been released by the time it
scans the socket list. Convert to a ring of more than one, smaller,
blocks with shorter timeouts. Blocks must be page aligned, so >= 64KB.
Fixes: 5ebfb4cc30 ("selftests/net: toeplitz test")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230118151847.4124260-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are some issues with the bpf/nat6to4.c building.
1. It use TEST_CUSTOM_PROGS, which will add the nat6to4.o to
kselftest-list file and run by common run_tests.
2. When building the test via `make -C tools/testing/selftests/
TARGETS="net"`, the nat6to4.o will be build in selftests/net/bpf/
folder. But in test udpgro_frglist.sh it refers to ../bpf/nat6to4.o.
The correct path should be ./bpf/nat6to4.o.
3. If building the test via `make -C tools/testing/selftests/ TARGETS="net"
install`. The nat6to4.o will be installed to kselftest_install/net/
folder. Then the udpgro_frglist.sh should refer to ./nat6to4.o.
To fix the confusing test path, let's just move the nat6to4.c to net folder
and build it as TEST_GEN_FILES.
Fixes: edae34a3ed ("selftests net: add UDP GRO fraglist + bpf self-tests")
Tested-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230118020927.3971864-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
__USE_GNU should be an internal macro only used inside glibc. Either
memfd_create() or fallocate() requires _GNU_SOURCE per man page, where
__USE_GNU will further be defined by glibc headers include/features.h:
#ifdef _GNU_SOURCE
# define __USE_GNU 1
#endif
This fixes:
>> hugetlb-madvise.c:20: warning: "__USE_GNU" redefined
20 | #define __USE_GNU
|
In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-start.h:33,
from /usr/include/stdlib.h:26,
from hugetlb-madvise.c:16:
/usr/include/features.h:407: note: this is the location of the previous definition
407 | # define __USE_GNU 1
|
Link: https://lkml.kernel.org/r/Y8V9z+z6Tk7NetI3@x1n
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Selftests vm builds break when doing cross-compilation. The Makefile
MACHINE variable incorrectly picks upp the host machine architecture.
If the CROSS_COMPILE variable is set, dig out the target host architecture
from CROSS_COMPILE, instead of calling uname.
Link: https://lkml.kernel.org/r/20230109114251.3349638-1-bjorn@kernel.org
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If MADV_PAGEOUT is not defined (e.g., on AlmaLinux 8), compilation will
fail. Let's fix that like khugepaged.c does by conditionally defining
MADV_PAGEOUT.
Link: https://lkml.kernel.org/r/20230109171255.488749-1-david@redhat.com
Fixes: 69c66add56 ("selftests/vm: anon_cow: test COW handling of anonymous memory")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test to assert that we can mremap() and expand a mapping starting
from an offset within an existing mapping. We unmap the last page in a 3
page mapping to ensure that the remap should always succeed, before
remapping from the 2nd page.
This is additionally a regression test for the issue solved in "mm,
mremap: fix mremap() expanding vma with addr inside vma" and confirmed to
fail prior to the change and pass after it.
Finally, this patch updates the existing mremap expand merge test to check
error conditions and reduce code duplication between the two tests.
[lstoakes@gmail.com: increment num_expand_tests so test doesn't complain about unexpected tests being run]
Link: https://lkml.kernel.org/r/8ff3ba3cadc0b6c1b2688ae5c851bf73aa062d57.1673701836.git.lstoakes@gmail.com
Link: https://lkml.kernel.org/r/02b117a8ffd52acc01dc66c2fb39754f08d92c0e.1672675824.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jakub Matěna <matenajakub@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rename selftets/vm to selftests/mm for being more consistent with the
code, documentation, and tools directories, and won't be confused with
virtual machines.
[sj@kernel.org: convert missing vm->mm changes]
Link: https://lkml.kernel.org/r/20230107230643.252273-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20230103180754.129637-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, anonymous PTE-mapped THPs cannot be collapsed in-place:
collapsing (e.g., via MADV_COLLAPSE) implies allocating a fresh THP and
mapping that new THP via a PMD: as it's a fresh anon THP, it will get the
exclusive flag set on the head page and everybody is happy.
However, if the kernel would ever support in-place collapse of anonymous
THPs (replacing a page table mapping each sub-page of a THP via PTEs with
a single PMD mapping the complete THP), exclusivity information stored for
each sub-page would have to be collapsed accordingly:
(1) All PTEs map !exclusive anon sub-pages: the in-place collapsed THP
must not not have the exclusive flag set on the head page mapped by
the PMD. This is the easiest case to handle ("simply don't set any
exclusive flags").
(2) All PTEs map exclusive anon sub-pages: when collapsing, we have to
clear the exclusive flag from all tail pages and only leave the
exclusive flag set for the head page. Otherwise, fork() after
collapse would not clear the exclusive flags from the tail pages
and we'd be in trouble once PTE-mapping the shared THP when writing
to shared tail pages that still have the exclusive flag set. This
would effectively revert what the PTE-mapping code does when
propagating the exclusive flag to all sub-pages.
(3) PTEs map a mixture of exclusive and !exclusive anon sub-pages (can
happen e.g., due to MADV_DONTFORK before fork()). We must not
collapse the THP in-place, otherwise bad things may happen:
the exclusive flags of sub-pages would get ignored and the
exclusive flag of the head page would get used instead.
Now that we have MADV_COLLAPSE in place to trigger collapsing a THP, let's
add some test cases that would bail out early, if we'd
voluntarily/accidantially unlock in-place collapse for anon THPs and
forget about taking proper care of exclusive flags.
Running the test on a kernel with MADV_COLLAPSE support:
# [INFO] Anonymous THP tests
# [RUN] Basic COW after fork() when collapsing before fork()
ok 169 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
ok 170 # SKIP MADV_COLLAPSE failed: Invalid argument
# [RUN] Basic COW after fork() when collapsing after fork() (lower shared)
ok 171 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (upper shared)
ok 172 No leak from parent into child
For now, MADV_COLLAPSE always seems to fail if all PTEs map shared
sub-pages.
Link: https://lkml.kernel.org/r/20230104144905.460075-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Preallocations are common in the VMA code to avoid allocating under
certain locking conditions. The preallocations must also cover the
worst-case scenario. Removing the GFP_ZERO flag from the
kmem_cache_alloc() (and bulk variant) calls will reduce the amount of time
spent zeroing memory that may not be used. Only zero out the necessary
area to keep track of the allocations in the maple state. Zero the entire
node prior to using it in the tree.
This required internal changes to node counting on allocation, so the test
code is also updated.
This restores some micro-benchmark performance: up to +9% in mmtests mmap1
by my testing +10% to +20% in mmap, mmapaddr, mmapmany tests reported by
Red Hat
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2149636
Link: https://lkml.kernel.org/r/20230105160427.2988454-1-Liam.Howlett@oracle.com
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a typo of "comaring" which should be "comparing".
Link: https://lkml.kernel.org/r/202212231050245952617@zte.com.cn
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add simple test cases for scheme filters of DAMON sysfs interface. The
test cases check if the files are populated as expected, receives some
valid inputs, and refuses some invalid inputs.
Link: https://lkml.kernel.org/r/20221205230830.144349-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If CONFIG_NF_CONNTRACK=m, there are no definitions of NF_NAT_MANIP_SRC
and NF_NAT_MANIP_DST in vmlinux.h, build test_bpf_nf.c failed.
$ make -C tools/testing/selftests/bpf/
CLNG-BPF [test_maps] test_bpf_nf.bpf.o
progs/test_bpf_nf.c:160:42: error: use of undeclared identifier 'NF_NAT_MANIP_SRC'
bpf_ct_set_nat_info(ct, &saddr, sport, NF_NAT_MANIP_SRC);
^
progs/test_bpf_nf.c:163:42: error: use of undeclared identifier 'NF_NAT_MANIP_DST'
bpf_ct_set_nat_info(ct, &daddr, dport, NF_NAT_MANIP_DST);
^
2 errors generated.
Copy the definitions in include/net/netfilter/nf_nat.h to test_bpf_nf.c,
in order to avoid redefinitions if CONFIG_NF_CONNTRACK=y, rename them with
___local suffix. This is similar with commit 1058b6a78d ("selftests/bpf:
Do not fail build if CONFIG_NF_CONNTRACK=m/n").
Fixes: b06b45e82b ("selftests/bpf: add tests for bpf_ct_set_nat_info kfunc")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1674028604-7113-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Now that the new API for hid_bpf_attach_prog() is in place, ensure we
get an fd when calling this function. And remove the fallback code.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We plan on changing the return value of hid_bpf_attach_prog().
Instead of returning an error code, it will return an fd to a bpf_link.
This bpf_link is responsible for the binding between the bpf program and
the hid device.
Add a fallback mechanism to not break bisections by pinning the program
when we run this test against the non changed kernel.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Turns out that if bpffs was not mounted, the test was silently passing.
So ensure it passes by checking the mount command result.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add a second BPF program to attach to the device, as the development of
this feature showed that we also need to ensure we can detach multiple
programs to a device (hid_bpf_link->hid_table_index was actually not set
initially, and this lead to any BPF program not being released except for
the first one).
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
clang doesn't like to compile a source to the final binary directly:
clang-14: error: cannot specify -o when generating multiple output files
So split the final rule in 2, and ensure we compile all dependencies
before.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Similar-ish in many points from the script in selftests/bpf, with a few
differences:
- relies on boot2container instead of a plain qemu image (meaning that
we can take any container in a registry as a base)
- runs in the hid selftest dir, and such uses the test program from there
- the working directory to store the config is in
tools/selftests/hid/results
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Adding verifier tests for loading all types od allowed
sleepable programs plus reject for tp_btf type.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230117223705.440975-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
the cc:stable tag.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY8XcmAAKCRDdBJ7gKXxA
jsSsAQC98lXwu4wz+3S7f2Y0u+rwttZ/PlGM3s+37XO50fDtqQEA1XVV3ABWr46M
XlwiwCtj7tFiM3zT1nLGS+SmOodvogA=
=WrCJ
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"21 hotfixes. Thirteen of these address pre-6.1 issues and hence have
the cc:stable tag"
* tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
init/Kconfig: fix typo (usafe -> unsafe)
nommu: fix split_vma() map_count error
nommu: fix do_munmap() error path
nommu: fix memory leak in do_mmap() error path
MAINTAINERS: update Robert Foss' email address
proc: fix PIE proc-empty-vm, proc-pid-vm tests
mm: update mmap_sem comments to refer to mmap_lock
include/linux/mm: fix release_pages_arg kernel doc comment
lib/win_minmax: use /* notation for regular comments
kasan: mark kasan_kunit_executing as static
nilfs2: fix general protection fault in nilfs_btree_insert()
Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
hugetlb: unshare some PMDs when splitting VMAs
mm: fix vma->anon_name memory leak for anonymous shmem VMAs
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
...
Pable Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Increase timeout to 120 seconds for netfilter selftests to fix
nftables transaction tests, from Florian Westphal.
2) Fix overflow in bitmap_ip_create() due to integer arithmetics
in a 64-bit bitmask, from Gavrilov Ilia.
3) Fix incorrect arithmetics in nft_payload with double-tagged
vlan matching.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPCrI8QHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kT1lB/wPbLpePLzZfDGyV/NR9gi4FuJiaRfhlklV
rbxnJce050GERbSQoF/r4zrxn2pzvIWGMh1xWZBGi/q8mT2rOIYtVqUahY9YuL/Z
7+xqdCOALIxEj+cXqYocqp8/NFgUWLGuMoomc9lWvEkUs+zOvkD8Z/bRecfPYvOa
BftPALmtXgx46Ecce0gZvvh4YULpVLNdDPPiwZTabV+47Cl8+cJ0Y+iEHsUfOesU
hQG0unWJH77O3IU4QxiirLekLP/6a5O5f0W7u3PZmNNv7N+UdwE+De+QF0aamfgA
LZDO1qOakflegFZvK0JchCzS4hc6dtRKqIvNM3cCBMXLvV4REHKP
=geNh
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"memblock: always release pages to the buddy allocator in
memblock_free_late()
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the
deferred init process.
memblock_free_pages() is called by memblock_free_late(), which is used
to free reserved ranges after memblock_free_all() has run. All pages
in reserved ranges have been initialized at that point, and
accordingly, those pages are not touched by the deferred init process.
This means that currently, if the pages that memblock_free_late()
intends to release are in the deferred range, they will never be
released to the buddy allocator. They will forever be reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call
__free_pages_core() directly instead.
One case where this issue can occur in the wild is EFI boot on x86_64.
The x86 EFI code reserves all EFI boot services memory ranges via
memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively).
If any of those ranges happens to fall within the deferred init range,
the pages will not be released and that memory will be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages"
* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm: Always release pages to the buddy allocator in memblock_free_late().
MPTCP protocol supports having subflows in both IPv4 and IPv6. In Linux,
it is possible to have that if the MPTCP socket has been created with
AF_INET6 family without the IPV6_V6ONLY option.
Here, a new IPv4 subflow is being added to the initial IPv6 connection,
then being removed using Netlink commands.
Cc: stable@vger.kernel.org # v5.19+
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the missing space after 'dest' variable assignment.
This change will resolve the following checkpatch.pl
script error:
ERROR: spaces required around that '+=' (ctx:VxW)
Signed-off-by: Roberto Valenzuela <valenzuelarober@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230113180257.39769-1-valenzuelarober@gmail.com
overlayfs may be disabled in the kernel configuration, causing related
tests to fail. Check that overlayfs is supported at runtime, so we can
skip layout2_overlay.* accordingly.
Signed-off-by: Jeff Xu <jeffxu@google.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230113053229.1281774-2-jeffxu@google.com
[mic: Reword comments and constify variables]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Current release - regressions:
- rxrpc:
- only disconnect calls in the I/O thread
- move client call connection to the I/O thread
- fix incoming call setup race
- eth: mlx5:
- restore pkt rate policing support
- fix memory leak on updating vport counters
Previous releases - regressions:
- gro: take care of DODGY packets
- ipv6: deduct extension header length in rawv6_push_pending_frames
- tipc: fix unexpected link reset due to discovery messages
Previous releases - always broken:
- sched: disallow noqueue for qdisc classes
- eth: ice: fix potential memory leak in ice_gnss_tty_write()
- eth: ixgbe: fix pci device refcount leak
- eth: mlx5:
- fix command stats access after free
- fix macsec possible null dereference when updating MAC security entity (SecY)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmPAGskSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk9aQQAInUtOfTi0EwR5oveTMWOcDc8P1rGFru
Yfid6d4gVRKDm9tosW8HSlnMVCDIGrvhmwVfMevkLjgQtgRXYecXM7MYMVH+6f6e
yIF0azu5z2PEQvfTLTuTN++bQ3lgyfYXOB3mScCOtBE9BFXwjtL111Qby1QlvHTg
sPIH5kCxpDfg3i2rge1BiyoQ4BWc4c6Us86CriKDX1vl7lilJccpWYxKFY8hyRzl
PF3OVJlMph7jny4zKOa2chWUnDj5ycK289/x2rOla4EOX7R8IHDyL+sAAAvdm7/q
FDuuetC3M+eo8/NTLiZkjTipw1nO+G0c1VtzAZ/wX1QkomwmN0yyPx47EllVH+ez
YQ80UrXOF3f7xYXHZIhwCrIVSaHpLyZHSfDBW1r+vTokIRSJ+5TOIH/YAUUKSR0U
kE4r+eHU3AdcBsDV0pZXtE0mUxROwRatOt5u+XQ3WYdORDyKo0HYu8QskIurgqUv
Cnr554zogmC4Bt/uY7j5u9NvhUH/Xyp5RVXtaQnwz+hcncgVFASDDpblejHE2Lcu
8fb1NrwB7AsPnMDGUSjnG0BbQaTo6ccacBrIrhWxRbEBAQmZEbO507yoYz21EBM5
5XKWd1bTq1YG5oPYl9WR3FI9hSQN7vKsUoW4SXsuh5j65ENhCAwBK8i3liy1j+dS
sf5xUgCg6KyH
=4ANJ
-----END PGP SIGNATURE-----
Merge tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc.
The rxrpc changes are noticeable large: to address a recent regression
has been necessary completing the threaded refactor.
Current release - regressions:
- rxrpc:
- only disconnect calls in the I/O thread
- move client call connection to the I/O thread
- fix incoming call setup race
- eth: mlx5:
- restore pkt rate policing support
- fix memory leak on updating vport counters
Previous releases - regressions:
- gro: take care of DODGY packets
- ipv6: deduct extension header length in rawv6_push_pending_frames
- tipc: fix unexpected link reset due to discovery messages
Previous releases - always broken:
- sched: disallow noqueue for qdisc classes
- eth: ice: fix potential memory leak in ice_gnss_tty_write()
- eth: ixgbe: fix pci device refcount leak
- eth: mlx5:
- fix command stats access after free
- fix macsec possible null dereference when updating MAC security
entity (SecY)"
* tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
r8152: add vendor/device ID pair for Microsoft Devkit
net: stmmac: add aux timestamps fifo clearance wait
bnxt: make sure we return pages to the pool
net: hns3: fix wrong use of rss size during VF rss config
ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()
net: sched: disallow noqueue for qdisc classes
iavf/iavf_main: actually log ->src mask when talking about it
igc: Fix PPS delta between two synchronized end-points
ixgbe: fix pci device refcount leak
octeontx2-pf: Fix resource leakage in VF driver unbind
selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.
selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.
selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".
net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY)
net/mlx5e: Fix macsec ssci attribute handling in offload path
net/mlx5: E-switch, Coverity: overlapping copy
net/mlx5e: Don't support encap rules with gbp option
net/mlx5: Fix ptp max frequency adjustment range
net/mlx5e: Fix memory leak on updating vport counters
...
We are missing a ) when we attempt to complain about not having enough
configuration for clang, resulting in the rather inscrutable error:
../lib.mk:23: *** unterminated call to function 'error': missing ')'. Stop.
Add the required ) so we print the message we were trying to print.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The BTI selftests are built both with and without BTI support, validating
both the generation of BTI signals as expected for binaries without BTI
support. Both versions of the binary currently skip all their tests when
the system does not support BTI, however this is excessive since we do have
a defined ABI for how the programs should function in this case (especially
for the non-BTI binary). Update the test program to run all the tests
unconditionally, adding a runtime adjustment of the expected results on
systems that don't support BTI where we currently handle the build time
case.
The tests all use HINT space instructions, BTI itself is a HINT as is
are the PAC instructions that function as landing pads, so nothing in the
tests depends on support for BTI in the kernel or hardware.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230110-arm64-bti-selftest-skip-v1-2-143ecdc84567@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently when skipping tests in the BTI testsuite we assign the same
number to every test since we forget to increment the current test number
as we skip, causing warnings about not running the expected test count and
potentially otherwise confusing result parsers. Fix this by adding an
appropriate increment.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230110-arm64-bti-selftest-skip-v1-1-143ecdc84567@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As documented in issue C215 in the known issues list for DDI0487I.a [1] Arm
will be making a retroactive change to SVE to remove the possibility of
selecting non power of two vector lengths. This has no impact on existing
physical implementations but most virtual implementations have implemented
the full range of permissible vector lengths. Given how demanding fp-stress
is for these implementations update to only attempt to enumerate the power
of two vector lengths, reducing the load created on existing virtual
implementations and only exercising the functionality that will be seen in
physical implementations.
[1] https://developer.arm.com/documentation/102105/ia-00/
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221220-arm64-fp-stress-pow2-v1-1-d0ce756b57af@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As documented in issue C215 in the known issues list for DDI0487I.a [1] Arm
will be making a retroactive change to SVE to remove the possibility of
selecting non power of two vector lengths. This has no impact on existing
physical implementations but most virtual implementations have implemented
the full range of permissible vector lengths.
Since virtual implementations are noticeably slow in general and the larger
vector lengths amplify the issue there's a useful improvement in runtime
from only covering the vector lengths that will exist in practical systems,
adjust our enumeration accordingly. We have other tests that aim to cover
the enumeration interfaces.
For symmetry we apply the same change to the eumeration for SME vector
lengths, though the power of two restriction was already present for SME
so there is no impact on the set of vector lengths tested.
[1] https://developer.arm.com/documentation/102105/ia-00/
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-syscall-abi-sme-only-v1-4-4fabfbd62087@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently syscall-abi only covers SME in the case where the system supports
SVE however it is architecturally valid to support SME without SVE. Update
the program to cover this case, this requires adjustments in the code to
check for SVCR.SM being set when deciding if we're handling the FPSIMD or
SVE registers and the addition of new test cases for the SME only case.
Note that in the SME only case we should not save the SVE registers after a
syscall since even if we were in streaming mode and therefore set them the
syscall should have exited streaming mode, we check that we have done so by
looking at SVCR.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-syscall-abi-sme-only-v1-3-4fabfbd62087@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently syscall-abi not only enumerates the SVE VLs twice while working
out how many tests are planned, it also repeats the enumeration process
while doing the actual tests. Record the VLs when we enumerate and use that
list when we are performing the tests, removing some duplicated logic.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-syscall-abi-sme-only-v1-2-4fabfbd62087@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SME does not mandate any specific VL so we may not have 128 bit SME but
the algorithm used for enumerating VLs assumes that we will. Add the
required check to ensure that the algorithm terminates.
Fixes: 43e3f85523 ("kselftest/arm64: Add SME support to syscall ABI test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221223-arm64-syscall-abi-sme-only-v1-1-4fabfbd62087@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds test for sending message, bigger than peer's buffer size.
For SOCK_SEQPACKET socket it must fail, as this type of socket has
message size limit.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This updates message bound test making it more complex. Instead of
sending 1 bytes messages with one MSG_EOR bit, it sends messages of
random length(one half of messages are smaller than page size, second
half are bigger) with random number of MSG_EOR bits set. Receiver
also don't know total number of messages.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The mm_numa_cid related rseq patches from the series were not picked up
into the tip tree, so enabling the mm_numa_cid test needs to be
reverted.
This reverts commit b344b8f2d8.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202301040903.2dd1e25b-oliver.sang@intel.com
Implement automatic switching of XDP programs and execution modes if
needed by a test. This makes it much simpler to write a test as it
only has to say what XDP program it needs if it is not the default
one. This also makes it possible to remove the initial explicit
attachment of the XDP program as well as the explicit mode switch in
the code. These are now handled by the same code that just checks if a
switch is necessary, so no special cases are needed.
The default XDP program for all tests is one that sends all packets to
the AF_XDP socket. If you need another one, please use the new
function test_spec_set_xdp_prog() to specify what XDP programs and
maps to use for this test.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-16-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Automatically restore the default packet stream if needed at the end
of each test. This so that test writers do not forget to do this.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-15-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make the thread dispatching code common by unifying the dual and
single thread dispatcher code. This so we do not have to add code in
two places in upcoming commits.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-14-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new test where some of the packets are not passed to the AF_XDP
socket and instead get a XDP_DROP verdict. This is important as it
tests the recycling mechanism of the buffer pool. If a packet is not
sent to the AF_XDP socket, the buffer the packet resides in is instead
recycled so it can be used again without the round-trip to user
space. The test introduces a new XDP program that drops every other
packet.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-13-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Get rid of the built-in XDP program that was part of the old libbpf
code in xsk.c and replace it with an eBPF program build using the
framework by all the other bpf selftests. This will form the base for
adding more programs in later commits.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-12-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove unnecessary code in the control path. This is located in the
file xsk.c that was moved from libbpf when the xsk support there was
deprecated. Some of the code there is not needed anymore as the
selftests are only guaranteed to run on the kernel it is shipped
with. Therefore, all the code that has to deal with compatibility of
older kernels can be dropped and also any other function that is not
of any use for the tests.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-11-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Load and attach the XDP program only once per XDP mode that is being
executed. Today, the XDP program is loaded and attached for every
test, then unloaded, which takes a long time on real NICs, since they
have to reconfigure their HW, in contrast to veth. The test suite now
completes in 21 seconds, instead of 207 seconds previously on my
machine. This is a speed-up of around 10x.
This is accomplished by moving the XDP loading from the worker threads
to the main thread and replacing the XDP loading interfaces of xsk.c
that was taken from the xsk support in libbpf, with something more
explicit that is more useful for these tests. Instead, the relevant
file descriptors and ifindexes are just passed down to the new
functions.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-10-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove the namespaces used as they fill no function. This will
simplify the code for speeding up the tests in the following commits.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-9-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace our own homegrown assembly store/release and load/acquire
implementations with the HW agnositic atomic APIs C11 offers. This to
make the code more portable, easier to read, and reduce the
maintenance burden.
The original code used load-acquire and store-release barriers
hand-coded in assembly. Since C11, these kind of operations are
offered as built-ins in gcc and llvm. The load-acquire operation
prevents hoisting of non-atomic memory operations to before this
operation and it corresponds to the __ATOMIC_ACQUIRE operation in the
built-in atomics. The store-release operation prevents hoisting of
non-atomic memory operations to after this operation and it
corresponds to the __ATOMIC_RELEASE operation in the built-in atomics.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new option to the test_xsk.sh script that only creates the two
veth netdevs and the extra namespace, then exits without running any
tests. The failed test can then be executed in the debugger without
having to create the netdevs and namespace manually. For ease-of-use,
the veth netdevs to use are printed so they can be copied into the
debugger.
Here is an example how to use it:
> sudo ./test_xsk.sh -d
veth10 veth11
> gdb xskxceiver
In gdb:
run -i veth10 -i veth11
And now the test cases can be debugged with gdb.
If you want to debug the test suite on a real NIC in loopback mode,
there is no need to use this feature as you already know the netdev of
your NIC.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Print the correct error codes when exiting the test suite due to some
terminal error. Some of these had a switched sign and some of them
printed zero instead of errno.
Fixes: facb7cb2e9 ("selftests/bpf: Xsk selftests - SKB POLL, NOPOLL")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-5-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Submit the correct number of frames in the function
xsk_populate_fill_ring(). For the tests that set the flag
use_addr_for_fill, uninitialized buffers were sent to the fill ring
following the correct ones. This has no impact on the tests, since
they only use the ones that were initialized. But for correctness,
this should be fixed.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Do not close descriptors that have never been used. File descriptor
fields that are not in use are erroneously marked with the number 0,
which is a valid fd. Mark unused fds with -1 instead and do not close
these when deleting the socket.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Print the correct payload when the packet dump option is selected. The
network to host conversion was forgotten and the payload was
erronously declared to be an int instead of an unsigned int.
Fixes: facb7cb2e9 ("selftests/bpf: Xsk selftests - SKB POLL, NOPOLL")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230111093526.11682-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
vsyscall detection code uses direct call to the beginning of
the vsyscall page:
asm ("call %P0" :: "i" (0xffffffffff600000))
It generates "call rel32" instruction but it is not relocated if binary
is PIE, so binary segfaults into random userspace address and vsyscall
page status is detected incorrectly.
Do more direct:
asm ("call *%rax")
which doesn't do need any relocaltions.
Mark g_vsyscall as volatile for a good measure, I didn't find instruction
setting it to 0. Now the code is obviously correct:
xor eax, eax
mov rdi, rbp
mov rsi, rbp
mov DWORD PTR [rip+0x2d15], eax # g_vsyscall = 0
mov rax, 0xffffffffff600000
call rax
mov DWORD PTR [rip+0x2d02], 1 # g_vsyscall = 1
mov eax, DWORD PTR ds:0xffffffffff600000
mov DWORD PTR [rip+0x2cf1], 2 # g_vsyscall = 2
mov edi, [rip+0x2ceb] # exit(g_vsyscall)
call exit
Note: fixed proc-empty-vm test oopses 5.19.0-28-generic kernel
but this is separate story.
Link: https://lkml.kernel.org/r/Y7h2xvzKLg36DSq8@p183
Fixes: 5bc73bb345 ("proc: test how it holds up with mapping'less process")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The kselftest framework uses a default timeout of 45 seconds for
all test scripts.
Increase the timeout to two minutes for the netfilter tests, this
should hopefully be enough,
Make sure that, should the script be canceled, the net namespace and
the spawned ping instances are removed.
Fixes: 25d8bcedbf ("selftests: add script to stress-test nft packet path vs. control plane")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Test the getpagesize() function. Make sure it returns the correct
value.
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Use 'set -e' and an exit handler to stop the script if a command fails
and ensure the test environment is cleaned up in any case. Also, handle
the case where the script is interrupted by SIGINT.
The only command that's expected to fail is 'wait $ping_pid', since
it's killed by the script. Handle this case with '|| true' to make it
play well with 'set -e'.
Finally, return the Kselftest SKIP code (4) when the script breaks
because of an environment problem or a command line failure. The 0 and
1 return codes should now reliably indicate that all tests have been
run (0: all tests run and passed, 1: all tests run but at least one
failed, 4: test script didn't run completely).
Fixes: b690842d12 ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This selftest currently runs half in the current namespace and half in
a netns of its own. Therefore, the test can fail if the current
namespace is already configured with incompatible parameters (for
example if it already has a veth0 interface).
Adapt the script to put both ends of the veth pair in their own netns.
Now veth0 is created in NS0 instead of the current namespace, while
veth1 is set up in NS1 (instead of the 'testing' netns).
The user visible netns names are randomised to minimise the risk of
conflicts with already existing namespaces. The cleanup() function
doesn't need to remove the virtual interface anymore: deleting NS0 and
NS1 automatically removes the virtual interfaces they contained.
We can remove $ns, which was only used to run ip commands in the
'testing' netns (let's use the builtin "-netns" option instead).
However, we still need a similar functionality as ping and tcpdump
now need to run in NS0. So we now have $RUN_NS0 for that.
Fixes: b690842d12 ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The ping command can run before DAD completes. In that case, ping may
fail and break the selftest.
We don't need DAD here since we're working on isolated device pairs.
Fixes: b690842d12 ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reduces the size of init from ~600KB to ~1KB.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Add the required values to identify_qemu() and
identify_bootimage().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Adjust size parameter in connect() to match the type of the parameter, to
fix "No such file or directory" error in selftests/net/af_unix/
test_oob_unix.c:127.
The existing code happens to work provided that the autogenerated pathname
is shorter than sizeof (struct sockaddr), which is why it hasn't been
noticed earlier.
Visible from the trace excerpt:
bind(3, {sa_family=AF_UNIX, sun_path="unix_oob_453059"}, 110) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa6a6577a10) = 453060
[pid <child>] connect(6, {sa_family=AF_UNIX, sun_path="unix_oob_45305"}, 16) = -1 ENOENT (No such file or directory)
BUG: The filename is trimmed to sizeof (struct sockaddr).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Florian Westphal <fw@strlen.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the function chk_msk_inuse() to diag.sh, which is used to check the
statistics of mptcp socket in use. As mptcp socket in listen state will
be closed randomly after 'accept', we need to get the count of listening
mptcp socket through 'ss' command.
All tests pass.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For now, mptcp_connect won't exit after receiving the 'SIGUSR1' signal
if '-r' is set. Fix this by skipping poll and sleep in copyfd_io_poll()
if 'quit' is set.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
* Fix kernel-doc for memblock_phys_free() to use correct names for the
counterpart allocation methods
* Fix compilation error in memblock tests
-----BEGIN PGP SIGNATURE-----
iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmO6bMYYHG1pa2UucmFw
b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RUt8H/Ayh8cO8kpKbJ2jH9eFwba6T
JPOv5P3hjKHbY+NMGG9MfAUTMw6Lxa59EiC+xGyPFWrtt/ZpKgaaBznxSC7dkVYy
XarzK3zulZmL4BdjBQearXXj5gSkZnuEHb9Pjs19GLDnGoRJczoHYG6rXlAWh1sg
w4H4g4QyHKYqqUlnbDi0GTIaaKY76Uprw6x6w8xwtmdUw9+1edqnoN9fRJxY9AwM
7ez7FR6rJ5g7AgcLfa/dTaCZgloLPKVdMI+q29lBXM5xDrUpxeEfXg5iU7Hax+rT
4o43HXGVGyiuWYlgdWd9B18M4Xma55K1Q+cQ3cELsAvq7XACvQcA6xi0QnFfgpU=
=DNKq
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Small fixes in kernel-doc and tests:
- Fix kernel-doc for memblock_phys_free() to use correct names for
the counterpart allocation methods
- Fix compilation error in memblock tests"
* tag 'fixes-2023-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: Fix doc for memblock_phys_free
memblock tests: Fix compilation error.
Keep track of previously issued registrations and compare the result
with MEMBARRIER_CMD_GET_REGISTRATIONS return value.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20221207164338.1535591-3-mclapinski@google.com
The cxl_test machinery builds modified versions of the modules in
drivers/cxl/ and intercepts some of their calls to allow cxl_test to
inject mock CXL topologies for test.
However, if cxl_test attempts the same with production modules,
fireworks ensue as Luis discovered [1]. Prevent that scenario by
arranging for cxl_test to check for a "watermark" symbol in each of the
modules it expects to be modified before the test can run. This turns
undefined runtime behavior or crashes into a safer failure to load the
cxl_test module.
Link: http://lore.kernel.org/r/20221209062919.1096779-1-mcgrof@kernel.org [1]
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons,
avoid null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught
by Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch
and kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal document
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmO3MLcACgkQMUZtbf5S
IrsEQBAAijPrpxsGMfX+VMqZ8RPKA3Qg8XF3ji2fSp4c0kiKv6lYI7PzPTR3u/fj
CAlhQMHv7z53uM6Zd7FdUVl23paaEycu8YnlwSubg9z+wSeh/RQ6iq94mSk1PV+K
LLVR/yop2N35Yp/oc5KZMb9fMLkxRG9Ci73QUVVYgvIrSd4Zdm13FjfVjL2C1MZH
Yp003wigMs9IkIHOpHjNqwn/5s//0yXsb1PgKxCsaMdMQsG0yC+7eyDmxshCqsji
xQm15mkGMjvWEYJaa4Tj4L3JW6lWbQzCu9nqPUX16KpmrnScr8S8Is+aifFZIBeW
GZeDYgvjSxNWodeOrJnD3X+fnbrR9+qfx7T9y7XighfytAz5DNm1LwVOvZKDgPFA
s+LlxOhzkDNEqbIsusK/LW+04EFc5gJyTI2iR6s4SSqmH3c3coJZQJeyRFWDZy/x
1oqzcCcq8SwGUTJ9g6HAmDQoVkhDWDT/ZcRKhpWG0nJub972lB2iwM7LrAu+HoHI
r8hyCkHpOi5S3WZKI9gPiGD+yOlpVAuG2wHg2IpjhKQvtd9DFUChGDhFeoB2rqJf
9uI3RJBBYTDkeNu3kpfy5uMh2XhvbIZntK5kwpJ4VettZWFMaOAzn7KNqk8iT4gJ
ASMrUrX59X0TAN0MgpJJm7uGtKbKZOu4lHNm74TUxH7V7bYn7dk=
=TlcN
-----END PGP SIGNATURE-----
Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY7X/4wAKCRDbK58LschI
g7gzAQCjKsLtAWg1OplW+B7pvEPwkQ8g3O1+PYWlToCUACTlzQD+PEMrqGnxB573
oQAk6I2yOTwLgvlHkrm+TIdKSouI4gs=
=2hUY
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2023-01-04
We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).
The main changes are:
1) Fixes, improvements and refactoring of parts of BPF verifier's
state equivalence checks, from Andrii Nakryiko.
2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
around padding handling and enums, also from Andrii Nakryiko.
3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
support decap on GRE tunnel devices not operating in collect metadata,
from Christian Ehrig.
4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
from Dave Marchevsky.
5) Remove the need for trace_printk_lock for bpf_trace_printk
and bpf_trace_vprintk helpers, from Jiri Olsa.
6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
from Maryam Tahhan.
7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.
8) Bigger batch of improvements to BPF tracing code samples,
from Daniel T. Lee.
9) Add LoongArch support to libbpf's bpf_tracing helper header,
from Hengqi Chen.
10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
from Khem Raj.
11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
from Martin KaFai Lau.
12) Use pkg-config to locate libelf for resolve_btfids build,
from Shen Jiamin.
13) Various libbpf improvements around API documentation and errno
handling, from Xin Liu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
libbpf: Return -ENODATA for missing btf section
libbpf: Add LoongArch support to bpf_tracing.h
libbpf: Restore errno after pr_warn.
libbpf: Added the description of some API functions
libbpf: Fix invalid return address register in s390
samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
samples/bpf: Change _kern suffix to .bpf with syscall tracing program
samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
samples/bpf: Use kyscall instead of kprobe in syscall tracing program
bpf: rename list_head -> graph_root in field info types
libbpf: fix errno is overwritten after being closed.
bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
bpf: perform byte-by-byte comparison only when necessary in regsafe()
bpf: reject non-exact register type matches in regsafe()
bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
bpf: reorganize struct bpf_reg_state fields
bpf: teach refsafe() to take into account ID remapping
bpf: Remove unused field initialization in bpf's ctl_table
selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
...
====================
Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmO17/QACgkQ6rmadz2v
bTpFWBAAk85+qkRYyagEoAywkxPvRye6oYu321v69YjS7PdxqQ+sKSMyLR5brEaW
QiQSQP/elwcBRyQAf4L/Fe++TZndpvuZ6CB+KUidOowtarxXPllc/TgWeNgF7A5l
TuX3eQSLNdF5itdzoJzmiEQuBCHHXPce2L6UfNlP897yaWgxhawlkVEW+w8iB/UQ
ZQCNGoDpga917DheLLx2kId046v70CUgwm8CU1rcAeA7Zi7B518OCzLm9z37l2em
UWxPLOuQVA5u4O635dTRVUyluRPOvF1pjDei2gCmvC2JJS0X5eZbEEXCsZ04UbLd
pd3y0+2WVOjiJ6kaSg6/U47G0gAjHfUHTLAoPzDw4LvlaXzyWZxK8Kxr1wHZRiYX
q/iW1rUaWhsUI8XqROUnL3TA9UbFoKvSQIIQ8JEQN9jcd6enfMuBkwEDmZZIL9rR
rZp3qZj+sxqGcKRpxorW5buM4vslYuoUD8/+9Lc+hkQ7QMCeDqdNzKEWtSCo3qjT
ktbmZpNjCNnDhjyFZGuPsPxxK8I4QnuxGGeJvZeLZFlQq0Ft82APqFKzctdNAwFI
NER4ihF0ohXdACrdL4PwaNXY10MCboHrQwpUy7vd2PUOAH+VI8nudSuweVXmeYPf
ldH3w2N0G/wEy0QOvEo7SUEx3DiGTUNzHvu/ts4fDwUBqrBrzEM=
=vD3D
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
bpf 2023-01-04
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 5 files changed, 112 insertions(+), 18 deletions(-).
The main changes are:
1) Always use maximal size for copy_array in the verifier to fix
KASAN tracking, from Kees.
2) Fix bpf task iterator walking through dead tasks, from Kui-Feng.
3) Make sure livepatch and bpf fexit can coexist, from Chuang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Always use maximal size for copy_array()
selftests/bpf: add a test for iter/task_vma for short-lived processes
bpf: keep a reference to the mm, in case the task is dead.
selftests/bpf: Temporarily disable part of btf_dump:var_data test.
bpf: Fix panic due to wrong pageattr of im->image
====================
Link: https://lore.kernel.org/r/20230104215500.79435-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CXL is using tracepoints for reporting RAS capability register payloads
for AER events, and has plans to use tracepoints for the output payload
of Get Poison List and Get Event Records commands. For organization
purposes it would be nice to keep those all under a single + local CXL
trace system. This also organization also potentially helps in the
future when CXL drivers expand beyond generic memory expanders, however
that would also entail a move away from the expander-specific
cxl_dev_state context, save that for later.
Note that the powerpc-specific drivers/misc/cxl/ also defines a 'cxl'
trace system, however, it is unlikely that a single platform will ever
load both drivers simultaneously.
Cc: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167051869176.436579.9728373544811641087.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit cf4694be2b ("tools: Add atomic_test_and_set_bit()") changed
tools/arch/x86/include/asm/atomic.h to include <asm/asm.h>, which causes
'make -C tools/testing/memblock' to fail with:
In file included from ../../include/asm/atomic.h:6,
from ../../include/linux/atomic.h:5,
from ./linux/mmzone.h:5,
from ../../include/linux/mm.h:5,
from ../../include/linux/pfn.h:5,
from ./linux/memory_hotplug.h:6,
from ./linux/init.h:7,
from ./linux/memblock.h:11,
from tests/common.h:8,
from tests/basic_api.h:5,
from main.c:2:
../../include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/asm.h: No such file or directory
11 | #include <asm/asm.h>
| ^~~~~~~~~~~
Create a symlink to asm/asm.h in the same manner as the existing one to
asm/cmpxchg.h.
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/010101857c402765-96e2dbc6-b82b-47e2-a437-4834dbe0b96b-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
This commit upgrades the kvm.sh script's --kconfig parameter to accept
string-valued Kconfig options with double-quoted string values.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, the presence of any quoted-string Kconfig option in the
scenario files or the CFcommon file (aside from the special-cased
CONFIG_INITRAMFS_SOURCE option) will result in an "improperly set"
diagnostic. This commit updates configcheck.sh to strip double quotes
in order to permit string-valued Kconfig options to be handled correctly.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The latest version of grep is deprecating the egrep command, so that
its output contains warnings as follows:
egrep: warning: egrep is obsolescent; using grep -E
Fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/rcutorture`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Under some conditions, a given run's vmlinux file will be compressed,
so that it is named vmlinux.xz rather than vmlinux. in such cases,
kvm-find-errors.sh will complain about the nonexistence of vmlinux.
This commit therefore causes kvm-find-errors.sh to check for vmlinux.xz
as well as for vmlinux.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Add more coverage to our standard test cases:
- 8kHz mono and stereo to verify that the most common mono format is
clocked correctly.
- 44.1kHz stereo to verify that this different clock base is generated
accurately.
- 48kHz 6 channel to verify that 6 channel is clocked correctly.
- 96kHz stereo since that is a common audiophile rate.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-7-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Help people understand what the standard tests are trying to cover by
providing descriptions which both serve as comments in the file and log
messages in the program's output.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-6-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In order to help with the comprehensibility of tests it is useful for us to
document what the test is attempting to cover. We could just do this through
comments in the configuration files but in order to aid people looking at
the output of the program in logs let's provide support for an optional
'description' directive which we log prior to running each of the tests.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-5-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since we don't know what the capabilities of an unknown card is any of our
standard tests may fail due to not being supported by the system. Set a
flag once we've configured the stream, just before we start data, to say
that the system accepted our stream configuration.
Since there shouldn't be a use case for tests that are specified for the
individual system failing for those tests we also add a new test which
fails if we are unable to configure the settings specified in the system
specific configuration file.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-4-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rather than allowing the system specific tests to replace the default set
of tests instead run them in addition to the standard set of tests. In
order to avoid name collisions in the reported tests we add an additional
test class element to the reported tests. Doing this means we always get a
consistent baseline of coverage no matter what card we run on but retain
the ability to specify specific coverage for a given system.
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-3-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Obtain all test parameters from the configuration files. The defaults
are defined in the pcm-test.conf file. The test count and parameters
may be variable per specific hardware.
Also, handle alt_formats field now (with the fixes in the format loop).
It replaces the original "automatic" logic which is not so universal.
The code may be further extended to skip various tests based
on the configuration hints, if the exact PCM hardware parameters
are not available for the given hardware.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-2-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In preparation to adopting a better, more comprehensive approach to
adding the coverage that was just added using some changes from Jaroslav
which were sent at the same time the recently added improvements were
being applied drop what was applied. This reverts:
7d721baea1 "kselftest/alsa: Add more coverage of sample rates and channel counts"
ee12040dd5 "kselftest/alsa: Provide more meaningful names for tests"
ae95efd975 "kselftest/alsa: Don't any configuration in the sample config"
8370d9b00c Revert "kselftest/alsa: Report failures to set the requested channels as skips"
f944f8b539 "kselftest/alsa: Report failures to set the requested sample rate as skips"
22eeb8f531 "kselftest/alsa: Refactor pcm-test to list the tests to run in a struct"
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-alsa-pcm-test-hacks-v4-1-5a152e65b1e1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Return non-zero return value if there is any failure reported in this
script during the test. Otherwise it can only reflect the status of
the last command.
Fixes: f86ca07eb5 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cleanup_v6() will cause the arp_ndisc_evict_nocarrier script exit
with 255 (No such file or directory), even the tests are good:
# selftests: net: arp_ndisc_evict_nocarrier.sh
# run arp_evict_nocarrier=1 test
# RTNETLINK answers: File exists
# ok
# run arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run all.arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run ndisc_evict_nocarrier=1 test
# ok
# run ndisc_evict_nocarrier=0 test
# ok
# run all.ndisc_evict_nocarrier=0 test
# ok
not ok 1 selftests: net: arp_ndisc_evict_nocarrier.sh # exit=255
This is because it's trying to modify the parameter for ipv4 instead.
Also, tests for ipv6 (run_ndisc_evict_nocarrier_enabled() and
run_ndisc_evict_nocarrier_disabled() are working on veth1, reflect
this fact in cleanup_v6().
Fixes: f86ca07eb5 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cmsg_so_mark.sh test will hang on non-amd64 systems because of the
infinity loop for argument parsing in cmsg_sender.
Variable "o" in cs_parse_args() for taking getopt() should be an int,
otherwise it will be 255 when getopt() returns -1 on non-amd64 system
and thus causing infinity loop.
Link: https://lore.kernel.org/lkml/CA+G9fYsM2k7mrF7W4V_TrZ-qDauWM394=8yEJ=-t1oUg8_40YA@mail.gmail.com/t/
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bhash2 split the bind() validation logic into wildcard and non-wildcard
cases. Let's add a test to catch future regression.
Before the previous patch:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# bind_timewait.c:87:1:Expected ret (0) == -1 (-1)
# 1: Test terminated by assertion
# FAIL bind_timewait.localhost.1
not ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# FAILED: 1 / 2 tests passed.
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
After:
# ./bind_timewait
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN bind_timewait.localhost.1 ...
# OK bind_timewait.localhost.1
ok 1 bind_timewait.localhost.1
# RUN bind_timewait.addrany.1 ...
# OK bind_timewait.addrany.1
ok 2 bind_timewait.addrany.1
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
x86:
* Change tdp_mmu to a read-only parameter
* Separate TDP and shadow MMU page fault paths
* Enable Hyper-V invariant TSC control
selftests:
* Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a test for the newly introduced Hyper-V invariant TSC control feature:
- HV_X64_MSR_TSC_INVARIANT_CONTROL is not available without
HV_ACCESS_TSC_INVARIANT CPUID bit set and available with it.
- BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL controls the filtering of
architectural invariant TSC (CPUID.80000007H:EDX[8]) bit.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enhance 'hyperv_features' selftest by adding a check that KVM
preserves values written to PV MSRs. Two MSRs are, however, 'special':
- HV_X64_MSR_EOI as it is a 'write-only' MSR,
- HV_X64_MSR_RESET as it always reads as '0'.
The later doesn't require any special handling right now because the
test never writes anything besides '0' to the MSR, leave a TODO node
about the fact.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hyperv_features test needs to set certain CPUID bits in Hyper-V feature
leaves but instead of open coding this, common KVM_X86_CPU_FEATURE()
infrastructure can be used.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It may not be clear what 'msr->available' means. The test actually
checks that accessing the particular MSR doesn't cause #GP, rename
the variable accordingly.
While on it, use 'true'/'false' instead of '1'/'0' for 'write'/
'fault_expected' as these are boolean.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013095849.705943-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's add some output here so that the user has some feedback
about what is being run.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221004093131.40392-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kvm_binary_stats_test test currently does not have any output (unless
one of the TEST_ASSERT statement fails), so it's hard to say for a user
how far it did proceed already. Thus let's make this a little bit more
user-friendly and include some TAP output via the kselftest.h interface.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Message-Id: <20221004093131.40392-2-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a task iterator traverses vma(s), it is possible task->mm might
become invalid in the middle of traversal and this may cause kernel
misbehave (e.g., crash)
This test case creates iterators repeatedly and forks short-lived
processes in the background to detect this bug. The test will last
for 3 seconds to get the chance to trigger the issue.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221216221855.4122288-3-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 7443b296e6 ("x86/percpu: Move cpu_number next to current_task")
moved global per_cpu variable 'cpu_number' into pcpu_hot structure.
Therefore this part of var_data test is no longer valid.
Disable it until better solution is found.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
x86:
* several fixes to nested VMX execution controls
* fixes and clarification to the documentation for Xen emulation
* do not unnecessarily release a pmu event with zero period
* MMU fixes
* fix Coverity warning in kvm_hv_flush_tlb()
selftests:
* fixes for the ucall mechanism in selftests
* other fixes mostly related to compilation with clang
Commit 8fda37cf3d ("KVM: selftests: Stuff RAX/RCX with 'safe' values
in vmmcall()/vmcall()", 2022-11-21) broke the svm_nested_soft_inject_test
because it placed a "pop rbp" instruction after vmmcall. While this is
correct and mimics what is done in the VMX case, this particular test
expects a ud2 instruction right after the vmmcall, so that it can skip
over it in the L1 part of the test.
Inline a suitably-modified version of vmmcall() to restore the
functionality of the test.
Fixes: 8fda37cf3d ("KVM: selftests: Stuff RAX/RCX with 'safe' values in vmmcall()/vmcall()"
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221130181147.9911-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While KVM_XEN_EVTCHN_RESET is usually called with no vCPUs running,
if that happened it could cause a deadlock. This is due to
kvm_xen_eventfd_reset() doing a synchronize_srcu() inside
a kvm->lock critical section.
To avoid this, first collect all the evtchnfd objects in an
array and free all of them once the kvm->lock critical section
is over and th SRCU grace period has expired.
Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".
Build a new param_test_mm_cid, param_test_mm_cid_benchmark, and
param_test_mm_cid_compare_twice executables to test the new "mm_cid"
rseq field.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-20-mathieu.desnoyers@efficios.com
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".
Build a new basic_percpu_ops_mm_cid_test to test the new "mm_cid" rseq
field.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-19-mathieu.desnoyers@efficios.com
Introduce a rseq-riscv-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-18-mathieu.desnoyers@efficios.com
Introduce a rseq-s390-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-17-mathieu.desnoyers@efficios.com
Introduce a rseq-ppc-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-16-mathieu.desnoyers@efficios.com
Introduce a rseq-mips-bits.h template header which is internally
included to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-15-mathieu.desnoyers@efficios.com
Introduce a rseq-arm64-bits.h template header which is internally
included to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-14-mathieu.desnoyers@efficios.com
Introduce a rseq-arm-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-13-mathieu.desnoyers@efficios.com
Introduce a rseq-x86-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
This introduces changes to the rseq.h selftests API which require to
update the rseq selftest programs. Similar API/templating changes need
to be done for other architectures.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-12-mathieu.desnoyers@efficios.com
This code is not currently build by the test Makefile, adds complexity,
and is not overall useful considering that the abort handling loops to
retry the fast-path.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-10-mathieu.desnoyers@efficios.com
Test the NUMA node id extension rseq field. Compare it against the value
returned by the getcpu(2) system call while pinned on a specific core.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-7-mathieu.desnoyers@efficios.com
When linking the selftests against a libc which does not handle rseq
registration (before 2.35), rseq thread registration silently succeed
even with CONFIG_RSEQ=n because it erroneously thinks that libc is
handling rseq registration.
This is caused by setting the rseq ownership flag only after the
rseq_available() check. It should rather be set before the
rseq_available() check.
Set the rseq_size to 0 (error value) immediately after the
rseq_available() check fails rather than in the thread registration
functions.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-2-mathieu.desnoyers@efficios.com
The loop marks vaddr as mapped after incrementing it by page size,
thereby marking the *next* page as mapped. Set the bit in vpages_mapped
first instead.
Fixes: 56fc773203 ("KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221209015307.1781352-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently the ucall MMIO hole is placed immediately after slot0, which
is a relatively safe address in the PA space. However, it is possible
that the same address has already been used for something else (like the
guest program image) in the VA space. At least in my own testing,
building the vgic_irq test with clang leads to the MMIO hole appearing
underneath gicv3_ops.
Stop identity mapping the MMIO hole and instead find an unused VA to map
to it. Yet another subtle detail of the KVM selftests library is that
virt_pg_map() does not update vm->vpages_mapped. Switch over to
virt_map() instead to guarantee that the chosen VA isn't to something
else.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221209015307.1781352-6-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explain the meaning of the bit manipulations of vm_vaddr_populate_bitmap.
These correspond to the "canonical addresses" of x86 and other
architectures, but that is not obvious.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a magic value to signal a ucall_alloc() failure instead of simply
doing GUEST_ASSERT(). GUEST_ASSERT() relies on ucall_alloc() and so a
failure puts the guest into an infinite loop.
Use -1 as the magic value, as a real ucall struct should never wrap.
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Disable gnu-variable-sized-type-not-at-end so that tests and libraries
can create overlays of variable sized arrays at the end of structs when
using a fixed number of entries, e.g. to get/set a single MSR.
It's possible to fudge around the warning, e.g. by defining a custom
struct that hardcodes the number of entries, but that is a burden for
both developers and readers of the code.
lib/x86_64/processor.c:664:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
lib/x86_64/processor.c:772:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
lib/x86_64/processor.c:787:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
3 warnings generated.
x86_64/hyperv_tlb_flush.c:54:18: warning: field 'hv_vp_set' with variable sized type 'struct hv_vpset'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct hv_vpset hv_vp_set;
^
1 warning generated.
x86_64/xen_shinfo_test.c:137:25: warning: field 'info' with variable sized type 'struct kvm_irq_routing'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_irq_routing info;
^
1 warning generated.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Include lib.mk before consuming $(CC) and document that lib.mk overwrites
$(CC) unless make was invoked with -e or $(CC) was specified after make
(which makes the environment override the Makefile). Including lib.mk
after using it for probing, e.g. for -no-pie, can lead to weirdness.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly disable the compiler's builtin memcmp(), memcpy(), and
memset(). Because only lib/string_override.c is built with -ffreestanding,
the compiler reserves the right to do what it wants and can try to link the
non-freestanding code to its own crud.
/usr/bin/x86_64-linux-gnu-ld: /lib/x86_64-linux-gnu/libc.a(memcmp.o): in function `memcmp_ifunc':
(.text+0x0): multiple definition of `memcmp'; tools/testing/selftests/kvm/lib/string_override.o:
tools/testing/selftests/kvm/lib/string_override.c:15: first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 6b6f71484b ("KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use")
Reported-by: Aaron Lewis <aaronlewis@google.com>
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Probe -no-pie with the actual set of CFLAGS used to compile the tests,
clang whines about -no-pie being unused if the tests are compiled with
-static.
clang: warning: argument unused during compilation: '-no-pie'
[-Wunused-command-line-argument]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the main() functions in the probing code proper prototypes so that
compiling the probing code with more strict flags won't generate false
negatives.
<stdin>:1:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-8-seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename UNAME_M to ARCH_DIR and explicitly set it directly for x86. At
this point, the name of the arch directory really doesn't have anything
to do with `uname -m`, and UNAME_M is unnecessarily confusing given that
its purpose is purely to identify the arch specific directory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix a == vs. = typo in kvm_get_cpu_address_width() that results in
@pa_bits being left unset if the CPU doesn't support enumerating its
MAX_PHY_ADDR. Flagged by clang's unusued-value warning.
lib/x86_64/processor.c:1034:51: warning: expression result unused [-Wunused-value]
*pa_bits == kvm_cpu_has(X86_FEATURE_PAE) ? 36 : 32;
Fixes: 3bd396353d ("KVM: selftests: Add X86_FEATURE_PAE and use it calc "fallback" MAXPHYADDR")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use pattern matching to exclude everything except .c, .h, .S, and .sh
files from Git. Manually adding every test target has an absurd
maintenance cost, is comically error prone, and leads to bikeshedding
over whether or not the targets should be listed in alphabetical order.
Deliberately do not include the one-off assets, e.g. config, settings,
.gitignore itself, etc as Git doesn't ignore files that are already in
the repository. Adding the one-off assets won't prevent mistakes where
developers forget to --force add files that don't match the "allowed".
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check that the number of pages per slot is non-zero in get_max_slots()
prior to computing the remaining number of pages. clang generates code
that uses an actual DIV for calculating the remaining, which causes a #DE
if the total number of pages is less than the number of slots.
traps: memslot_perf_te[97611] trap divide error ip:4030c4 sp:7ffd18ae58f0
error:0 in memslot_perf_test[401000+cb000]
Fixes: a69170c65a ("KVM: selftests: memslot_perf_test: Report optimal memory slots")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Delete an unused struct definition in x86_64/vmx_tsc_adjust_test.c.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221213001653.3852042-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define a literal '0' asm input constraint to aarch64/page_fault_test's
guest_cas() as an unsigned long to make clang happy.
tools/testing/selftests/kvm/aarch64/page_fault_test.c:120:16: error:
value size does not match register size specified by the constraint
and modifier [-Werror,-Wasm-operand-widths]
:: "r" (0), "r" (TEST_DATA), "r" (guest_test_memory));
^
tools/testing/selftests/kvm/aarch64/page_fault_test.c:119:15: note:
use constraint modifier "w"
"casal %0, %1, [%2]\n"
^~
%w0
Fixes: 35c5810157 ("KVM: selftests: aarch64: Add aarch64/page_fault_test")
Cc: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213001653.3852042-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY6YkXgAKCRDbK58LschI
g25kAP4jYi+YomSlmGUzN/fUbEIHkXXyh85Yh2/yHGYdVuIuvwEA0uXeC7JHQTca
dkcyYvgY6zJwFBV0lAVnhTRzFirFkQk=
=THs1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 231 insertions(+), 3 deletions(-).
The main changes are:
1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
from Martin Lau.
2) Fix BPF verifier's nullness propagation when registers are of
type PTR_TO_BTF_ID, from Hao Sun.
3) Fix bpftool build for JIT disassembler under statically built
libllvm, from Anton Protopopov.
4) Fix warnings reported by resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.
5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shows up when cross-compiling:
HOST_SCRATCH_DIR := $(OUTPUT)/host-tools
vs
SCRATCH_DIR := $(OUTPUT)/tools
HOST_SCRATCH_DIR := $(SCRATCH_DIR)
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221222213958.2302320-1-sdf@google.com
Zero out the valid_bank_mask when using the fast variant of
HVCALL_SEND_IPI_EX to send IPIs to all vCPUs. KVM requires the "var_cnt"
and "valid_bank_mask" inputs to be consistent even when targeting all
vCPUs. See commit bd1ba5732b ("KVM: x86: Get the number of Hyper-V
sparse banks from the VARHEAD field").
Fixes: 998489245d ("KVM: selftests: Hyper-V PV IPI selftest")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221219220416.395329-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Verify that nullness information is not porpagated in the branches
of register to register JEQ and JNE operations if one of them is
PTR_TO_BTF_ID. Implement this in C level so we can use CO-RE.
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221222024414.29539-2-sunhao.th@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
BPF CI fails for s390x with the following result:
[...]
All error logs:
libbpf: prog 'test_jit_probe_mem': BPF program load failed: ERROR: strerror_r(524)=22
libbpf: prog 'test_jit_probe_mem':
-- BEGIN PROG LOAD LOG --
JIT does not support calling kernel function
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'test_jit_probe_mem': failed to load: -524
libbpf: failed to load object 'jit_probe_mem'
libbpf: failed to load BPF skeleton 'jit_probe_mem': -524
test_jit_probe_mem:FAIL:jit_probe_mem__open_and_load unexpected error: -524
#89 jit_probe_mem:FAIL
[...]
Add the test to the deny list.
Fixes: 59fe41b525 ("selftests/bpf: Add verifier test exercising jit PROBE_MEM logic")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- New "symstr" type for dynamic events that writes the name of the
function+offset into the ring buffer and not just the address
- Prevent kernel symbol processing on addresses in user space probes
(uprobes).
- And minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5yAHxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qoWoAP9ZLmqgIqlH3Zcms31SR250kLXxsxT3
JHe82hiuI1I3fAD/Z93QLHw9wngLqIMx/wXsdFjTNOGGWdxfclSWI2qI6Q0=
=KaJg
-----END PGP SIGNATURE-----
Merge tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace probes updates from Steven Rostedt:
- New "symstr" type for dynamic events that writes the name of the
function+offset into the ring buffer and not just the address
- Prevent kernel symbol processing on addresses in user space probes
(uprobes).
- And minor fixes and clean ups
* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Reject symbol/symstr type for uprobe
tracing/probes: Add symstr type for dynamic events
kprobes: kretprobe events missing on 2-core KVM guest
kprobes: Fix check for probe enabled in kill_kprobe()
test_kprobes: Fix implicit declaration error of test_kprobes
tracing: Fix race where eprobes can be called before the event
When the bpf_skb_adjust_room() shrinks the skb such that its csum_start
is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to
CHECKSUM_NONE.
The commit 54c3f1a814 ("bpf: pull before calling skb_postpull_rcsum()")
fixed it.
This patch adds a test to ensure the skb->ip_summed changed from
CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
This patch adds a test exercising logic that was fixed / improved in
the previous patch in the series, as well as general sanity checking for
jit's PROBE_MEM logic which should've been unaffected by the previous
patch.
The added verifier test does the following:
* Acquire a referenced kptr to struct prog_test_ref_kfunc using
existing net/bpf/test_run.c kfunc
* Helper returns ptr to a specific prog_test_ref_kfunc whose first
two fields - both ints - have been prepopulated w/ vals 42 and
108, respectively
* kptr_xchg the acquired ptr into an arraymap
* Do a direct map_value load of the just-added ptr
* Goal of all this setup is to get an unreferenced kptr pointing to
struct with ints of known value, which is the result of this step
* Using unreferenced kptr obtained in previous step, do loads of
prog_test_ref_kfunc.a (offset 0) and .b (offset 4)
* Then incr the kptr by 8 and load prog_test_ref_kfunc.a again (this
time at offset -8)
* Add all the loaded ints together and return
Before the PROBE_MEM fixes in previous patch, the loads at offset 0 and
4 would succeed, while the load at offset -8 would incorrectly fail
runtime check emitted by the JIT and 0 out dst reg as a result. This
confirmed by retval of 150 for this test before previous patch - since
second .a read is 0'd out - and a retval of 192 with the fixed logic.
The test exercises the two optimizations to fixed logic added in last
patch as well:
* First load, with insn "r8 = *(u32 *)(r9 + 0)" exercises "insn->off
is 0, no need to add / sub from src_reg" optimization
* Third load, with insn "r9 = *(u32 *)(r9 - 8)" exercises "src_reg ==
dst_reg, no need to restore src_reg after load" optimization
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221216214319.3408356-2-davemarchevsky@fb.com
Remove the empty vmlinux.h if bpftool failed to dump btf info.
The empty vmlinux.h can hide real error when reading output
of make.
This is done by adding .DELETE_ON_ERROR special target in related
makefiles.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20221217223509.88254-3-changbin.du@gmail.com
Here are 2 small updates for LICENSES and some kernel files that add the
Copyleft-next license and use it in a SPDX tag as a dual-license for
some kernel files.
These have been discussed thoroughly in public on the linux-spdx mailing
list, and have the needed acks on them, as well as having been in
linux-next with no reported issues for quite some time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY6F1Qg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynGWwCfVJ+Z1CVWSFC8KaaGNiFu/gXmgNUAoKy11gWJ
8igpSNEkOiGiaGA+AvN+
=j8iu
-----END PGP SIGNATURE-----
Merge tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX/License additions from Greg KH:
"Here are two small updates for LICENSES and some kernel files that add
the Copyleft-next license and use it in a SPDX tag as a dual-license
for some kernel files.
These have been discussed thoroughly in public on the linux-spdx
mailing list, and have the needed acks on them, as well as having been
in linux-next with no reported issues for quite some time"
* tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
testing: use the copyleft-next-0.3.1 SPDX tag
LICENSES: Add the copyleft-next-0.3.1 license
This patch adds a selftest simulating a GRE sender and receiver using
tunnel headers without tunnel keys. It validates if packets encapsulated
using BPF_F_NO_TUNNEL_KEY are decapsulated by a GRE receiver not
configured with tunnel keys.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-2-cehrig@cloudflare.com
- Add powerpc qspinlock implementation optimised for large system scalability and
paravirt. See the merge message for more details.
- Enable objtool to be built on powerpc to generate mcount locations.
- Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
restricted to the patching CPU.
- Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
- Sanitise user registers on interrupt entry on 64-bit Book3S.
- Many other small features and fixes.
Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
h1e5QfgUxvKPuQ==
=7k3p
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add powerpc qspinlock implementation optimised for large system
scalability and paravirt. See the merge message for more details
- Enable objtool to be built on powerpc to generate mcount locations
- Use a temporary mm for code patching with the Radix MMU, so the
writable mapping is restricted to the patching CPU
- Add an option to build the 64-bit big-endian kernel with the ELFv2
ABI
- Sanitise user registers on interrupt entry on 64-bit Book3S
- Many other small features and fixes
Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.
* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
powerpc/code-patching: Fix oops with DEBUG_VM enabled
powerpc/qspinlock: Fix 32-bit build
powerpc/prom: Fix 32-bit build
powerpc/rtas: mandate RTAS syscall filtering
powerpc/rtas: define pr_fmt and convert printk call sites
powerpc/rtas: clean up includes
powerpc/rtas: clean up rtas_error_log_max initialization
powerpc/pseries/eeh: use correct API for error log size
powerpc/rtas: avoid scheduling in rtas_os_term()
powerpc/rtas: avoid device tree lookups in rtas_os_term()
powerpc/rtasd: use correct OF API for event scan rate
powerpc/rtas: document rtas_call()
powerpc/pseries: unregister VPA when hot unplugging a CPU
powerpc/pseries: reset the RCU watchdogs after a LPM
powerpc: Take in account addition CPU node when building kexec FDT
powerpc: export the CPU node count
powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
powerpc/dts/fsl: Fix pca954x i2c-mux node names
cxl: Remove unnecessary cxl_pci_window_alignment()
selftests/powerpc: Fix resource leaks
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY5yc9AAKCRDbK58LschI
g3n4AP4heagqBHPH+WcC0N/Vc4K2dmPmil4ZTuZ/Xt+EMrX0MQEAygWsa272V2C9
vx51DOu5/D+DPC20/1+mRpEnC4JIqAA=
=utn+
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-12-16
We've added 7 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 119 insertions(+), 36 deletions(-).
1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa.
2) Fix BPF program refcount leak in LSM attachment failure path,
from Milan Landaverde.
3) Fix BPF program type in map compatibility check for fext,
from Toke Høiland-Jørgensen.
4) Fix a BPF selftest compilation error under !CONFIG_SMP config,
from Yonghong Song.
5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed
to a prompt, from Song Liu.
6) Various BPF documentation fixes for socket local storage,
from Donald Hunter.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program
bpf: Resolve fext program type when checking map compatibility
bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func
bpf: prevent leak of lsm program after failed attach
selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION
selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n
docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE
====================
Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NetworkManager (and other daemons) may bring the interface up
and cause failures in quiescence checks. Print a helpful warning,
and take the interface down again.
I seem to forget about this every time I run these tests on a new VM.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
$number + > bash means redirect FD $number, e.g. commonly
used 2> redirects stderr (fd 2). The test uses 8192> to
write the number 8192 to a file, this results in:
./devlink.sh: line 499: 8192: Bad file descriptor
Oddly the test also papers over this issue by checking
for failure (expecting an error rather than success)
so it passes, anyway.
Fixes: ff18176ad8 ("selftests: Add a test of large binary to devlink health test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the span to the year of the development.
Link: https://lkml.kernel.org/r/20221025173709.2718725-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine overall alignment
of a struct type (because it could be 1 for both packed and non-packed
struct, depending on specifci field definitions), and just use field's
actual alignment to calculate whether any field is requiring packing or
struct's size overall necessitates packing.
Add two simple test cases that demonstrate the difference this change
would make.
Fixes: ea2ce1ba99 ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
* Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
* Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d:
"Fix a number of issues with MTE, such as races on the tags being
initialised vs the PG_mte_tagged flag as well as the lack of support
for VM_SHARED when KVM is involved. Patches from Catalin Marinas and
Peter Collingbourne").
* Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
* Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
* Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
* Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
* Second batch of the lazy destroy patches
* First batch of KVM changes for kernel virtual != physical address support
* Removal of a unused function
x86:
* Allow compiling out SMM support
* Cleanup and documentation of SMM state save area format
* Preserve interrupt shadow in SMM state save area
* Respond to generic signals during slow page faults
* Fixes and optimizations for the non-executable huge page errata fix.
* Reprogram all performance counters on PMU filter change
* Cleanups to Hyper-V emulation and tests
* Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
running on top of a L1 Hyper-V hypervisor)
* Advertise several new Intel features
* x86 Xen-for-KVM:
** Allow the Xen runstate information to cross a page boundary
** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
** Add support for 32-bit guests in SCHEDOP_poll
* Notable x86 fixes and cleanups:
** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
** Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
** Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
** Advertise (on AMD) that the SMM_CTL MSR is not supported
** Remove unnecessary exports
Generic:
* Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
* Add support for pinning vCPUs in dirty_log_perf_test.
* Rename the so called "perf_util" framework to "memstress".
* Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress tests.
* Add a common ucall implementation; code dedup and pre-work for running
SEV (and beyond) guests in selftests.
* Provide a common constructor and arch hook, which will eventually be
used by x86 to automatically select the right hypercall (AMD vs. Intel).
* A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
breakpoints, stage-2 faults and access tracking.
* x86-specific selftest changes:
** Clean up x86's page table management.
** Clean up and enhance the "smaller maxphyaddr" test, and add a related
test to cover generic emulation failure.
** Clean up the nEPT support checks.
** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
** Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
Documentation:
* Remove deleted ioctls from documentation
* Clean up the docs for the x86 MSR filter.
* Various fixes
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
=QAYX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on (see merge
commit 382b5b87a97d: "Fix a number of issues with MTE, such as
races on the tags being initialised vs the PG_mte_tagged flag as
well as the lack of support for VM_SHARED when KVM is involved.
Patches from Catalin Marinas and Peter Collingbourne").
- Merge the pKVM shadow vcpu state tracking that allows the
hypervisor to have its own view of a vcpu, keeping that state
private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB
pages only) as a prefix of the oncoming support for 4kB and 16kB
pages.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address
support
- Removal of a unused function
x86:
- Allow compiling out SMM support
- Cleanup and documentation of SMM state save area format
- Preserve interrupt shadow in SMM state save area
- Respond to generic signals during slow page faults
- Fixes and optimizations for the non-executable huge page errata
fix.
- Reprogram all performance counters on PMU filter change
- Cleanups to Hyper-V emulation and tests
- Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
guest running on top of a L1 Hyper-V hypervisor)
- Advertise several new Intel features
- x86 Xen-for-KVM:
- Allow the Xen runstate information to cross a page boundary
- Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
- Add support for 32-bit guests in SCHEDOP_poll
- Notable x86 fixes and cleanups:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
a few years back when eliminating unnecessary barriers when
switching between vmcs01 and vmcs02.
- Clean up vmread_error_trampoline() to make it more obvious that
params must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL
irrespective of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM
incorrectly thinking a guest needs TSC scaling when running on a
CPU with a constant TSC, but no hardware-enumerated TSC
frequency.
- Advertise (on AMD) that the SMM_CTL MSR is not supported
- Remove unnecessary exports
Generic:
- Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix build errors that occur in certain setups (unsure exactly what
is unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
- Introduce actual atomics for clear/set_bit() in selftests
- Add support for pinning vCPUs in dirty_log_perf_test.
- Rename the so called "perf_util" framework to "memstress".
- Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress
tests.
- Add a common ucall implementation; code dedup and pre-work for
running SEV (and beyond) guests in selftests.
- Provide a common constructor and arch hook, which will eventually
be used by x86 to automatically select the right hypercall (AMD vs.
Intel).
- A bunch of added/enabled/fixed selftests for ARM64, covering
memslots, breakpoints, stage-2 faults and access tracking.
- x86-specific selftest changes:
- Clean up x86's page table management.
- Clean up and enhance the "smaller maxphyaddr" test, and add a
related test to cover generic emulation failure.
- Clean up the nEPT support checks.
- Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
- Fix an ordering issue in the AMX test introduced by recent
conversions to use kvm_cpu_has(), and harden the code to guard
against similar bugs in the future. Anything that tiggers
caching of KVM's supported CPUID, kvm_cpu_has() in this case,
effectively hides opt-in XSAVE features if the caching occurs
before the test opts in via prctl().
Documentation:
- Remove deleted ioctls from documentation
- Clean up the docs for the x86 MSR filter.
- Various fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
KVM: x86: Add proper ReST tables for userspace MSR exits/flags
KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
KVM: arm64: selftests: Align VA space allocator with TTBR0
KVM: arm64: Fix benign bug with incorrect use of VA_BITS
KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
KVM: x86: Advertise that the SMM_CTL MSR is not supported
KVM: x86: remove unnecessary exports
KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
perf tools: Use dedicated non-atomic clear/set bit helpers
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
KVM: arm64: selftests: Enable single-step without a "full" ucall()
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: Remove stale comment about KVM_REQ_UNHALT
KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
KVM: Reference to kvm_userspace_memory_region in doc and comments
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
...
This adds a simple test for inserting an XDP program into a cpumap that is
"owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp
does). Prior to the kernel fix this would fail because the map type
ownership would be set to PROG_TYPE_EXT instead of being resolved to
PROG_TYPE_XDP.
v5:
- Fix a few nits from Andrii, add his ACK
v4:
- Use skeletons for selftest
v3:
- Update comment to better explain the cause
- Add Yonghong's ACK
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since uprobe's argument must contain the user-space data, that
should not be converted to kernel symbols. Reject if user
specifies these types on uprobe events. e.g.
/sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symbol' >> uprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symstr' >> uprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 1783.134883] trace_uprobe: error: Unknown type is specified
Command: p /bin/sh:10 %ax:symbol
^
[ 1792.201120] trace_uprobe: error: Unknown type is specified
Command: p /bin/sh:10 %ax:symstr
^
Link: https://lore.kernel.org/all/166679931679.1528100.15540755370726009882.stgit@devnote3/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).
This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding, where
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle, there are lots of subtle corner cases. The comments
in the code tries to give some clues, but comments themselves are
supposed to be paired with good understanding of C alignment and padding
rules. Plus some experimentation to figure out subtle things like
whether `long :0;` means that struct is now forced to be long-aligned
(no, it's not, turns out).
Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.
Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.
Note also that for when we are in between bitfields, we emit explicit
bit size, while otherwise we use `: 0`, this feels much more natural in
practice.
Next patch will add few more test cases, found through randomized Per's
script.
[0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/
Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
* add tests that trigger reallocation of memblock structures from
memblock itself via memblock_double_array()
* add tests for memblock_alloc_exact_nid_raw() that verify that requested
node and memory range constraints are respected.
-----BEGIN PGP SIGNATURE-----
iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmOYL14YHG1pa2UucmFw
b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RZdcH/2AE447oXzVO2lzOgkqQH1EX
xJdaa7hu00h2Euzv2lgcOHroHGXDP8wYjUV2cEyNZMP0WOMiO8i6rwIKmrzWufcm
R+ZoKPQV/Nc+7rIycpW455yLxcgsVIpUILK2BQEkDCGYugSHKb7IYdcA9KDJwtmR
xIG9j8nsuwWJtmtAuQqNOBmsc5FzKNYFa/RtDiJoMFmQNK3UqB8G8VCASdP0DYvH
7MXPcyRmlwpmOsKoNKi2/wQBsiag8/PLgcZv5vYg+E6no1tMG6u7pgDS12Sn6ZvA
I8gThJ8HNAo0d1O2SnbkicMx2CqrPFSub3QXaEFjCZF5mdBcirxHc/VBKj50TXU=
=iXEA
-----END PGP SIGNATURE-----
Merge tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Extend test coverage:
- add tests that trigger reallocation of memblock structures from
memblock itself via memblock_double_array()
- add tests for memblock_alloc_exact_nid_raw() that verify that
requested node and memory range constraints are respected"
* tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: remove completed TODO item
memblock tests: add generic NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: add bottom-up NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: add top-down NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: introduce range tests for memblock_alloc_exact_nid_raw
memblock test: Update TODO list
memblock test: Add test to memblock_reserve() 129th region
memblock test: Add test to memblock_add() 129th region
BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However,
CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent
changes. As a result, we are seeing errors like the following from BPF CI:
bpf_testmod_test_read() is not modifiable
__x64_sys_setdomainname is not sleepable
__x64_sys_getpgid is not sleepable
Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the
selftest config.
Fixes: a4412fdd49 ("error-injection: Add prompt for function error injection")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org
Kernel test robot reported bpf selftest build failure when CONFIG_SMP
is not set. The error message looks below:
>> progs/rcu_read_lock.c:256:34: error: no member named 'last_wakee' in 'struct task_struct'
last_wakee = task->real_parent->last_wakee;
~~~~~~~~~~~~~~~~~ ^
1 error generated.
When CONFIG_SMP is not set, the field 'last_wakee' is not available in struct
'task_struct'. Hence the above compilation failure. To fix the issue, let us
choose another field 'group_leader' which is available regardless of
CONFIG_SMP set or not.
Fixes: fe147956fc ("bpf/selftests: Add selftests for new task kfuncs")
Fixes: 48671232fc ("selftests/bpf: Add tests for bpf_rcu_read_lock()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221213012224.379581-1-yhs@fb.com
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU device
specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
=jJkd
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd implementation from Jason Gunthorpe:
"iommufd is the user API to control the IOMMU subsystem as it relates
to managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU
device specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance
the combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
Dirty tracking enables VM live migration with SRIOV devices and PASID
support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs,
which is currently VFIO and VDPA"
For more background, see the extended explanations in Jason's pull request:
https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
iommufd: Change the order of MSI setup
iommufd: Improve a few unclear bits of code
iommufd: Fix comment typos
vfio: Move vfio group specific code into group.c
vfio: Refactor dma APIs for emulated devices
vfio: Wrap vfio group module init/clean code into helpers
vfio: Refactor vfio_device open and close
vfio: Make vfio_device_open() truly device specific
vfio: Swap order of vfio_device_container_register() and open_device()
vfio: Set device->group in helper function
vfio: Create wrappers for group register/unregister
vfio: Move the sanity check of the group to vfio_create_group()
vfio: Simplify vfio_create_group()
iommufd: Allow iommufd to supply /dev/vfio/vfio
vfio: Make vfio_container optionally compiled
vfio: Move container related MODULE_ALIAS statements into container.c
vfio-iommufd: Support iommufd for emulated VFIO devices
vfio-iommufd: Support iommufd for physical VFIO devices
vfio-iommufd: Allow iommufd to be used in place of a container fd
vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
...
- More userfaultfs work from Peter Xu.
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
- Some filemap cleanups from Vishal Moola.
- David Hildenbrand added the ability to selftest anon memory COW handling.
- Some cpuset simplifications from Liu Shixin.
- Addition of vmalloc tracing support by Uladzislau Rezki.
- Some pagecache folioifications and simplifications from Matthew Wilcox.
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword. This series shold have been in the
non-MM tree, my bad.
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages.
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages.
- Peter Xu utilized the PTE marker code for handling swapin errors.
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient.
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand.
- zram support for multiple compression streams from Sergey Senozhatsky.
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway.
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations.
- Vishal Moola removed the try_to_release_page() wrapper.
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache.
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking.
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend.
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range().
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen.
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect.
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages().
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting.
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines.
- Many singleton patches, as usual.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
CodAgiA51qwzId3GRytIo/tfWZSezgA=
=d19R
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfs work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
Add a test for bonding prio option. Here is the test result:
]# ./option_prio.sh
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=2) [ OK ]
Signed-off-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core
----
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations.
- Add inet drop monitor support.
- A few GRO performance improvements.
- Add infrastructure for atomic dev stats, addressing long standing
data races.
- De-duplicate common code between OVS and conntrack offloading
infrastructure.
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements.
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up
the workload with the number of available CPUs.
- Add the helper support for connection-tracking OVS offload.
BPF
---
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF.
- Make cgroup local storage available to non-cgroup attached BPF
programs.
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers.
- A relevant bunch of BPF verifier fixes and improvements.
- Veristat tool improvements to support custom filtering, sorting,
and replay of results.
- Add LLVM disassembler as default library for dumping JITed code.
- Lots of new BPF documentation for various BPF maps.
- Add bpf_rcu_read_{,un}lock() support for sleepable programs.
- Add RCU grace period chaining to BPF to wait for the completion
of access from both sleepable and non-sleepable BPF programs.
- Add support storing struct task_struct objects as kptrs in maps.
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values.
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions.
Protocols
---------
- TCP: implement Protective Load Balancing across switch links.
- TCP: allow dynamically disabling TCP-MD5 static key, reverting
back to fast[er]-path.
- UDP: Introduce optional per-netns hash lookup table.
- IPv6: simplify and cleanup sockets disposal.
- Netlink: support different type policies for each generic
netlink operation.
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support.
- MPTCP: add netlink notification support for listener sockets
events.
- SCTP: add VRF support, allowing sctp sockets binding to VRF
devices.
- Add bridging MAC Authentication Bypass (MAB) support.
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios.
- More work for Wi-Fi 7 support, comprising conversion of all
the existing drivers to internal TX queue usage.
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading.
- IPSec: extended ack support for more descriptive XFRM error
reporting.
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking.
- IEEE 802154: synchronous send frame and extended filtering
support, initial support for scanning available 15.4 networks.
- Tun: bump the link speed from 10Mbps to 10Gbps.
- Tun/VirtioNet: implement UDP segmentation offload support.
Driver API
----------
- PHY/SFP: improve power level switching between standard
level 1 and the higher power levels.
- New API for netdev <-> devlink_port linkage.
- PTP: convert existing drivers to new frequency adjustment
implementation.
- DSA: add support for rx offloading.
- Autoload DSA tagging driver when dynamically changing protocol.
- Add new PCP and APPTRUST attributes to Data Center Bridging.
- Add configuration support for 800Gbps link speed.
- Add devlink port function attribute to enable/disable RoCE and
migratable.
- Extend devlink-rate to support strict prioriry and weighted fair
queuing.
- Add devlink support to directly reading from region memory.
- New device tree helper to fetch MAC address from nvmem.
- New big TCP helper to simplify temporary header stripping.
New hardware / drivers
----------------------
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches.
- Marvel Prestera AC5X Ethernet Switch.
- WangXun 10 Gigabit NIC.
- Motorcomm yt8521 Gigabit Ethernet.
- Microchip ksz9563 Gigabit Ethernet Switch.
- Microsoft Azure Network Adapter.
- Linux Automation 10Base-T1L adapter.
- PHY:
- Aquantia AQR112 and AQR412.
- Motorcomm YT8531S.
- PTP:
- Orolia ART-CARD.
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices.
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices.
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets.
- Realtek RTL8852BE and RTL8723DS.
- Cypress.CYW4373A0 WiFi + Bluetooth combo device.
Drivers
-------
- CAN:
- gs_usb: bus error reporting support.
- kvaser_usb: listen only and bus error reporting support.
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping.
- implement devlink-rate support.
- support direct read from memory.
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate.
- Support for enhanced events compression.
- extend H/W offload packet manipulation capabilities.
- implement IPSec packet offload mode.
- nVidia/Mellanox (mlx4):
- better big TCP support.
- Netronome Ethernet NICs (nfp):
- IPsec offload support.
- add support for multicast filter.
- Broadcom:
- RSS and PTP support improvements.
- AMD/SolarFlare:
- netlink extened ack improvements.
- add basic flower matches to offload, and related stats.
- Virtual NICs:
- ibmvnic: introduce affinity hint support.
- small / embedded:
- FreeScale fec: add initial XDP support.
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood.
- TI am65-cpsw: add suspend/resume support.
- Mediatek MT7986: add RX wireless wthernet dispatch support.
- Realtek 8169: enable GRO software interrupt coalescing per
default.
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP.
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support.
- add ip6gre support.
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support.
- enable flow offload support.
- Renesas:
- add rswitch R-Car Gen4 gPTP support.
- Microchip (lan966x):
- add full XDP support.
- add TC H/W offload via VCAP.
- enable PTP on bridge interfaces.
- Microchip (ksz8):
- add MTU support for KSZ8 series.
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan.
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support.
- add ack signal support.
- enable coredump support.
- remain_on_channel support.
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities.
- 320 MHz channels support.
- RealTek WiFi (rtw89):
- new dynamic header firmware format support.
- wake-over-WLAN support.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmOYXUcSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk8zQP/R7BZtbJMTPiWkRnSoKHnAyupDVwrz5U
ktukLkwPsCyJuEbAjgxrxf4EEEQ9uq2FFlxNSYuKiiQMqIpFxV6KED7LCUygn4Tc
kxtkp0Q+5XiqisWlQmtfExf2OjuuPqcjV9tWCDBI6GebKUbfNwY/eI44RcMu4BSv
DzIlW5GkX/kZAPqnnuqaLsN3FudDTJHGEAD7NbA++7wJ076RWYSLXlFv0Z+SCSPS
H8/PEG0/ZK/65rIWMAFRClJ9BNIDwGVgp0GrsIvs1gqbRUOlA1hl1rDM21TqtNFf
5QPQT7sIfTcCE/nerxKJD5JE3JyP+XRlRn96PaRw3rt4MgI6I/EOj/HOKQ5tMCNc
oPiqb7N70+hkLZyr42qX+vN9eDPjp2koEQm7EO2Zs+/534/zWDs24Zfk/Aa1ps0I
Fa82oGjAgkBhGe/FZ6i5cYoLcyxqRqZV1Ws9XQMl72qRC7/BwvNbIW6beLpCRyeM
yYIU+0e9dEm+wHQEdh2niJuVtR63hy8tvmPx56lyh+6u0+pondkwbfSiC5aD3kAC
ikKsN5DyEsdXyiBAlytCEBxnaOjQy4RAz+3YXSiS0eBNacXp03UUrNGx4Pzpu/D0
QLFJhBnMFFCgy5to8/DvKnrTPgZdSURwqbIUcZdvU21f1HLR8tUTpaQnYffc/Whm
V8gnt1EL+0cc
=CbJC
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
This looks like a relatively calm development cycle; there have been
only few changes in ALSA and ASoC core sides while we get lots of
device-specific fixes and updates as usual. Most of commits are about
ASoC, including Intel SOF/AVS and many device tree updates.
Below are some highlights:
Core:
- Improvement in memalloc helper for fallback allocations
- More cleanups of ASoC DAPM code
ASoC:
- Factoring out of mapping hw_params onto SoundWire configuration
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740
- Lots of device tree conversions and fixups
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
ALSA:
- Addition of PCM kselftest; still minimalistic but can be extended
in future
- Fixes for corner-case XRUNs with USB-audio implicit feedback mode
- Usual device-specific quirk updates for USB- and HD-audio
- FireWire DICE updates
Also, this PR also contains a few cross-tree updates:
- Some OMAP board file updates for removal of relevant OMAP platforms
- A new I2C API update for I2C probe API adaption
- A DRM update for the further hdmi-codec updates
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmOR6TEOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9p4w/8CoC/jEVFoOFVeH4/ur3MSGv93iDlPlA9sg1B
BMtUEsa+yUtlPjZfw/ZdnUvWGGkvSTIAA3Tyc+yrx+WYAJeoWsL6vpkjQcoKBFLV
oOo/dLROqeK6kS3cir0Z5VzaTg29XNz7iwe2wMp2q0irjbVZVy0+TUa2bzNOAdbs
Hupu5Vwx2lKINSKjWVbN+3g4LiMW+VyEavNZf7bZNxI5h/4p1oaOj/lJrsHCEX5y
rj1+d4EJntaFHToPf+4YkrMjLji0Yj9qsIWeXWy0Q5aUCyNr4zA3LrSszyM5cYfC
dBPPrFatvXt+N0SVTURX7VnKgYzLlG8TNwXPUJbfnTGzvXHzd5q08MHWm2ZyF2tf
3wDR+Lrw97WLWvGKQjHpg1ZFWmqSTC6D+9ihGCNRq0pMW6EtmxHtkDxhD45WF1Wq
UQJNYHWbpSQye+wwio1/JZCZ55x89utapaRXD9cTZCDoCBKOcaUsr71hNt56HL/3
5dT6fx1pJwyaR+SPJg7DQlnPGnm4J8cJhwi+WuHME9IECjO10b9o5ThcxaNWY3W7
ysVCk2jLJHOZTG4FDun5mEqyWEmnjrUAH9UZtCSQZdhYCk8E8C9B2trKUAh9nb/p
bUCrNdoopY5SpUCadPT7HtDiNXNWYMnpd7ktUun2z7V0u8pZNnhNUVvOuzFc/gT1
ypWJp+0=
=QV3a
-----END PGP SIGNATURE-----
Merge tag 'sound-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This looks like a relatively calm development cycle; there have been
only few changes in ALSA and ASoC core sides while we get lots of
device-specific fixes and updates as usual. Most of commits are about
ASoC, including Intel SOF/AVS and many device tree updates.
Below are some highlights:
Core:
- Improvement in memalloc helper for fallback allocations
- More cleanups of ASoC DAPM code
ASoC:
- Factoring out of mapping hw_params onto SoundWire configuration
- The ever ongoing overhauls of the Intel DSP code continue,
including support for loading libraries and probes with IPC4 on
SOF.
- Support for more sample formats on JZ4740
- Lots of device tree conversions and fixups
- Support for Allwinner D1, a range of AMD and Intel systems,
Mediatek systems with multiple DMICs, Nuvoton NAU8318, NXP
fsl_rpmsg and i.MX93, Qualcomm AudioReach Enable, MFC and SAL,
RealTek RT1318 and Rockchip RK3588
ALSA:
- Addition of PCM kselftest; still minimalistic but can be extended
in future
- Fixes for corner-case XRUNs with USB-audio implicit feedback mode
- Usual device-specific quirk updates for USB- and HD-audio
- FireWire DICE updates
This also contains a few cross-tree updates:
- Some OMAP board file updates for removal of relevant OMAP platforms
- A new I2C API update for I2C probe API adaption
- A DRM update for the further hdmi-codec updates"
* tag 'sound-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (417 commits)
ALSA: mts64: fix possible null-ptr-defer in snd_mts64_interrupt
ALSA: patch_realtek: Fix Dell Inspiron Plus 16
ALSA: hda/cirrus: Add extra 10 ms delay to allow PLL settle and lock.
ASoC: dt-bindings: Correct Alexandre Belloni email
ASoC: dt-bindings: maxim,max98504: Convert to DT schema
ASoC: dt-bindings: maxim,max98357a: Convert to DT schema
ASoC: dt-bindings: Reference common DAI properties
ASoC: dt-bindings: Extend name-prefix.yaml into common DAI properties
ASoC: rt715: Make read-only arrays capture_reg_H and capture_reg_L static const
ASoC: uniphier: aio-core: Make some read-only arrays static const
ASoC: wcd938x: Make read-only array minCode_param static const
ASoC: qcom: lpass-sc7280: Add maybe_unused tag for system PM ops
ASoC : SOF: amd: Add support for IPC and DSP dumps
ASoC: SOF: amd: Use poll function instead to read ACP_SHA_DSP_FW_QUALIFIER
ALSA: usb-audio: Workaround for XRUN at prepare
ALSA: pcm: Handle XRUN at trigger START
ALSA: pcm: Set missing stop_operating flag at undoing trigger start
drm: tda99x: Don't advertise non-existent capture support
ASoC: hdmi-codec: Allow playback and capture to be disabled
kselftest/alsa: Add more coverage of sample rates and channel counts
...
-----BEGIN PGP SIGNATURE-----
iIYEABYIAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCY5b27RAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSg9YA/0K10H+VsGt1+qqR4+w9SM7SFzbgszrV3Yw9
rwiPgaPVAP9rxXPr2bD2hAk7/Lv9LeJ2kfM9RzMErP1A6UsC5YVbDA==
=mAG7
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This adds file truncation support to Landlock, contributed by Günther
Noack. As described by Günther [1], the goal of these patches is to
work towards a more complete coverage of file system operations that
are restrictable with Landlock.
The known set of currently unsupported file system operations in
Landlock is described at [2]. Out of the operations listed there,
truncate is the only one that modifies file contents, so these patches
should make it possible to prevent the direct modification of file
contents with Landlock.
The new LANDLOCK_ACCESS_FS_TRUNCATE access right covers both the
truncate(2) and ftruncate(2) families of syscalls, as well as open(2)
with the O_TRUNC flag. This includes usages of creat() in the case
where existing regular files are overwritten.
Additionally, this introduces a new Landlock security blob associated
with opened files, to track the available Landlock access rights at
the time of opening the file. This is in line with Unix's general
approach of checking the read and write permissions during open(), and
associating this previously checked authorization with the opened
file. An ongoing patch documents this use case [3].
In order to treat truncate(2) and ftruncate(2) calls differently in an
LSM hook, we split apart the existing security_path_truncate hook into
security_path_truncate (for truncation by path) and
security_file_truncate (for truncation of previously opened files)"
Link: https://lore.kernel.org/r/20221018182216.301684-1-gnoack3000@gmail.com [1]
Link: https://www.kernel.org/doc/html/v6.1/userspace-api/landlock.html#filesystem-flags [2]
Link: https://lore.kernel.org/r/20221209193813.972012-1-mic@digikod.net [3]
* tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
samples/landlock: Document best-effort approach for LANDLOCK_ACCESS_FS_REFER
landlock: Document Landlock's file truncation support
samples/landlock: Extend sample tool to support LANDLOCK_ACCESS_FS_TRUNCATE
selftests/landlock: Test ftruncate on FDs created by memfd_create(2)
selftests/landlock: Test FD passing from restricted to unrestricted processes
selftests/landlock: Locally define __maybe_unused
selftests/landlock: Test open() and ftruncate() in multiple scenarios
selftests/landlock: Test file truncation support
landlock: Support file truncation
landlock: Document init_layer_masks() helper
landlock: Refactor check_access_path_dual() into is_access_to_paths_allowed()
security: Create file_truncate hook from path_truncate hook
- A ptrace API cleanup series from Sergey Shtylyov
- Fixes and cleanups for kexec from ye xingchen
- nilfs2 updates from Ryusuke Konishi
- squashfs feature work from Xiaoming Ni: permit configuration of the
filesystem's compression concurrency from the mount command line.
- A series from Akinobu Mita which addresses bound checking errors when
writing to debugfs files.
- A series from Yang Yingliang to address rapido memory leaks
- A series from Zheng Yejian to address possible overflow errors in
encode_comp_t().
- And a whole shower of singleton patches all over the place.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5efRgAKCRDdBJ7gKXxA
jgvdAP0al6oFDtaSsshIdNhrzcMwfjt6PfVxxHdLmNhF1hX2dwD/SVluS1bPSP7y
0sZp7Ustu3YTb8aFkMl96Y9m9mY1Nwg=
=ga5B
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- A ptrace API cleanup series from Sergey Shtylyov
- Fixes and cleanups for kexec from ye xingchen
- nilfs2 updates from Ryusuke Konishi
- squashfs feature work from Xiaoming Ni: permit configuration of the
filesystem's compression concurrency from the mount command line
- A series from Akinobu Mita which addresses bound checking errors when
writing to debugfs files
- A series from Yang Yingliang to address rapidio memory leaks
- A series from Zheng Yejian to address possible overflow errors in
encode_comp_t()
- And a whole shower of singleton patches all over the place
* tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits)
ipc: fix memory leak in init_mqueue_fs()
hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
rapidio: devices: fix missing put_device in mport_cdev_open
kcov: fix spelling typos in comments
hfs: Fix OOB Write in hfs_asc2mac
hfs: fix OOB Read in __hfs_brec_find
relay: fix type mismatch when allocating memory in relay_create_buf()
ocfs2: always read both high and low parts of dinode link count
io-mapping: move some code within the include guarded section
kernel: kcsan: kcsan_test: build without structleak plugin
mailmap: update email for Iskren Chernev
eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD
rapidio: fix possible UAF when kfifo_alloc() fails
relay: use strscpy() is more robust and safer
cpumask: limit visibility of FORCE_NR_CPUS
acct: fix potential integer overflow in encode_comp_t()
acct: fix accuracy loss for input value of encode_comp_t()
linux/init.h: include <linux/build_bug.h> and <linux/stringify.h>
rapidio: rio: fix possible name leak in rio_register_mport()
rapidio: fix possible name leaks when rio_add_device() fails
...
- Fix minconfig test to unset the config and not relying on
olddefconfig to do it, as some configs are set to default y
- Fix reading grub2 menus for handling submenus
- Add new ${shell <cmd>} to execute shell commands that will be useful
for setting variables like: HOSTNAME := ${shell hostname}
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5erBBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqeoAQDzt97bAEfWfPGoKBWjtVs/TVIrpyVt
WGbrRwJzdgIrigD/SUBHq4irLD85UpGSG3EiHZRcyJxn8Wuv7npNgtpexQA=
=oPSv
-----END PGP SIGNATURE-----
Merge tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Fix minconfig test to unset the config and not relying on
olddefconfig to do it, as some configs are set to default y
- Fix reading grub2 menus for handling submenus
- Add new ${shell <cmd>} to execute shell commands that will be useful
for setting variables like: HOSTNAME := ${shell hostname}
* tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Add shell commands to variables
kest.pl: Fix grub2 menu handling for rebooting
ktest.pl minconfig: Unset configs instead of just removing them
This KUnit next update for Linux 6.2-rc1 consists of several enhancements,
fixes, clean-ups, documentation updates, improvements to logging and KTAP
compliance of KUnit test output:
- log numbers in decimal and hex
- parse KTAP compliant test output
- allow conditionally exposing static symbols to tests
when KUNIT is enabled
- make static symbols visible during kunit testing
- clean-ups to remove unused structure definition
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmOXnPYACgkQCwJExA0N
Qxwf9RAAwdBKxgPZuKZ40v69Jm8YhaO3vyKUkyYRH59/HQGFUHMA2f2ONez4krEX
iXPgBFQ+7pB63FdgQi2HSg2z/u3xY02AaGgZGXDuNJDmg2xYjNDfZ0GjN6tuavlN
Liz01DGZkjZoVVXM6oV2xT8woBg/0BbdkKNL1OBO9RBZFHzwDryRzfXmQb8cKlNr
S+tkeZTlCA/s7UW2LNj4VlTzn6wgni4Y9gSk4wbQmSGWn3OX3rHaqAb7GiZ/yPGb
1WjbMeE8FwyydLU40aOZZ8V6AJRiw5VGPJyFzWJyWZ21xOgN9Z95b+I36z8RXraA
i/wnazO/FJsrhzvKL83rQkrSW6bpmVY+jGvk+L6deFM6Ro/vEWHJ4DgyKsIdMiJy
gUM1Q69szptq+ZRHGrZWPlVONBkBXMOL+fePbCbGcMzlaEAS/zsFYW9IBKcvLzwP
uHzzMS/cMmSUq52ZIyl9jhHQFVSoErCpJwQjAaZBQpYXPmE7yLcZItxnCaSUQTay
bRwyps5ph5md0oJTTFJKZ4Zx5FJ2ItjbC4y9BIexb9gYRDdRq723ivDoVENZl/Zk
DFIV95AY+mSxadS5vFagwWwX0ZN0KFKxeM8Tw7VTimal/0Sbglqp+oflsuKFD6JQ
b5HUixYifKMbWxkH5xrUb8NdjmBj561TYa8U4N+j3oOiaPYu5Ss=
=UQNn
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several enhancements, fixes, clean-ups, documentation updates,
improvements to logging and KTAP compliance of KUnit test output:
- log numbers in decimal and hex
- parse KTAP compliant test output
- allow conditionally exposing static symbols to tests when KUNIT is
enabled
- make static symbols visible during kunit testing
- clean-ups to remove unused structure definition"
* tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
Documentation: dev-tools: Clarify requirements for result description
apparmor: test: make static symbols visible during kunit testing
kunit: add macro to allow conditionally exposing static symbols to tests
kunit: tool: make parser preserve whitespace when printing test log
Documentation: kunit: Fix "How Do I Use This" / "Next Steps" sections
kunit: tool: don't include KTAP headers and the like in the test log
kunit: improve KTAP compliance of KUnit test output
kunit: tool: parse KTAP compliant test output
mm: slub: test: Use the kunit_get_current_test() function
kunit: Use the static key when retrieving the current test
kunit: Provide a static key to check if KUnit is actively running tests
kunit: tool: make --json do nothing if --raw_ouput is set
kunit: tool: tweak error message when no KTAP found
kunit: remove KUNIT_INIT_MEM_ASSERTION macro
Documentation: kunit: Remove redundant 'tips.rst' page
Documentation: KUnit: reword description of assertions
Documentation: KUnit: make usage.rst a superset of tips.rst, remove duplication
kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macros
kunit: tool: remove redundant file.close() call in unit test
kunit: tool: unit tests all check parser errors, standardize formatting a bit
...
This Kselftest update for Linux 6.2-rc1 consists of several fixes
and enhancements to existing tests and a few new tests:
- adds new amd-pstate and fixes and enhances existing ones
- adds new watchdog tests and enhances existing ones to improve coverage
- fixes to ftrace, splice_read, rtc, and efivars tests
- fixes to handle egrep obsolescence in the latest grep release
- miscellaneous spelling and SPDX fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmOXdT4ACgkQCwJExA0N
Qxy06RAAvQN6kpzCGJjLz5R9Lx3QpG4BiCO7vdZs1QTwzo8kTGQDq3JD6m+ychDx
CgLnH7RrPlIYz4oExnV7JPE2tFEarV/zFh2V8LjKfGePZVtNeDASlC7F3lWYUnM/
n3+6H/JbZ1BgGE9DE5/DAOOAsN0CY2QPJWRDN1wYH7/gLXulPlSt+BV/ZFj3LG/0
Qne2SR7kc+hKPOFNl+BWKOU2a4mNOmoxaROgQraKdeQMQoTAwz/7lfylYZD9nU0r
nyVxHTr0n+/XX3Q93arAS/chOyFBJrAESciUPY4E2oF97uiE0TqHdKA/qfPNRr7N
wSOdWxYSuNaz0tkzO01EzeGGr+mw0WlCNoo6NzsUvqzRXDf0F0cWe32tmIZHJAzS
CqxpKd6I8XPkEeyy5kL12q+akxe30zDGaKdaYGkZ7SjbwG6ygzSSW5MYfojvbtr9
Nfb6OnkPC1aZzC9jtiJO1EHd9f+PdeUVKNQsvzseT4b9xhmpxBqlrzgB5GakDoE6
uo3cXyz5gOzqJD6FT+CqKa/16NaHATw/U7/Y0gXj5ELKEmuYBmnl1T9svDnSIVfF
hgS/3UkFYiw3R2oW35wv988w2JsXrkItOyNdAm47ihvAHF/uCumcSeea5k3+QYrH
7bM4PzJsMMcOhWWQ/04Q+LQCWWem/Vhk22BlIr6IiuGd6L03pc8=
=4cDX
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"Several fixes and enhancements to existing tests and a few new tests:
- add new amd-pstate tests and fix and enhance existing ones
- add new watchdog tests and enhance existing ones to improve
coverage
- fixes to ftrace, splice_read, rtc, and efivars tests
- fixes to handle egrep obsolescence in the latest grep release
- miscellaneous spelling and SPDX fixes"
* tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (24 commits)
selftests/ftrace: Use long for synthetic event probe test
selftests/tpm2: Split async tests call to separate shell script runner
selftests: splice_read: Fix sysfs read cases
selftests: ftrace: Use "grep -E" instead of "egrep"
selftests: gpio: Use "grep -E" instead of "egrep"
selftests: kselftest_deps: Use "grep -E" instead of "egrep"
selftests/efivarfs: Add checking of the test return value
cpufreq: amd-pstate: fix spdxcheck warnings for amd-pstate-ut.c
selftests: rtc: skip when RTC is not present
selftests/ftrace: event_triggers: wait longer for test_event_enable
selftests/vDSO: Add riscv getcpu & gettimeofday test
Documentation: amd-pstate: Add tbench and gitsource test introduction
selftests: amd-pstate: Trigger gitsource benchmark and test cpus
selftests: amd-pstate: Trigger tbench benchmark and test cpus
selftests: amd-pstate: Split basic.sh into run.sh and basic.sh.
selftests: amd-pstate: Rename amd-pstate-ut.sh to basic.sh.
selftests/ftrace: Convert tracer tests to use 'requires' to specify program dependency
selftests/ftrace: Add check for ping command for trigger tests
selftests/watchdog: Fix spelling mistake "Temeprature" -> "Temperature"
selftests/watchdog: add test for WDIOC_GETTEMP
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
Nothing too interesting.
* Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable.
* A couple cpuset optimizations.
* Other misc changes including doc and test updates.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY5bHvg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGcYrAQCfrlzrbWw6gTQ7fmr0Avxjy5FxLjsdzEGPcmGY
ByEMhgD/VdUf3zI/Khr91Gsi5JXQxQf7a5caD369xupRWUWjqA8=
=Nf+E
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too interesting:
- Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations
kprobable
- A couple cpuset optimizations
- Other misc changes including doc and test updates"
* tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq()
cgroup/cpuset: Improve cpuset_css_alloc() description
kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh
cgroup/cpuset: Optimize cpuset_attach() on v2
cgroup/cpuset: Skip spread flags update on v2
kselftest/cgroup: Fix gathering number of CPUs
cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF
cgroup: Implement DEBUG_CGROUP_REF
Add a selftests that includes the following test cases:
1. Configuration tests. Both valid and invalid configurations are
tested across all entry types (e.g., L2, IPv4).
2. Forwarding tests. Both host and port group entries are tested across
all entry types.
3. Interaction between user installed MDB entries and IGMP / MLD control
packets.
Example output:
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [ OK ]
TEST: Common host entries configuration tests (IPv6) [ OK ]
TEST: Common host entries configuration tests (L2) [ OK ]
INFO: # Port group entries configuration tests - (*, G)
TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ]
TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ]
TEST: IPv4 (*, G) port group entries configuration tests [ OK ]
TEST: IPv6 (*, G) port group entries configuration tests [ OK ]
INFO: # Port group entries configuration tests - (S, G)
TEST: Common port group entries configuration tests (IPv4 (S, G)) [ OK ]
TEST: Common port group entries configuration tests (IPv6 (S, G)) [ OK ]
TEST: IPv4 (S, G) port group entries configuration tests [ OK ]
TEST: IPv6 (S, G) port group entries configuration tests [ OK ]
INFO: # Port group entries configuration tests - L2
TEST: Common port group entries configuration tests (L2 (*, G)) [ OK ]
TEST: L2 (*, G) port group entries configuration tests [ OK ]
INFO: # Forwarding tests
TEST: IPv4 host entries forwarding tests [ OK ]
TEST: IPv6 host entries forwarding tests [ OK ]
TEST: L2 host entries forwarding tests [ OK ]
TEST: IPv4 port group "exclude" entries forwarding tests [ OK ]
TEST: IPv6 port group "exclude" entries forwarding tests [ OK ]
TEST: IPv4 port group "include" entries forwarding tests [ OK ]
TEST: IPv6 port group "include" entries forwarding tests [ OK ]
TEST: L2 port entries forwarding tests [ OK ]
INFO: # Control packets tests
TEST: IGMPv3 MODE_IS_INCLUE tests [ OK ]
TEST: MLDv2 MODE_IS_INCLUDE tests [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test is only concerned with host MDB entries and not with MDB
entries as a whole. Rename the test to reflect that.
Subsequent patches will add a more general test that will contain the
test cases for host MDB entries and remove the current test.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter.
2) Add DATA_SENT state to SCTP connection tracking helper, from
Sriram Yagnaraman.
3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal.
4) Add bitmask support for ipset, from Vishwanath Pai.
5) Handle icmpv6 redirects as RELATED, from Florian Westphal.
6) Add WARN_ON_ONCE() to impossible case in flowtable datapath,
from Li Qiong.
7) A large batch of IPVS updates to replace timer-based estimators by
kthreads to scale up wrt. CPUs and workload (millions of estimators).
Julian Anastasov says:
This patchset implements stats estimation in kthread context.
It replaces the code that runs on single CPU in timer context every 2
seconds and causing latency splats as shown in reports [1], [2], [3].
The solution targets setups with thousands of IPVS services,
destinations and multi-CPU boxes.
Spread the estimation on multiple (configured) CPUs and multiple
time slots (timer ticks) by using multiple chains organized under RCU
rules. When stats are not needed, it is recommended to use
run_estimation=0 as already implemented before this change.
RCU Locking:
- As stats are now RCU-locked, tot_stats, svc and dest which
hold estimator structures are now always freed from RCU
callback. This ensures RCU grace period after the
ip_vs_stop_estimator() call.
Kthread data:
- every kthread works over its own data structure and all
such structures are attached to array. For now we limit
kthreads depending on the number of CPUs.
- even while there can be a kthread structure, its task
may not be running, eg. before first service is added or
while the sysctl var is set to an empty cpulist or
when run_estimation is set to 0 to disable the estimation.
- the allocated kthread context may grow from 1 to 50
allocated structures for timer ticks which saves memory for
setups with small number of estimators
- a task and its structure may be released if all
estimators are unlinked from its chains, leaving the
slot in the array empty
- every kthread data structure allows limited number
of estimators. Kthread 0 is also used to initially
calculate the max number of estimators to allow in every
chain considering a sub-100 microsecond cond_resched
rate. This number can be from 1 to hundreds.
- kthread 0 has an additional job of optimizing the
adding of estimators: they are first added in
temp list (est_temp_list) and later kthread 0
distributes them to other kthreads. The optimization
is based on the fact that newly added estimator
should be estimated after 2 seconds, so we have the
time to offload the adding to chain from controlling
process to kthread 0.
- to add new estimators we use the last added kthread
context (est_add_ktid). The new estimators are linked to
the chains just before the estimated one, based on add_row.
This ensures their estimation will start after 2 seconds.
If estimators are added in bursts, common case if all
services and dests are initially configured, we may
spread the estimators to more chains and as result,
reducing the initial delay below 2 seconds.
Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.
The new IPVS estimators do not use workqueue infrastructure
because:
- The estimation can take long time when using multiple IPVS rules (eg.
millions estimator structures) and especially when box has multiple
CPUs due to the for_each_possible_cpu usage that expects packets from
any CPU. With est_nice sysctl we have more control how to prioritize the
estimation kthreads compared to other processes/kthreads that have
latency requirements (such as servers). As a benefit, we can see these
kthreads in top and decide if we will need some further control to limit
their CPU usage (max number of structure to estimate per kthread).
- with kthreads we run code that is read-mostly, no write/lock
operations to process the estimators in 2-second intervals.
- work items are one-shot: as estimators are processed every
2 seconds, they need to be re-added every time. This again
loads the timers (add_timer) if we use delayed works, as there are
no kthreads to do the timings.
[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
ipvs: run_estimation should control the kthread tasks
ipvs: add est_cpulist and est_nice sysctl vars
ipvs: use kthreads for stats estimation
ipvs: use u64_stats_t for the per-cpu counters
ipvs: use common functions for stats allocation
ipvs: add rcu protection to stats
netfilter: flowtable: add a 'default' case to flowtable datapath
netfilter: conntrack: set icmpv6 redirects as RELATED
netfilter: ipset: Add support for new bitmask parameter
netfilter: conntrack: merge ipv4+ipv6 confirm functions
netfilter: conntrack: add sctp DATA_SENT state
netfilter: nft_inner: fix IS_ERR() vs NULL check
====================
Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fix up ptrace interface to protection keys register (PKRU)
* Avoid undefined compiler behavior with TYPE_ALIGN
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYisACgkQaDWVMHDJ
krAJkA//QRChRwyKi1syinXt2SGoSa3mTzP23SyV0TunOfKBiBUreFJ2mMFjsX0h
V7SJcu82sCWLHAY6LZRdyiF8zK3Cfzpbgb1QfzBCefE/gU801FhCypqNbQO5Lpdr
PEo+naaDOzwDWDt0A6OkAArgb0zfaOGL+OBhuwT7mcUtBz6gCakFqG2BMgOzqD1z
SAp0RraoSsFnKFl5Gv44+gkThq8/8yL5tyrJtnGv1jAsbhw9zmloaOue6MNMPJhH
3sFQnML3qeNRozquWWeCPu/hxWuFDitPhwdmNRZrnQ3DyRdDhCZPOjv+tQmxI3EO
5c+UIkMIsRh2nZLwHcM+iO5cWE7lyiAWpgqqArB+r2CFXWK5q2lplhXngBodE9Kr
ki/NZ6oEitT3+bLXhCwyc7WKxohl2IlmclJ4AD3Qrp4bzPhfsZebL6nNs/3bxWuF
CxJWIKzjtIcgNSEJaDOzFA5CAImq74r/kCW4e11ZXwmOnx6PX1YG6p0C1yknrZYJ
bvy8WxureO7OJEcVZfwxpXLYbb+7Q/k/l2DkUdVAvKSCB81uWR4JzEp4oooDxf2j
6x9qT5Mi95FhAHOCmlxwkQJTBCB36LkVF/3ESEOqJmun4F5ghPbMX2JzpBa6jPCS
lzkBrzA8MAdmaLHhDO+nd5m8HVY3QBSXDVtRTycmuloeoSeyBno=
=An0n
-----END PGP SIGNATURE-----
Merge tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Dave Hansen:
"There are two little fixes in here, one to give better XSAVE warnings
and another to address some undefined behavior in offsetof().
There is also a collection of patches to fix some issues with ptrace
and the protection keys register (PKRU). PKRU is a real oddity because
it is exposed in the XSAVE-related ABIs, but it is generally managed
without using XSAVE in the kernel. This fix thankfully came with a
selftest to ward off future regressions.
Summary:
- Clarify XSAVE consistency warnings
- Fix up ptrace interface to protection keys register (PKRU)
- Avoid undefined compiler behavior with TYPE_ALIGN"
* tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace
x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set
x86/fpu: Allow PKRU to be (once again) written by ptrace.
x86/fpu: Add a pkru argument to copy_uabi_to_xstate()
x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().
x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()
x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYmwACgkQaDWVMHDJ
krD8hg/+J0hUTfljmlCctwGZyqVR3Y2E722wL9oTvbgYiUAtFrARzfPF0WNwvHi5
Ywvod5hQ4unPoluthdVAD/uJqcPVhjIZ7CvNTGrS8J7ED5x5ydGLNWAL3Rn+9s6O
xkz/DsV4zl+cPQ60XLsO+3Mc6RhwVs9DUthpUovl22epmgmRPCovkHWkvQsZajJq
ceF/78ThfrkG4dDouaIXi1gsmKLLzU4KdHeBATMg0bgPQXFJZSGBCLaeJXWmLapq
7N3SznUqDMn4Plr/IuP4XuMA6VTVojrakCcBmw5SGVqhkVWGM1/FMg7jHSQS7Z5V
5uG7CkhTBqh17v9xKwDMPh34D51TLtNifA7jbecyL5155czFkj7BoSwEFINU/wCz
agUO9NvK9j1chUnA2UGqGQigM3nWGZHMwaQjfgBWyq5gqF8HURUUrjx6XuunOfmB
1byyrDu0g48u/zaQ/RpNfewz1ZY+WylDPcqOhYaVWF1PYThStML/VMBKpdsl1Ovw
nytUdQsaBIjFHQdB+snizaF93+/0FG+FTGAlDnHYmey/8plL2LYuzrcDnDYnGEXa
tN3HFd2lAi4JBLmvmgF39gH+BLXuKTLweIhwTXZTn91cfire3yxiXAnLd0tuptMP
aXFddxKMdMpxTqzy2X+8gJjqCr2lZ9gZkxaPsWwrBM+xrJf0p2w=
=JGnq
-----END PGP SIGNATURE-----
Merge tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tdx updates from Dave Hansen:
"This includes a single chunk of new functionality for TDX guests which
allows them to talk to the trusted TDX module software and obtain an
attestation report.
This report can then be used to prove the trustworthiness of the guest
to a third party and get access to things like storage encryption
keys"
* tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/tdx: Test TDX attestation GetReport support
virt: Add TDX guest driver
x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device unlock.
- Add a facility for the kernel to warn about collisions between kernel
and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY5UpyAAKCRDfioYZHlFs
Z0ttAP4uxCjIibKsFVyexpSgI4vaZqQ9yt9NesmPwonc0XookwD+PlwP6Xc0d0Ox
t0gJ6+pwdh11NRzhcNE1pAaPcJZU4gs=
=HAQk
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.2.
While it may seem backwards, the CXL update this time around includes
some focus on CXL 1.x enabling where the work to date had been with
CXL 2.0 (VH topologies) in mind.
First generation CXL can mostly be supported via BIOS, similar to DDR,
however it became clear there are use cases for OS native CXL error
handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x
hosts (Restricted CXL Host (RCH) topologies). So, this update brings
RCH topologies into the Linux CXL device model.
In support of the ongoing CXL 2.0+ enabling two new core kernel
facilities are added.
One is the ability for the kernel to flag collisions between userspace
access to PCI configuration registers and kernel accesses. This is
brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware
mailbox over config-cycles.
The other is a cpu_cache_invalidate_memregion() API that maps to
wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest
VMs and architectures that do not support it yet. The CXL paths that
need it, dynamic memory region creation and security commands (erase /
unlock), are disabled when it is not present.
As for the CXL 2.0+ this cycle the subsystem gains support Persistent
Memory Security commands, error handling in response to PCIe AER
notifications, and support for the "XOR" host bridge interleave
algorithm.
Summary:
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device
unlock.
- Add a facility for the kernel to warn about collisions between
kernel and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL
1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
cxl/region: Fix memdev reuse check
cxl/pci: Remove endian confusion
cxl/pci: Add some type-safety to the AER trace points
cxl/security: Drop security command ioctl uapi
cxl/mbox: Add variable output size validation for internal commands
cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
cxl/security: Fix Get Security State output payload endian handling
cxl: update names for interleave ways conversion macros
cxl: update names for interleave granularity conversion macros
cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
tools/testing/cxl: Require cache invalidation bypass
cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
cxl/region: Fix spelling mistake "memergion" -> "memregion"
cxl/regs: Fix sparse warning
cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
tools/testing/cxl: Add an RCH topology
cxl/port: Add RCD endpoint port enumeration
cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
tools/testing/cxl: Add XOR Math support to cxl_test
cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
...
Currently, kunit_parser.py is stripping all leading whitespace to make
parsing easier. But this means we can't accurately show kernel output
for failing tests or when the kernel crashes.
Embarassingly, this affects even KUnit's own output, e.g.
[13:40:46] Expected 2 + 1 == 2, but
[13:40:46] 2 + 1 == 3 (0x3)
[13:40:46] not ok 1 example_simple_test
[13:40:46] [FAILED] example_simple_test
After this change, here's what the output in context would look like
[13:40:46] =================== example (4 subtests) ===================
[13:40:46] # example_simple_test: initializing
[13:40:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[13:40:46] Expected 2 + 1 == 2, but
[13:40:46] 2 + 1 == 3 (0x3)
[13:40:46] [FAILED] example_simple_test
[13:40:46] [SKIPPED] example_skip_test
[13:40:46] [SKIPPED] example_mark_skipped_test
[13:40:46] [PASSED] example_all_expect_macros_test
[13:40:46] # example: initializing suite
[13:40:46] # example: pass:1 fail:1 skip:2 total:4
[13:40:46] # Totals: pass:1 fail:1 skip:2 total:4
[13:40:46] ===================== [FAILED] example =====================
This example shows one minor cosmetic defect this approach has.
The test counts lines prevent us from dedenting the suite-level output.
But at the same time, any form of non-KUnit output would do the same
unless it happened to be indented as well.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We print the "test log" on failure.
This is meant to be all the kernel output that happened during the test.
But we also include the special KTAP lines in it, which are often
redundant.
E.g. we include the "not ok" line in the log, right before we print
that the test case failed...
[13:51:48] Expected 2 + 1 == 2, but
[13:51:48] 2 + 1 == 3 (0x3)
[13:51:48] not ok 1 example_simple_test
[13:51:48] [FAILED] example_simple_test
More full example after this patch:
[13:51:48] =================== example (4 subtests) ===================
[13:51:48] # example_simple_test: initializing
[13:51:48] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[13:51:48] Expected 2 + 1 == 2, but
[13:51:48] 2 + 1 == 3 (0x3)
[13:51:48] [FAILED] example_simple_test
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Change the KUnit parser to be able to parse test output that complies with
the KTAP version 1 specification format found here:
https://kernel.org/doc/html/latest/dev-tools/ktap.html. Ensure the parser
is able to parse tests with the original KUnit test output format as
well.
KUnit parser now accepts any of the following test output formats:
Original KUnit test output format:
TAP version 14
1..1
# Subtest: kunit-test-suite
1..3
ok 1 - kunit_test_1
ok 2 - kunit_test_2
ok 3 - kunit_test_3
# kunit-test-suite: pass:3 fail:0 skip:0 total:3
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 - kunit-test-suite
KTAP version 1 test output format:
KTAP version 1
1..1
KTAP version 1
1..3
ok 1 kunit_test_1
ok 2 kunit_test_2
ok 3 kunit_test_3
ok 1 kunit-test-suite
New KUnit test output format (changes made in the next patch of
this series):
KTAP version 1
1..1
KTAP version 1
# Subtest: kunit-test-suite
1..3
ok 1 kunit_test_1
ok 2 kunit_test_2
ok 3 kunit_test_3
# kunit-test-suite: pass:3 fail:0 skip:0 total:3
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 kunit-test-suite
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When --raw_output is set (to any value), we don't actually parse the
test results. So asking to print the test results as json doesn't make
sense.
We internally create a fake test with one passing subtest, so --json
would actually print out something misleading.
This patch:
* Rewords the flag descriptions so hopefully this is more obvious.
* Also updates --raw_output's description to note the default behavior
is to print out only "KUnit" results (actually any KTAP results)
* also renames and refactors some related logic for clarity (e.g.
test_result => test, it's a kunit_parser.Test object).
Notably, this patch does not make it an error to specify --json and
--raw_output together. This is an edge case, but I know of at least one
wrapper around kunit.py that always sets --json. You'd never be able to
use --raw_output with that wrapper.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We currently tell people we "couldn't find any KTAP output" with no
indication as to what this might mean.
After this patch, we get:
$ ./tools/testing/kunit/kunit.py parse /dev/null
============================================================
[ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
============================================================
Testing complete. Ran 0 tests: errors: 1
Note: we could try and generate a more verbose message like
> Please check .kunit/test.log to see the raw kernel output.
or the like, but we'd need to know what the build dir was to know where
test.log actually lives.
This patch tries to make a more minimal improvement.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We're using a `with` block above, so the file object is already closed.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Let's verify that the parser isn't reporting any errors for valid
inputs.
This change also
* does result.status checking on one line
* makes sure we consistently do it outside of the `with` block
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since we're using Python 3.7+, we can use dataclasses to tersen the
code.
It also lets us create pre-populated TestCounts() objects and compare
them in our unit test. (Before, you could only create empty ones).
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
x86 Xen-for-KVM:
* Allow the Xen runstate information to cross a page boundary
* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
* add support for 32-bit guests in SCHEDOP_poll
x86 fixes:
* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
* Clean up the MSR filter docs.
* Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
* Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
* Advertise (on AMD) that the SMM_CTL MSR is not supported
* Remove unnecessary exports
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
Documentation:
* Remove deleted ioctls from documentation
* Various fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmOWgtsACgkQ6rmadz2v
bTpT2g//WzQRsODtPVVmg87fEo1GSTXvoXq/fhg95OKNZrVKgx1N6EVlFSLSqEjL
TAmOuv5cZT28ZpMPMNjnU/c/lFf/6/UWbbTusA+F3MtSCBSbP5DPsWDD0yvNT9DL
EZbGoQDSyt1M+BakZLzwOV6HPn9oDhj5p/4lMw+gptTY+3IeYUbS50DinM8eLz+Q
067aF01p3ROF6LNUx9Az0cLPdU05oHzL2MvRsj/F7h/sWoSW5B/1Kx/m1vsT9lwn
T2vbm6r4Jo0m0ZvpEMeRyKNZgVKIc64C7NH9CV7V66giJaONmxvLwkc0zWFwbXJ2
V9aPQbbBUx/CZXoC72LEsvVcoAFl7LAL1IALm2HVt1iQjpj1yDlWw3WV0PMQ9Rn7
xRVDOfQNGZ6jnkv6LB2j7V1z7hVENWQQwM48dgO2pAnJwYmUW9wZaAGE5kadUrZf
eCD4c1U+qcZkSk4vwvpr8ubJ0PWPMUZqI0FrHUxfPxqkdy78c1h3qNQufZvAHWff
Ca9NZqraFACTx58ZBsN1V5Xzv7azoK8Zgr9+JwVNahpFxclrbL8xuceThkC4smBl
fiZJC9fClD9ATquIdj177jNMVC8F4B5yrKF/ehJDcNQhcqUdWx9Sbj461enf+3HI
nfTP+77ZzyIJ76iRXJBV/jr9wkaPWhAZVeBGxmw5clTvB9/RBbU=
=fzwv
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2022-12-11
We've added 74 non-merge commits during the last 11 day(s) which contain
a total of 88 files changed, 3362 insertions(+), 789 deletions(-).
The main changes are:
1) Decouple prune and jump points handling in the verifier, from Andrii.
2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin.
Merged from hid tree.
3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs,
from Björn.
4) Don't use rcu_users to refcount in task kfuncs, from David.
5) Three reg_state->id fixes in the verifier, from Eduard.
6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou.
7) Refactor dynptr handling in the verifier, from Kumar.
8) Remove the "/sys" mount and umount dance in {open,close}_netns
in bpf selftests, from Martin.
9) Enable sleepable support for cgrp local storage, from Yonghong.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits)
selftests/bpf: test case for relaxed prunning of active_lock.id
selftests/bpf: Add pruning test case for bpf_spin_lock
bpf: use check_ids() for active_lock comparison
selftests/bpf: verify states_equal() maintains idmap across all frames
bpf: states_equal() must build idmap for all function frames
selftests/bpf: test cases for regsafe() bug skipping check_id()
bpf: regsafe() must not skip check_ids()
docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE
selftests/bpf: Add test for dynptr reinit in user_ringbuf callback
bpf: Use memmove for bpf_dynptr_{read,write}
bpf: Move PTR_TO_STACK alignment check to process_dynptr_func
bpf: Rework check_func_arg_reg_off
bpf: Rework process_dynptr_func
bpf: Propagate errors from process_* checks in check_func_arg
bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func
bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true
bpf: Reuse freed element in free_by_rcu during allocation
selftests/bpf: Bring test_offload.py back to life
bpf: Fix comment error in fixup_kfunc_call function
bpf: Do not zero-extend kfunc return values
...
====================
Link: https://lore.kernel.org/r/20221212024701.73809-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ACPI:
* Enable FPDT support for boot-time profiling
* Fix CPU PMU probing to work better with PREEMPT_RT
* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
* APMT support for probing Arm CoreSight PMU devices
CPU features:
* Advertise new SVE instructions (v2.1)
* Advertise range prefetch instruction
* Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
* Enable DIT (Data Independent Timing) when running in the kernel
* More conversion of system register fields over to the generated
header
CPU misfeatures:
* Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
* Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's
pointer authentication feature when it is supported (complete
with scary DWARF parser!)
Tracing and debug:
* Remove static ftrace in favour of, err, dynamic ftrace!
* Seperate 'struct ftrace_regs' from 'struct pt_regs' in core
ftrace and existing arch code
* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
the old FTRACE_WITH_REGS
* Extend 'crashkernel=' parameter with default value and fallback
to placement above 4G physical if initial (low) allocation
fails
SVE:
* Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
* Rework of undefined instruction handling to avoid serialisation
on global lock (this includes emulation of user accesses to the
ID registers)
Perf and PMU:
* Support for TLP filters in Hisilicon's PCIe PMU device
* Support for the DDR PMU present in Amlogic Meson G12 SoCs
* Support for the terribly-named "CoreSight PMU" architecture
from Arm (and Nvidia's implementation of said architecture)
Misc:
* Tighten up our boot protocol for systems with memory above
52 bits physical
* Const-ify static keys to satisty jump label asm constraints
* Trivial FFA driver cleanups in preparation for v1.1 support
* Export the kernel_neon_* APIs as GPL symbols
* Harden our instruction generation routines against
instrumentation
* A bunch of robustness improvements to our arch-specific selftests
* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg
UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR
fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx
NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq
bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF
ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1
=hV+2
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights this time are support for dynamically enabling and
disabling Clang's Shadow Call Stack at boot and a long-awaited
optimisation to the way in which we handle the SVE register state on
system call entry to avoid taking unnecessary traps from userspace.
Summary:
ACPI:
- Enable FPDT support for boot-time profiling
- Fix CPU PMU probing to work better with PREEMPT_RT
- Update SMMUv3 MSI DeviceID parsing to latest IORT spec
- APMT support for probing Arm CoreSight PMU devices
CPU features:
- Advertise new SVE instructions (v2.1)
- Advertise range prefetch instruction
- Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
- Enable DIT (Data Independent Timing) when running in the kernel
- More conversion of system register fields over to the generated
header
CPU misfeatures:
- Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
- Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's pointer
authentication feature when it is supported (complete with scary
DWARF parser!)
Tracing and debug:
- Remove static ftrace in favour of, err, dynamic ftrace!
- Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
and existing arch code
- Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
old FTRACE_WITH_REGS
- Extend 'crashkernel=' parameter with default value and fallback to
placement above 4G physical if initial (low) allocation fails
SVE:
- Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
- Rework of undefined instruction handling to avoid serialisation on
global lock (this includes emulation of user accesses to the ID
registers)
Perf and PMU:
- Support for TLP filters in Hisilicon's PCIe PMU device
- Support for the DDR PMU present in Amlogic Meson G12 SoCs
- Support for the terribly-named "CoreSight PMU" architecture from
Arm (and Nvidia's implementation of said architecture)
Misc:
- Tighten up our boot protocol for systems with memory above 52 bits
physical
- Const-ify static keys to satisty jump label asm constraints
- Trivial FFA driver cleanups in preparation for v1.1 support
- Export the kernel_neon_* APIs as GPL symbols
- Harden our instruction generation routines against instrumentation
- A bunch of robustness improvements to our arch-specific selftests
- Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: alternatives: add __init/__initconst to some functions/variables
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
arm64/sysreg: Convert MVFR2_EL1 to automatic generation
arm64/sysreg: Convert MVFR1_EL1 to automatic generation
arm64/sysreg: Convert MVFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
...
- Add timens support (when switching mm). This version has survived
in -next for the entire cycle (Andrei Vagin).
- Various small bug fixes, refactoring, and readability improvements
(Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin).
- Remove FOLL_FORCE for stack setup (Kees Cook).
- Whilespace cleanups (Rolf Eike Beer, Kees Cook).
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOOjsgWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqwED/9mjtKL2GwHOYKsfhtc0m4HVGBw
gxTEKuyo5mRwaRLg2bfuWe1OQfeGWQd9+IZ83Kr2ijzm4R16Gslv9i69Iwdf2tce
iFf2R+iR7On+zNokHxaNflRH9fMsZLobVFqzLvB73BUF82ybJlTR3WMnQhS6HZQB
Gse8jRfueOnVgKldRLlgdxIucPVsXYSoBS4B0nvIUuQn3aNzDNuuctMe/5NFK0ud
+TWMXtKzS3B9pcLTXy3e0bPk/Ptio18CBUEI+iLMAHswtNCoxx1ZCcuvnEcrd5Qr
h2WGaRvYJ7oSUXeEsqPKuDdhqEJQH2AQoX8FzvD+hyIutQJCJzVYlHvwGCqn/Km6
0Dalng9Pjb6z2LEie/N42LDXEQmLZO2WtJ4otpORJlsJ7ZkrLjB4u+hDU1JA/Q14
YPWvth3fMA5vAFKvGCtpEc7YdHmghmXCW+YGXOBm625fPYnwFSXOarHfow1RKNE5
MOM4l60WwzLIHgmr8AFUaLf8TbutXN+BKvbMRh2ToWzDYXEoywxAedHDyo4LVwEy
mZEca/3izT1ynBcyZg1t8shf4htgLjcPHqM0B+Hq0iNMIrwtecqAcYL/Oj6XssPx
OuQYv341KF9fV/hMy84GM2HMr0ygUmrP7b9x+PEvCwzWf/2Glaw6Z4rtCdYC+TjW
8ZWqPqEY+LRsZsL18Q==
=ZDYk
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
"Most are small refactorings and bug fixes, but three things stand out:
switching timens (which got reverted before) looks solid now,
FOLL_FORCE has been removed (no failures seen yet across several weeks
in -next), and some whitespace cleanups (which are long overdue).
- Add timens support (when switching mm). This version has survived
in -next for the entire cycle (Andrei Vagin)
- Various small bug fixes, refactoring, and readability improvements
(Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin)
- Remove FOLL_FORCE for stack setup (Kees Cook)
- Whitespace cleanups (Rolf Eike Beer, Kees Cook)"
* tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_misc: fix shift-out-of-bounds in check_special_flags
binfmt: Fix error return code in load_elf_fdpic_binary()
exec: Remove FOLL_FORCE for stack setup
binfmt_elf: replace IS_ERR() with IS_ERR_VALUE()
binfmt_elf: simplify error handling in load_elf_phdrs()
binfmt_elf: fix documented return value for load_elf_phdrs()
exec: simplify initial stack size expansion
binfmt: Fix whitespace issues
exec: Add comments on check_unsafe_exec() fs counting
ELF uapi: add spaces before '{'
selftests/timens: add a test for vfork+exit
fs/exec: switch timens when a task gets a new mm
This branch further improves nolibc testing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKoFQTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jLQwD/sEbXzDOC+CeqKrVXHjYPwM5Y+GDoPZ
hJCaT8RSvgKszaTp2wgUCQxHDbIzxGm29os5GR3+LNoWDFFc/+KHnMzkPyBQq8Kq
id3PBUFOhm2ZwEMCfAGqTeDyopKxl3mMwnOQEQsA+wSxgagc8DwQfESmNUzPQW67
K6kUaLpirn7wGuvh/5co2NzgQ9E/AO02ZGhAWrmJiInlE3Uvdo3zw3r/7OqG0MFO
uAEz+DDggOerHA3cdQNHJxPVorkpYEs5a5pJD2tzPcVfan/o41nMY9EZGAMn2vdy
ZJLXyR8HBRVJS5ovKDwTc1beb4wTxjNsYbxvv9G54FvgWJS8GzIBxEjn+cFwx9LS
frkhvPMZPsl7tHNlMQtxArxJWqomNGPnPxm5fVeaBD8khxureJCADB8xmtP39l8M
fQOPW5uBIU7dL0jMH+V7x++7/ToBZab+SZqvEbtgDV8B7ZcnEynqThkvLkVIXEPR
jWhag9M+IesNzux/nxt9Sgvrx32k38S/n44Y3wQURayTwPKQeu5Hbbu6++p4cuj/
v4pHjjhey6brT4tKtokfsAtTQqEFcjKgsUkAJtbT52grXrvJ27FHbOm5jUjHSXP+
qGvjL+UfUURTgb98W5g6RSo0IPJuBen+blDCJz9K6ThWdz8XbzksCpb7dghwlWgz
Q1grIIH62QObZg==
=lTqG
-----END PGP SIGNATURE-----
Merge tag 'nolibc.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Further improvements to nolibc testing
* tag 'nolibc.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
selftests/nolibc: Always rebuild the sysroot when running a test
selftests/nolibc: Add 7 tests for memcmp()
This pull request contains the following branches:
doc.2022.10.20a: Documentation updates. This is the second
in a series from an ongoing review of the RCU documentation.
fixes.2022.10.21a: Miscellaneous fixes.
lazy.2022.11.30a: Introduces a default-off Kconfig option that depends
on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or
rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce
delays. These delays result in significant power savings on
nearly idle Android and ChromeOS systems. These savings range
from a few percent to more than ten percent.
This series also includes several commits that change call_rcu()
to a new call_rcu_hurry() function that avoids these delays in
a few cases, for example, where timely wakeups are required.
Several of these are outside of RCU and thus have acks and
reviews from the relevant maintainers.
srcunmisafe.2022.11.09a: Creates an srcu_read_lock_nmisafe() and an
srcu_read_unlock_nmisafe() for architectures that support NMIs,
but which do not provide NMI-safe this_cpu_inc(). These NMI-safe
SRCU functions are required by the upcoming lockless printk()
work by John Ogness et al.
That printk() series depends on these commits, so if you pull
the printk() series before this one, you will have already
pulled in this branch, plus two more SRCU commits:
0cd7e350ab ("rcu: Make SRCU mandatory")
51f5f78a4f ("srcu: Make Tiny synchronize_srcu() check for readers")
These two commits appear to work well, but do not have
sufficient testing exposure over a long enough time for me to
feel comfortable pushing them unless something in mainline is
definitely going to use them immediately, and currently only
the new printk() work uses them.
torture.2022.10.18c: Changes providing minor but important increases
in test coverage for the new RCU polled-grace-period APIs.
torturescript.2022.10.20a: Changes that avoid redundant kernel builds,
thus providing about a 30% speedup for the torture.sh acceptance
test.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKnS8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jCMiD/4weraRjmcLhZ3tz2vgTI8ZsXdIiCfU
vCln0AOKroVo37S4BhViVfryV2D4VFfEb1UY6EgxNFu7Jd3z0seQShZh/5r8bFMU
p0E6TC8PwyKUpQstTOwOynkw6BWGW1qeL620PpBNRAy4MkxL8AGv40tHRIHEeAzc
cCTax2+xW9ae0ZtAZHDDCUAzpYpcjScIf4OZ3tkSaFCcpWZijg+dN60dnsZ9l7h9
DtqKH61rszXAtxkmN9Fs9OY5MPCXi9Es6LVYq6KN06jqxwJRqmYf+pai3apmNIOf
P8isXOQG58tbhBLpNCG58UBSkjI2GG8Lcq6hYr6d/7Ukm7RF49q8eL7OQlVrJMuQ
Zi2DVTEAu2U3pzdTC14gi3RvqP7dO+psBs+LpGXtj4RxYvAP99e9KSRcG14j/Wwa
L52AetBzBXTCS5nhPOG8RP22d8HRZLxMe9x7T8iVCDuwH4M1zTF5cVzLeEdgPAD7
tdX4eV16PLt1AvhCEuHU/2v520gc2K9oGXLI1A6kzquXh7FflcPWl5WS+sYUbB/p
gBsblz7C3I5GgSoW4aAMnkukZiYgSvVql8ZyRwQuRzvLpYcofMpoanZbcufDjuw9
N5QzAaMmzHnBu3hOJS2WaSZRZ73fed3NO8jo8q8EMfYeWK3NAHybBdaQqSTgsO8i
s+aN+LZ4s5MnRw==
=eMOr
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
The test currently fails on 32bit. Fixing the "-1ull" vs. "-1ul" seems
to make the test pass and the compiler happy.
Note: This test is not in mm-stable yet. This fix should be squashed into
"selftests/vm: add KSM unmerge tests".
Link: https://lkml.kernel.org/r/20221205193716.276024-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The compiler complains about the conversion of a pointer to an int of
different width.
Link: https://lkml.kernel.org/r/20221205193716.276024-4-david@redhat.com
Fixes: 6f1405efc6 ("selftests/vm: anon_cow: add R/O longterm tests via gup_test")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The tests fail to compile in some environments (e.g., Debian 11.5 on x86).
Let's simply conditionally define MADV_POPULATE_(READ|WRITE) if not
already defined, similar to how the khugepaged.c test handles it.
Link: https://lkml.kernel.org/r/20221205193716.276024-3-david@redhat.com
Fixes: 39b2e5cae4 ("selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make sure that we ignore protection of a memcg that is the target of memcg
reclaim.
Link: https://lkml.kernel.org/r/20221202031512.1365483-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its usage
reaches a certain value.
While we are at it, refactor and simplify the reclaim loop.
This will be used in a following patch in another test.
Link: https://lkml.kernel.org/r/20221202031512.1365483-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A DAMON sysfs user could start DAMON with a scheme, remove the sysfs
directory for the scheme, and then ask stats or schemes tried regions
update. The related logic were not aware of the already removed directory
situation, so it was able to results in invalid memory accesses. The fix
has made with commit 8468b48661 ("mm/damon/sysfs-schemes: skip stats
update if the scheme directory is removed"), though. Add a selftest to
prevent such kinds of bugs from being introduced again.
Link: https://lkml.kernel.org/r/20221201170834.62823-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add a test to measure performance of KSM breaking not triggered via
COW, but triggered by disabling KSM on an area filled with KSM pages via
MADV_UNMERGEABLE.
Link: https://lkml.kernel.org/r/20221021101141.84170-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/ksm: break_ksm() cleanups and fixes", v2.
This series cleans up and fixes break_ksm(). In summary, we no longer use
fake write faults to break COW but instead FAULT_FLAG_UNSHARE. Further,
we move away from using follow_page() --- that we can hopefully remove
completely at one point --- and use new walk_page_range_vma() instead.
Fortunately, we can get rid of VM_FAULT_WRITE and FOLL_MIGRATION in common
code now.
Extend the existing ksm tests by an unmerge benchmark, and a some new
unmerge tests.
Also, add a selftest to measure MADV_UNMERGEABLE performance. In my setup
(AMD Ryzen 9 3900X), running the KSM selftest to test unmerge performance
on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in a
performance degradation of ~6% -- 7% (old: ~5250 MiB/s, new: ~4900 MiB/s).
I don't think we particularly care for now, but it's good to be aware of
the implication.
This patch (of 9):
Let's add three unmerge tests (MADV_UNMERGEABLE unmerging all pages in the
range).
test_unmerge(): basic unmerge tests
test_unmerge_discarded(): have some pte_none() entries in the range
test_unmerge_uffd_wp(): protect the merged pages using uffd-wp
ksm_tests.c currently contains a mixture of benchmarks and tests, whereby
each test is carried out by executing the ksm_tests binary with specific
parameters. Let's add new ksm_functional_tests.c that performs multiple,
smaller functional tests all at once.
Link: https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221021101141.84170-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Our memory management kernel CI testing at Red Hat uses the VM
selftests and we have run into two problems:
First, our LTP tests overlap with the VM selftests.
We want to avoid unhelpful redundancy in our testing practices.
Second, we have observed the current run_vmtests.sh to report overall
failure/ambiguous results in the case that a machine lacks the necessary
hardware to perform one or more of the tests. E.g. ksm tests that
require more than one numa node.
We want to be able to run the vm selftests suitable to particular hardware.
Add the ability to run one or more groups of vm tests via run_vmtests.sh
instead of simply all-or-none in order to solve these problems.
Preserve existing default behavior of running all tests when the script
is invoked with no arguments.
Documentation of test groups is included in the patch as follows:
# ./run_vmtests.sh [ -h || --help ]
usage: ./tools/testing/selftests/vm/run_vmtests.sh [ -h | -t "<categories>"]
-t: specify specific categories to tests to run
-h: display this message
The default behavior is to run all tests.
Alternatively, specific groups tests can be run by passing a string
to the -t argument containing one or more of the following categories
separated by spaces:
- mmap
tests for mmap(2)
- gup_test
tests for gup using gup_test interface
- userfaultfd
tests for userfaultfd(2)
- compaction
a test for the patch "Allow compaction of unevictable pages"
- mlock
tests for mlock(2)
- mremap
tests for mremap(2)
- hugevm
tests for very large virtual address space
- vmalloc
vmalloc smoke tests
- hmm
hmm smoke tests
- madv_populate
test memadvise(2) MADV_POPULATE_{READ,WRITE} options
- memfd_secret
test memfd_secret(2)
- process_mrelease
test process_mrelease(2)
- ksm
ksm tests that do not require >=2 NUMA nodes
- ksm_numa
ksm tests that require >=2 NUMA nodes
- pkey
memory protection key tests
- soft_dirty
test soft dirty page bit semantics
- anon_cow
test anonymous copy-on-write semantics
example: ./run_vmtests.sh -t "hmm mmap ksm"
Link: https://lkml.kernel.org/r/20221018231222.1884715-1-jsavitz@redhat.com
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
post-6.0 issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5Ur2AAKCRDdBJ7gKXxA
jsGmAQDWSq6z9fVgk30XpMr/X7t5c6NTPw5GocVpdwG8iqch3gEAjEs5/Kcd/mx4
d1dLaJFu1u3syessp8nJrNr1HANIog8=
=L8zu
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
Check that verifier.c:states_equal() uses check_ids() to match
consistent active_lock/map_value configurations. This allows to prune
states with active spin locks even if numerical values of
active_lock ids do not match across compared states.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that when reg->id is not same for the same register of type
PTR_TO_MAP_VALUE between current and old explored state, we currently
return false from regsafe and continue exploring.
Without the fix in prior commit, the test case fails.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test case that would erroneously pass verification if
verifier.c:states_equal() maintains separate register ID mappings for
call frames.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Under certain conditions it was possible for verifier.c:regsafe() to
skip check_id() call. This commit adds negative test cases previously
errorneously accepted as safe.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The Makefile was assuming that headers_install was already done in
the top source directory, and was searching for installed uapi headers
there.
Unfortunately this is not the case and we need to manually call that step.
To do so, reorder the declaration of the variables, and reuses top_srcdir
provided by lib.mk
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202212060216.a6X8Py5H-lkp@intel.com/#t
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20221206145936.922196-6-benjamin.tissoires@redhat.com
DYNAMIC_FTRACE_WITH_DIRECT_CALLS is implicit on x86_64 but is still a
WIP on aarm64. Ensure we get it selected to not have any surprises.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20221206145936.922196-5-benjamin.tissoires@redhat.com
1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[akpm@linux-foundation.org: update comments, per Roman]
Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
Fixes: 1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MEM_REGION_TEST_DATA is meant to hold data explicitly used by a
selftest, not implicit allocations due to the selftests infrastructure.
Allocate the ucall pool from MEM_REGION_DATA much like the rest of the
selftests library allocations.
Fixes: 426729b2cf ("KVM: selftests: Add ucall pool based implementation")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An interesting feature of the Arm architecture is that the stage-1 MMU
supports two distinct VA regions, controlled by TTBR{0,1}_EL1. As KVM
selftests on arm64 only uses TTBR0_EL1, the VA space is constrained to
[0, 2^(va_bits-1)). This is different from other architectures that
allow for addressing low and high regions of the VA space from a single
page table.
KVM selftests' VA space allocator presumes the valid address range is
split between low and high memory based the MSB, which of course is a
poor match for arm64's TTBR0 region.
Allow architectures to override the default VA space layout. Make use of
the override to align vpages_valid with the behavior of TTBR0 on arm64.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz
rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i
KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX
wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM
AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk
pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d
1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc
4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT
H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR
qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF
K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC
aWa6oGVC
=iIPT
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
Allow variables to execute shell commands. Note, these are processed when
they are first seen while parsing the config file. This is useful if you
have the same config file used for multiple hosts (as they may be in a git
repository).
HOSTNAME := ${shell hostname}
DEFAULTS IF "${HOSTNAME}" == "frodo"
Link: https://lkml.kernel.org/r/20221207212944.277ee850@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The BPF Makefile in net/bpf did incorrect path substitution for O=dir
builds, e.g.
make O=/tmp/kselftest headers
make O=/tmp/kselftest -C tools/testing/selftests
would fail in selftest builds [1] net/ with
clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
clang-16: error: no input files
Add a pattern prerequisite and an order-only-prerequisite (for
creating the directory), to resolve the issue.
[1] https://lore.kernel.org/all/202212060009.34CkQmCN-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 837a3d66d6 ("selftests: net: Add cross-compilation support for BPF programs")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221206102838.272584-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original support for bpf_user_ringbuf_drain callbacks simply
short-circuited checks for the dynptr state, allowing users to pass
PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a
dynptr. This bug would have also surfaced with other dynptr helpers in
the future that changed dynptr view or modified it in some way.
Include test cases for all cases, i.e. both bpf_dynptr_from_mem and
bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them.
Without the fix, both of these programs load and pass verification.
While at it, remove sys_nanosleep target from failure cases' SEC
definition, as there is no such tracepoint.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While check_func_arg_reg_off is the place which performs generic checks
needed by various candidates of reg->type, there is some handling for
special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and
ARG_PTR_TO_RINGBUF_MEM.
This commit aims to streamline these special cases and instead leave
other things up to argument type specific code to handle. The function
will be restrictive by default, and cover all possible cases when
OBJ_RELEASE is set, without having to update the function again (and
missing to do that being a bug).
This is done primarily for two reasons: associating back reg->type to
its argument leaves room for the list getting out of sync when a new
reg->type is supported by an arg_type.
The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is something
we already handle, whenever a release argument is expected, it should
be passed as the pointer that was received from the acquire function.
Hence zero fixed and variable offset.
There is nothing special about ARG_PTR_TO_RINGBUF_MEM, where technically
its target register type PTR_TO_MEM | MEM_RINGBUF can already be passed
with non-zero offset to other helper functions, which makes sense.
Hence, lift the arg_type_is_release check for reg->off and cover all
possible register types, instead of duplicating the same kind of check
twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id).
For the release argument, arg_type_is_dynptr is the special case, where
we go to actual object being freed through the dynptr, so the offset of
the pointer still needs to allow fixed and variable offset and
process_dynptr_func will verify them later for the release argument case
as well.
This is not specific to ARG_PTR_TO_DYNPTR though, we will need to make
this exception for any future object on the stack that needs to be
released. In this sense, PTR_TO_STACK as a candidate for object on stack
argument is a special case for release offset checks, and they need to
be done by the helper releasing the object on stack.
Since the check has been lifted above all register type checks, remove
the duplicated check that is being done for PTR_TO_BTF_ID.
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.
However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.
When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this
doesn't break.
Fixes: 2057156738 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where
the underlying register type is subjected to more special checks to
determine the type of object represented by the pointer and its state
consistency.
Move dynptr checks to their own 'process_dynptr_func' function so that
is consistent and in-line with existing code. This also makes it easier
to reuse this code for kfunc handling.
Then, reuse this consolidated function in kfunc dynptr handling too.
Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has
been lifted.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bpftool has new extra libbpf_det_bind probing map we need to exclude.
Also skip trying to load netdevsim modules if it's already loaded (builtin).
v2:
- drop iproute2->bpftool changes (Toke)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com
grub2 has submenus where to use grub-reboot, it requires:
grub-reboot X>Y
where X is the main index and Y is the submenu. Thus if you have:
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux ...
[...]
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option ...
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64 (recovery mode)' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux test' --class debian --class gnu-linux ...
[...]
}
And wanted to boot to the "Linux test" kernel, you need to run:
# grub-reboot 1>2
As 1 is the second top menu (the submenu) and 2 is the third of the sub
menu entries.
Have the grub.cfg parsing for grub2 handle such cases.
Cc: stable@vger.kernel.org
Fixes: a15ba91361 ("ktest: Add support for grub2")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
After a full run of a make_min_config test, I noticed there were a lot of
CONFIGs still enabled that really should not be. Looking at them, I
noticed they were all defined as "default y". The issue is that the test
simple removes the config and re-runs make oldconfig, which enables it
again because it is set to default 'y'. Instead, explicitly disable the
config with writing "# CONFIG_FOO is not set" to the file to keep it from
being set again.
With this change, one of my box's minconfigs went from 768 configs set,
down to 521 configs set.
Link: https://lkml.kernel.org/r/20221202115936.016fce23@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 0a05c769a9 ("ktest: Added config_bisect test type")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Convert big chunks of dynptr and map_kptr subtests to use generic
verification_tester. They are switched from using manually maintained
tables of test cases, specifying program name and expected error
verifier message, to btf_decl_tag-based annotations directly on
corresponding BPF programs: __failure to specify that BPF program is
expected to fail verification, and __msg() to specify expected log
message.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207201648.2990661-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's become a common pattern to have a collection of small BPF programs
in one BPF object file, each representing one test case. On user-space
side of such tests we maintain a table of program names and expected
failure or success, along with optional expected verifier log message.
This works, but each set of tests reimplement this mundane code over and
over again, which is a waste of time for anyone trying to add a new set
of tests. Furthermore, it's quite error prone as it's way too easy to miss
some entries in these manually maintained test tables (as evidences by
dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was
accidentally missed; this is fixed in next patch).
So this patch implements generic test_loader, which accepts skeleton
name and handles the rest of details: opens and loads BPF object file,
making sure each program is tested in isolation. Optionally each test
case can specify expected BPF verifier log message. In case of failure,
tester makes sure to report verifier log, but it also reports verifier
log in verbose mode unconditionally.
Now, the interesting deviation from existing custom implementations is
the use of btf_decl_tag attribute to specify expected-to-fail vs
expected-to-succeed markers and, optionally, expected log message
directly next to BPF program source code, eliminating the need to
manually create and update table of tests.
We define few macros wrapping btf_decl_tag with a convention that all
values of btf_decl_tag start with "comment:" prefix, and then utilizing
a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to
define things like expected success/failure, expected verifier message,
extra verifier log level (if necessary). This approach is demonstrated
by next patch in which two existing sets of failure tests are converted.
Tester supports both expected-to-fail and expected-to-succeed programs,
though this patch set didn't convert any existing expected-to-succeed
programs yet, as existing tests couple BPF program loading with their
further execution through attach or test_prog_run. One way to allow
testing scenarios like this would be ability to specify custom callback,
executed for each successfully loaded BPF program. This is left for
follow up patches, after some more analysis of existing test cases.
This test_loader is, hopefully, a start of a test_verifier-like runner,
but integrated into test_progs infrastructure. It will allow much better
"user experience" of defining low-level verification tests that can take
advantage of all the libbpf-provided nicety features on BPF side: global
variables, declarative maps, etc. All while having a choice of defining
it in C or as BPF assembly (through __attribute__((naked)) functions and
using embedded asm), depending on what makes most sense in each
particular case. This will be explored in follow up patches as well.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221207201648.2990661-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
properly initialize it when table ID 0 is used. This can lead to a route
in the default VRF with a preferred source address not being flushed
when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28
# ip address add dev dummy1 192.0.2.17/28
# ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100
# ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two
different FIB info structures. One with a metric of 100 and table ID of
254 (main) and one with a metric of 200 and table ID of 0. Therefore,
when the preferred source address is deleted from the default VRF,
the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration
structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8
Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A series of prior patches added some kfuncs that allow struct
task_struct * objects to be used as kptrs. These kfuncs leveraged the
'refcount_t rcu_users' field of the task for performing refcounting.
This field was used instead of 'refcount_t usage', as we wanted to
leverage the safety provided by RCU for ensuring a task's lifetime.
A struct task_struct is refcounted by two different refcount_t fields:
1. p->usage: The "true" refcount field which task lifetime. The
task is freed as soon as this refcount drops to 0.
2. p->rcu_users: An "RCU users" refcount field which is statically
initialized to 2, and is co-located in a union with
a struct rcu_head field (p->rcu). p->rcu_users
essentially encapsulates a single p->usage
refcount, and when p->rcu_users goes to 0, an RCU
callback is scheduled on the struct rcu_head which
decrements the p->usage refcount.
Our logic was that by using p->rcu_users, we would be able to use RCU to
safely issue refcount_inc_not_zero() a task's rcu_users field to
determine if a task could still be acquired, or was exiting.
Unfortunately, this does not work due to p->rcu_users and p->rcu sharing
a union. When p->rcu_users goes to 0, an RCU callback is scheduled to
drop a single p->usage refcount, and because the fields share a union,
the refcount immediately becomes nonzero again after the callback is
scheduled.
If we were to split the fields out of the union, this wouldn't be a
problem. Doing so should also be rather non-controversial, as there are
a number of places in struct task_struct that have padding which we
could use to avoid growing the structure by splitting up the fields.
For now, so as to fix the kfuncs to be correct, this patch instead
updates bpf_task_acquire() and bpf_task_release() to use the p->usage
field for refcounting via the get_task_struct() and put_task_struct()
functions. Because we can no longer rely on RCU, the change also guts
the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions
pending a resolution on the above problem.
In addition, the task fixes the kfunc and rcu_read_lock selftests to
expect this new behavior.
Fixes: 90660309b0 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
Fixes: fca1aa7551 ("bpf: Handle MEM_RCU type properly")
Reported-by: Matus Jokay <matus.jokay@stuba.sk>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When installing the selftests using
"make -C tools/testing/selftests install", we need to make sure
all the required files to run the selftests are installed. Let's
make sure this is the case.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
It is useful to use vmlinux.h in the xfrm_info test like other kfunc
tests do. In particular, it is common for kfunc bpf prog that requires
to use other core kernel structures in vmlinux.h
Although vmlinux.h is preferred, it needs a ___local flavor of
struct bpf_xfrm_info in order to build the bpf selftests
when CONFIG_XFRM_INTERFACE=[m|n].
Cc: Eyal Birger <eyal.birger@gmail.com>
Fixes: 90a3a05eb3 ("selftests/bpf: add xfrm_info tests")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* for-next/selftests:
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
kselftest/arm64: Set test names prior to starting children
kselftest/arm64: Use preferred form for predicate load/stores
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: Print ASCII version of unknown signal frame magic values
kselftest/arm64: Remove validation of extra_context from TODO
kselftest/arm64: Provide progress messages when signalling children
kselftest/arm64: Check that all children are producing output in fp-stress
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmOOO08ACgkQJNaLcl1U
h9BiIAf+O8zqcxtokSTS3wsoFHDp2zprTTUk3RZbxaYMFTmFrALbQVUx5EdJ179X
Dr9rL9T2iNBN7YGhDvmzoVvLrDncJEerMk7cgbc88a5tkB7ZipI7E9QliZt92QaP
ihrXAyD7ekLC6QLXrHFNmzwUeFspgtsBkT3MakcfcucuMN53UQbFwD1r0ziCJOMZ
gFnkZKBemUDsURxWR+LCveB/WYKNrZ6Bjokdg8HY1yenACMiXaoWedmIFJ2SxId3
NNS8vqUDUSEFhDaeJVBo1Ow9m8fe+dZBUeVk9Ej7bGEXxYptwtu3aODXknySa8Eb
YVlalR9qQPvaI1688WRleRWp+oyL4Q==
=x1AC
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v6.2
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
Test the xfrm_info kfunc helpers.
The test setup creates three name spaces - NS0, NS1, NS2.
XFRM tunnels are setup between NS0 and the two other NSs.
The kfunc helpers are used to steer traffic from NS0 to the other
NSs based on a userspace populated bpf global variable and validate
that the return traffic had arrived from the desired NS.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-5-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The typical environment where cxl_test is run, QEMU, does not support
cpu_cache_invalidate_memregion(). Add the 'test' bypass symbols to the
configuration check.
Reported-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167026948179.3527561.4535373655515827457.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window
structures. Fix up conflicts with the RCH emulation already pending in
cxl/next.
In an RCH topology a CXL host-bridge as Root Complex Integrated Endpoint
the represents the memory expander. Unlike a VH topology there is no
CXL/PCIE Root Port that host the endpoint. The CXL subsystem maps this
as the CXL root object (ACPI0017 on ACPI based systems) targeting the
host-bridge as a dport, per usual, but then that dport directly hosts
the endpoint port.
Mock up that configuration with a 4th host-bridge that has a 'cxl_rcd'
device instance as its immediate child.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/166993046170.1882361.12460762475782283638.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* kvm-arm64/dirty-ring:
: .
: Add support for the "per-vcpu dirty-ring tracking with a bitmap
: and sprinkles on top", courtesy of Gavin Shan.
:
: This branch drags the kvmarm-fixes-6.1-3 tag which was already
: merged in 6.1-rc4 so that the branch is in a working state.
: .
KVM: Push dirty information unconditionally to backup bitmap
KVM: selftests: Automate choosing dirty ring size in dirty_log_test
KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
KVM: selftests: Use host page size to map ring buffer in dirty_log_test
KVM: arm64: Enable ring-based dirty memory tracking
KVM: Support dirty ring in conjunction with bitmap
KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/access-tracking:
: .
: Small series to add support for arm64 to access_tracking_perf_test and
: correct a couple bugs along the way.
:
: Patches courtesy of Oliver Upton.
: .
KVM: selftests: Build access_tracking_perf_test for arm64
KVM: selftests: Have perf_test_util signal when to stop vCPUs
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/s2-faults:
: .
: New KVM/arm64 selftests exercising various sorts of S2 faults, courtesy
: of Ricardo Koller. From the cover letter:
:
: "This series adds a new aarch64 selftest for testing stage 2 fault handling
: for various combinations of guest accesses (e.g., write, S1PTW), backing
: sources (e.g., anon), and types of faults (e.g., read on hugetlbfs with a
: hole, write on a readonly memslot). Each test tries a different combination
: and then checks that the access results in the right behavior (e.g., uffd
: faults with the right address and write/read flag). [...]"
: .
KVM: selftests: aarch64: Add mix of tests into page_fault_test
KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test
KVM: selftests: aarch64: Add dirty logging tests into page_fault_test
KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test
KVM: selftests: aarch64: Add aarch64/page_fault_test
KVM: selftests: Use the right memslot for code, page-tables, and data allocations
KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc()
KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type
KVM: selftests: Stash backing_src_type in struct userspace_mem_region
tools: Copy bitfield.h from the kernel sources
KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros
KVM: selftests: Add missing close and munmap in __vm_mem_region_delete()
KVM: selftests: aarch64: Add virt_get_pte_hva() library function
KVM: selftests: Add a userfaultfd library
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/linked-bps:
: .
: Additional selftests for the arm64 breakpoints/watchpoints,
: courtesy of Reiji Watanabe. From the cover letter:
:
: "This series adds test cases for linked {break,watch}points to the
: debug-exceptions test, and expands {break,watch}point tests to
: use non-zero {break,watch}points (the current test always uses
: {break,watch}point#0)."
: .
KVM: arm64: selftests: Test with every breakpoint/watchpoint
KVM: arm64: selftests: Add a test case for a linked watchpoint
KVM: arm64: selftests: Add a test case for a linked breakpoint
KVM: arm64: selftests: Change debug_version() to take ID_AA64DFR0_EL1
KVM: arm64: selftests: Stop unnecessary test stage tracking of debug-exceptions
KVM: arm64: selftests: Add helpers to enable debug exceptions
KVM: arm64: selftests: Remove the hard-coded {b,w}pn#0 from debug-exceptions
KVM: arm64: selftests: Add write_dbg{b,w}{c,v}r helpers in debug-exceptions
KVM: arm64: selftests: Use FIELD_GET() to extract ID register fields
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/memslot-fixes:
: .
: KVM memslot selftest fixes for non-4kB page sizes, courtesy
: of Gavin Shan. From the cover letter:
:
: "kvm/selftests/memslots_perf_test doesn't work with 64KB-page-size-host
: and 4KB-page-size-guest on aarch64. In the implementation, the host and
: guest page size have been hardcoded to 4KB. It's ovbiously not working
: on aarch64 which supports 4KB, 16KB, 64KB individually on host and guest.
:
: This series tries to fix it. After the series is applied, the test runs
: successfully with 64KB-page-size-host and 4KB-page-size-guest."
: .
KVM: selftests: memslot_perf_test: Report optimal memory slots
KVM: selftests: memslot_perf_test: Consolidate memory
KVM: selftests: memslot_perf_test: Support variable guest page size
KVM: selftests: memslot_perf_test: Probe memory slots for once
KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()
KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()
Signed-off-by: Marc Zyngier <maz@kernel.org>
In check_all_cpu_dscr_defaults, opendir() opens the directory stream.
Add missing closedir() in the error path to release it.
In check_cpu_dscr_default, open() creates an open file descriptor.
Add missing close() in the error path to release it.
Fixes: ebd5858c90 ("selftests/powerpc: Add test for all DSCR sysfs interfaces")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221205084429.570654-1-linmq006@gmail.com
Both tolower and toupper are built in c functions, we should not
redefine them as this can result in a build error.
Fixes the following errors:
progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
10 | static inline char tolower(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
4 | #include <bpf/bpf_helpers.h>
+++ |+#include <ctype.h>
5 |
progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
17 | static inline char toupper(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'
See background on this sort of issue:
https://stackoverflow.com/a/20582607https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213
(C99, 7.1.3p1) "All identifiers with external linkage in any of the
following subclauses (including the future library directions) are
always reserved for use as identifiers with external linkage."
This is documented behavior in GCC:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three tests for cgrp local storage support for sleepable progs.
Two tests can load and run properly, one for cgroup_iter, another
for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get()
helper. One test has bpf_rcu_read_lock() and failed to load.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin mentioned that the verifier cannot assume arguments from
LSM hook sk_alloc_security being trusted since after the hook
is called, the sk ref_count is set to 1. This will overwrite
the ref_count changed by the bpf program and may cause ref_count
underflow later on.
I then further checked some other hooks. For example,
for bpf_lsm_file_alloc() hook in fs/file_table.c,
f->f_cred = get_cred(cred);
error = security_file_alloc(f);
if (unlikely(error)) {
file_free_rcu(&f->f_rcuhead);
return ERR_PTR(error);
}
atomic_long_set(&f->f_count, 1);
The input parameter 'f' to security_file_alloc() cannot be trusted
as well.
Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free
lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other
hooks should not be considered as trusted. This may not be a complete
list, but it covers common usage for sk and task.
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add MEM_RCU pointer null checking for related tests. Also
modified task_acquire test so it takes a rcu ptr 'ptr' where
'ptr = rcu_ptr->rcu_field'.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Expand the cxl_test topology to include CFMWS's that use XOR math
for interleave arithmetic, as defined in the CXL Specification 3.0.
With this expanded topology, cxl_test is useful for testing:
x1,x2,x4 ways with XOR interleave arithmetic.
Define the additional XOR CFMWS entries to appear only with the
module parameter interleave_arithmetic=1. The cxl_test default
continues to be modulo math.
modprobe cxl_test interleave_arithmetic=1
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/54670400cd48ba7fcc6d8ee0d6ae2276d3f51aad.1669847017.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
get the RCRB and add code to extract the component register block from
it.
RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1). The
RCRB base address is cached in the cxl_dport per-host bridge so that the
upstream port component registers can be retrieved later by an RCD
(RCIEP) associated with the host bridge.
Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/Y4dsGZ24aJlxSfI1@rric.localdomain
[djbw: introduce devm_cxl_add_rch_dport()]
Link: https://lore.kernel.org/r/166993044524.1882361.2539922887413208807.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Accept any cxl_test topology device as the first argument in
cxl_chbs_context.
This is in preparation for reworking the detection of the component
registers across VH and RCH topologies. Move
mock_acpi_table_parse_cedt() beneath the definition of is_mock_port()
and use is_mock_port() instead of the explicit mock cxl_acpi device
check.
Acked-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993043433.1882361.17651413716599606118.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging and endpoint to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/net`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1669864248-829-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When testing in kci_test_ipsec_offload, srcip is configured as $dstip,
it should add xfrm policy rule in instead of out.
The test result of this patch is as follows:
PASS: ipsec_offload
Fixes: 2766a11161 ("selftests: rtnetlink: add ipsec offload API test")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221201082246.14131-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d2825fa936 ("crypto: sm3,sm4 - move into crypto directory") moves
SM3 and SM4 algorithm implementations from stand-alone library to crypto
API. The corresponding configuration options for the API version (generic)
are CONFIG_CRYPTO_SM3_GENERIC and CONFIG_CRYPTO_SM4_GENERIC, respectively.
Replace option selected in selftests configuration from the library version
to the API version.
Fixes: d2825fa936 ("crypto: sm3,sm4 - move into crypto directory")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221201131852.38501-1-tianjia.zhang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpf_legacy.h header uses llvm specific load functions, add
GCC compatible variants as well to fix tests using these functions
under GCC.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221201190939.3230513-1-james.hilliard1@gmail.com
In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
This patch adds support to check if the calling process has the flag
CAP_SYS_ADMIN, and also if this flag has CAP_EFFECTIVE set.
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220731092529.28760-1-gautammenghani201@gmail.com
There is a spelling mistake in some help text. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20221201091354.1613652-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to
match the kernel nomenclature where test_and_set_bit() is atomic,
and __test_and_set_bit() provides the non-atomic variant.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.
Leave the usage in ucall_free() as-is, it's the one place in tools/ that
actually wants/needs atomic behavior.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new ucall hook, GUEST_UCALL_NONE(), to allow tests to make ucalls
without allocating a ucall struct, and use it to enable single-step
in ARM's debug-exceptions test. Like the disable single-step path, the
enabling path also needs to ensure that no exclusive access sequences are
attempted after enabling single-step, as the exclusive monitor is cleared
on ERET from the debug exception taken to EL2.
The test currently "works" because clear_bit() isn't actually an atomic
operation... yet.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmOJQesSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5IR4QAKGPbLRykY/2FohV2HDu5fDPxA2Fe9nu
5W7ZIptQu+tQtCTWKFEjcQdwYoNrLbr0hr1eGubVbIvBqJbwPQfH7G0765eOIcvy
s6Zn2N24IisIoUxdkJGOL3Tt1UR7wCFbwC+ms0i4FQvMcw+TbM0BTHgJDdwR5laX
mGN7ubz5iImwDFFE3Bd8Qy5I+FGL9CI60l+RzK6b7J8HYi1wOBMLU9QueF/dB7gR
g+navZJAAnvN6YIkjP5j8yPBuvhDzni379ue5ATDq1ALvyyI7xlYALsxpUjCnLuo
CkbvgmfmC94Vdm7pzFgsbazUN2oIjwoimjFQHP1bf8Jmd+770R282JpnwiD/ydCV
Tl2ArwXA2zxVxNZm9g/XqPBwWBWWgWfYIQbuuxc065MnXCnHkY5UGGf0JNx41CDl
hdtm9DHkft8+6kyBBmgkdKxd328Znljq02v3nLePUipfpDVaNj4VAUj9IpV6Lpuj
GJjs4Wx7oqFwH1Im/LqZgnOGwgkSj3ObHtkYx2RSrQAQultbjuplFz2qZWP8PF6A
FrJbcddKOmLINrdNOlvTd5WKCAjtV8vycjFkk+/7H67rpZdM8AI1StrzMP6gmwg4
ARozZJ2UF8nTriRYFQbFQyNm9bBTZ7YQ/HajqfhvCuZLi7i1EaImhC0F1xn7IU5S
6XhvQPvjRTgS
=i6OA
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD
Misc KVM x86 fixes and cleanups for 6.2:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
On 32bit the trigger-synthetic-eprobe.tc selftest fails with the error:
hist:syscalls:sys_exit_openat: error: Param type doesn't match synthetic event field type
Command: hist:keys=common_pid:filename=$__arg__1,ret=ret:onmatch(syscalls.sys_enter_openat).trace(synth_open,$filename,$ret)
^
This is because the synth_open synthetic event is created with:
echo "$SYNTH u64 filename; s64 ret;" > synthetic_events
Which works fine on 64 bit, as filename is a pointer and the return is
also a long. But for 32 bit architectures, it doesn't work.
Use "unsigned long" and "long" instead so that it works for both 64 bit
and 32 bit architectures.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- malloc() does not zero the buffer,
- fread() does not null-terminate it's output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
not inherently null-terminated
So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-3-bgray@linux.ibm.com
No need to write inline asm for mtspr/mfspr, we have macros for this
in reg.h
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-2-bgray@linux.ibm.com
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/powerpc`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1669862997-31335-1-git-send-email-yangtiezhu@loongson.cn
This test adds a basic HSRv0 network with 3 nodes. In its current shape
it sends and forwards packets, announcements and so merges nodes based
on MAC A/B information.
It is able to detect duplicate packets and packetloss should any occur.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by the
in-kernel path manager in mptcp_join.sh.
It adds the listener event checking in the existing "remove single
address with port" test. The output looks like this:
003 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ] - pt [ ok ]
syn[ ok ] - synack[ ok ] - ack[ ok ]
syn[ ok ] - ack [ ok ]
rm [ ok ] - rmsf [ ok ] invert
CREATE_LISTENER 10.0.2.1:10100[ ok ]
CLOSE_LISTENER 10.0.2.1:10100 [ ok ]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch moves evts_ns1 and evts_ns2 out of do_transfer() as two global
variables in mptcp_join.sh. Init them in init() and remove them in
cleanup().
Add a new helper reset_with_events() to save the outputs of 'pm_nl_ctl
events' command in them. And a new helper kill_events_pids() to kill
pids of 'pm_nl_ctl events' command. Use these helpers in userspace pm
tests.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by userspace
processes.
It adds a new test named test_listener() and a new verifying helper
verify_listener_events(). The new output looks like this:
CREATE_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
DESTROY_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
MP_PRIO TX [OK]
MP_PRIO RX [OK]
CREATE_LISTENER 10.0.2.2:37106 [OK]
CLOSE_LISTENER 10.0.2.2:37106 [OK]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch makes server_evts and client_evts global in userspace_pm.sh,
then these two variables could be used in test_announce(), test_remove()
and test_subflows(). The local variable 'evts' in these three functions
then could be dropped.
Also move local variable 'file' as a global one.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some userspace pm tests failed since pm listener events have been added.
Now MPTCP_EVENT_LISTENER_CREATED event becomes the first item in the
events list like this:
type:15,family:2,sport:10006,saddr4:0.0.0.0
type:1,token:3701282876,server_side:1,family:2,saddr4:10.0.1.1,...
And no token value in this MPTCP_EVENT_LISTENER_CREATED event.
This patch fixes this by specifying the type 1 item to search for token
values.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Just to avoid classical Bash pitfall where variables are accidentally
overridden by other functions because the proper scope has not been
defined.
That's also what is done in other MPTCP selftests scripts where all non
local variables are defined at the beginning of the script and the
others are defined with the "local" keyword.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is clearer to declare these global variables at the beginning of the
file as it is done in other MPTCP selftests rather than in functions in
the middle of the script.
So for uniformity reason, we can do the same here in mptcp_sockopt.sh.
Suggested-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The definition of 'rndh' was probably copied from one script to another
but some times, 'sec' was not defined, not used and/or not spelled
properly.
Here all the 'rndh' are now defined the same way.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some variables were set but never used.
This was not causing any issues except adding some confusion and having
shellcheck complaining about them.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new "sandbox" net namespace is available where no other netfilter
rules have been added.
Use this new netns instead of re-using "ns1" and clean it.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time.
Without the previous patch's fix, this block of code:
bpf_spin_lock(lock);
bpf_list_push_front(head, &f[i]->node);
bpf_list_push_front(head, &f[i + 1]->node);
bpf_spin_unlock(lock);
would fail check_reference_leak check as release_on_unlock logic would miss
a ref that should've been released.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Define and use kvm_static_assert() in the common KVM selftests headers to
provide deterministic behavior, and to allow creating static asserts
without dummy messages.
The kernel's static_assert() makes the message param optional, and on the
surface, tools/include/linux/build_bug.h appears to follow suit. However,
glibc may override static_assert() and redefine it as a direct alias of
_Static_assert(), which makes the message parameter mandatory. This leads
to non-deterministic behavior as KVM selftests code that utilizes
static_assert() without a custom message may or not compile depending on
the order of includes. E.g. recently added asserts in
x86_64/processor.h fail on some systems with errors like
In file included from lib/memstress.c:11:0:
include/x86_64/processor.h: In function ‘this_cpu_has_p’:
include/x86_64/processor.h:193:34: error: expected ‘,’ before ‘)’ token
static_assert(low_bit < high_bit); \
^
due to _Static_assert() expecting a comma before a message. The "message
optional" version of static_assert() uses macro magic to strip away the
comma when presented with empty an __VA_ARGS__
#ifndef static_assert
#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
#endif // static_assert
and effectively generates "_Static_assert(expr, #expr)".
The incompatible version of static_assert() gets defined by this snippet
in /usr/include/assert.h:
#if defined __USE_ISOC11 && !defined __cplusplus
# undef static_assert
# define static_assert _Static_assert
#endif
which yields "_Static_assert(expr)" and thus fails as above.
KVM selftests don't actually care about using C11, but __USE_ISOC11 gets
defined because of _GNU_SOURCE, which many tests do #define. _GNU_SOURCE
triggers a massive pile of defines in /usr/include/features.h, including
_ISOC11_SOURCE:
/* If _GNU_SOURCE was defined by the user, turn on all the other features. */
#ifdef _GNU_SOURCE
# undef _ISOC95_SOURCE
# define _ISOC95_SOURCE 1
# undef _ISOC99_SOURCE
# define _ISOC99_SOURCE 1
# undef _ISOC11_SOURCE
# define _ISOC11_SOURCE 1
# undef _POSIX_SOURCE
# define _POSIX_SOURCE 1
# undef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
# undef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
# undef _XOPEN_SOURCE_EXTENDED
# define _XOPEN_SOURCE_EXTENDED 1
# undef _LARGEFILE64_SOURCE
# define _LARGEFILE64_SOURCE 1
# undef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
# undef _ATFILE_SOURCE
# define _ATFILE_SOURCE 1
#endif
which further down in /usr/include/features.h leads to:
/* This is to enable the ISO C11 extension. */
#if (defined _ISOC11_SOURCE \
|| (defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L))
# define __USE_ISOC11 1
#endif
To make matters worse, /usr/include/assert.h doesn't guard against
multiple inclusion by turning itself into a nop, but instead #undefs a
few macros and continues on. As a result, it's all but impossible to
ensure the "message optional" version of static_assert() will actually be
used, e.g. explicitly including assert.h and #undef'ing static_assert()
doesn't work as a later inclusion of assert.h will again redefine its
version.
#ifdef _ASSERT_H
# undef _ASSERT_H
# undef assert
# undef __ASSERT_VOID_CAST
# ifdef __USE_GNU
# undef assert_perror
# endif
#endif /* assert.h */
#define _ASSERT_H 1
#include <features.h>
Fixes: fcba483e82 ("KVM: selftests: Sanity check input to ioctls() at build time")
Fixes: ee37955366 ("KVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*")
Fixes: 53a7dc0f21 ("KVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221122013309.1872347-1-seanjc@google.com
Move the AMX test's kvm_cpu_has() checks before creating the VM+vCPU,
there are no dependencies between the two operations. Opportunistically
add a comment to call out that enabling off-by-default XSAVE-managed
features must be done before KVM_GET_SUPPORTED_CPUID is cached.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-5-seanjc@google.com
Disallow using kvm_get_supported_cpuid() and thus caching KVM's supported
CPUID info before enabling XSAVE-managed features that are off-by-default
and must be enabled by ARCH_REQ_XCOMP_GUEST_PERM. Caching the supported
CPUID before all XSAVE features are enabled can result in false negatives
due to testing features that were cached before they were enabled.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-4-seanjc@google.com
Move __vm_xsave_require_permission() below the CPUID helpers so that a
future change can reference the cached result of KVM_GET_SUPPORTED_CPUID
while keeping the definition of the variable close to its intended user,
kvm_get_supported_cpuid().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-3-seanjc@google.com
Move the kvm_cpu_has() check on X86_FEATURE_XFD out of the helper to
enable off-by-default XSAVE-managed features and into the one test that
currenty requires XFD (XFeature Disable) support. kvm_cpu_has() uses
kvm_get_supported_cpuid() and thus caches KVM_GET_SUPPORTED_CPUID, and so
using kvm_cpu_has() before ARCH_REQ_XCOMP_GUEST_PERM effectively results
in the test caching stale values, e.g. subsequent checks on AMX_TILE will
get false negatives.
Although off-by-default features are nonsensical without XFD, checking
for XFD virtualization prior to enabling such features isn't strictly
required.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Fixes: 7fbb653e01 ("KVM: selftests: Check KVM's supported CPUID, not host CPUID, for XFD")
Link: https://lore.kernel.org/r/20221125023839.315207-1-lei4.wang@intel.com
[sean: add Fixes, reword changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-2-seanjc@google.com
Restore the assert (on x86-64) that <10% of pages are still idle when NOT
running as a nested VM in the access tracking test. The original assert
was converted to a "warning" to avoid false failures when running the
test in a VM, but the non-nested case does not suffer from the same
"infinite TLB size" issue.
Using the HYPERVISOR flag isn't infallible as VMMs aren't strictly
required to enumerate the "feature" in CPUID, but practically speaking
anyone that is running KVM selftests in VMs is going to be using a VMM
and hypervisor that sets the HYPERVISOR flag.
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-3-seanjc@google.com
Warn if the number of idle pages is greater than or equal to 10% of the
total number of pages, not if the percentage of idle pages is less than
10%. The original code asserted that less than 10% of pages were still
idle, but the check got inverted when the assert was converted to a
warning.
Opportunistically clean up the warning; selftests are 64-bit only, there
is no need to use "%PRIu64" instead of "%lu".
Fixes: 6336a810db ("KVM: selftests: replace assertion with warning in access_tracking_perf_test")
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-2-seanjc@google.com
nfit_test overrode the security_show() sysfs attribute function in nvdimm
dimm_devs in order to allow testing of security unlock. With the
introduction of CXL security commands, the trick to override
security_show() becomes significantly more complicated. By introdcing a
security flag CONFIG_NVDIMM_SECURITY_TEST, libnvdimm can just toggle the
check via a compile option. In addition the original override can can be
removed from tools/testing/nvdimm/.
The flag will also be used to bypass cpu_cache_invalidate_memregion() when
set in a different commit. This allows testing on QEMU with nfit_test or
cxl_test since cpu_cache_has_invalidate_memregion() checks whether
X86_FEATURE_HYPERVISOR cpu feature flag is set on x86.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983618758.2734609.18031639517065867138.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The mock cxl mem devs needs a way to go into "locked" status to simulate
when the platform is rebooted. Add a sysfs mechanism so the device security
state is set to "locked" and the frozen state bits are cleared.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983617602.2734609.7042497620931694717.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>