This pull request contains the following branches:
doc.2022.10.20a: Documentation updates. This is the second
in a series from an ongoing review of the RCU documentation.
fixes.2022.10.21a: Miscellaneous fixes.
lazy.2022.11.30a: Introduces a default-off Kconfig option that depends
on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or
rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce
delays. These delays result in significant power savings on
nearly idle Android and ChromeOS systems. These savings range
from a few percent to more than ten percent.
This series also includes several commits that change call_rcu()
to a new call_rcu_hurry() function that avoids these delays in
a few cases, for example, where timely wakeups are required.
Several of these are outside of RCU and thus have acks and
reviews from the relevant maintainers.
srcunmisafe.2022.11.09a: Creates an srcu_read_lock_nmisafe() and an
srcu_read_unlock_nmisafe() for architectures that support NMIs,
but which do not provide NMI-safe this_cpu_inc(). These NMI-safe
SRCU functions are required by the upcoming lockless printk()
work by John Ogness et al.
That printk() series depends on these commits, so if you pull
the printk() series before this one, you will have already
pulled in this branch, plus two more SRCU commits:
0cd7e350ab ("rcu: Make SRCU mandatory")
51f5f78a4f ("srcu: Make Tiny synchronize_srcu() check for readers")
These two commits appear to work well, but do not have
sufficient testing exposure over a long enough time for me to
feel comfortable pushing them unless something in mainline is
definitely going to use them immediately, and currently only
the new printk() work uses them.
torture.2022.10.18c: Changes providing minor but important increases
in test coverage for the new RCU polled-grace-period APIs.
torturescript.2022.10.20a: Changes that avoid redundant kernel builds,
thus providing about a 30% speedup for the torture.sh acceptance
test.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKnS8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jCMiD/4weraRjmcLhZ3tz2vgTI8ZsXdIiCfU
vCln0AOKroVo37S4BhViVfryV2D4VFfEb1UY6EgxNFu7Jd3z0seQShZh/5r8bFMU
p0E6TC8PwyKUpQstTOwOynkw6BWGW1qeL620PpBNRAy4MkxL8AGv40tHRIHEeAzc
cCTax2+xW9ae0ZtAZHDDCUAzpYpcjScIf4OZ3tkSaFCcpWZijg+dN60dnsZ9l7h9
DtqKH61rszXAtxkmN9Fs9OY5MPCXi9Es6LVYq6KN06jqxwJRqmYf+pai3apmNIOf
P8isXOQG58tbhBLpNCG58UBSkjI2GG8Lcq6hYr6d/7Ukm7RF49q8eL7OQlVrJMuQ
Zi2DVTEAu2U3pzdTC14gi3RvqP7dO+psBs+LpGXtj4RxYvAP99e9KSRcG14j/Wwa
L52AetBzBXTCS5nhPOG8RP22d8HRZLxMe9x7T8iVCDuwH4M1zTF5cVzLeEdgPAD7
tdX4eV16PLt1AvhCEuHU/2v520gc2K9oGXLI1A6kzquXh7FflcPWl5WS+sYUbB/p
gBsblz7C3I5GgSoW4aAMnkukZiYgSvVql8ZyRwQuRzvLpYcofMpoanZbcufDjuw9
N5QzAaMmzHnBu3hOJS2WaSZRZ73fed3NO8jo8q8EMfYeWK3NAHybBdaQqSTgsO8i
s+aN+LZ4s5MnRw==
=eMOr
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
The test currently fails on 32bit. Fixing the "-1ull" vs. "-1ul" seems
to make the test pass and the compiler happy.
Note: This test is not in mm-stable yet. This fix should be squashed into
"selftests/vm: add KSM unmerge tests".
Link: https://lkml.kernel.org/r/20221205193716.276024-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The compiler complains about the conversion of a pointer to an int of
different width.
Link: https://lkml.kernel.org/r/20221205193716.276024-4-david@redhat.com
Fixes: 6f1405efc6 ("selftests/vm: anon_cow: add R/O longterm tests via gup_test")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The tests fail to compile in some environments (e.g., Debian 11.5 on x86).
Let's simply conditionally define MADV_POPULATE_(READ|WRITE) if not
already defined, similar to how the khugepaged.c test handles it.
Link: https://lkml.kernel.org/r/20221205193716.276024-3-david@redhat.com
Fixes: 39b2e5cae4 ("selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make sure that we ignore protection of a memcg that is the target of memcg
reclaim.
Link: https://lkml.kernel.org/r/20221202031512.1365483-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its usage
reaches a certain value.
While we are at it, refactor and simplify the reclaim loop.
This will be used in a following patch in another test.
Link: https://lkml.kernel.org/r/20221202031512.1365483-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A DAMON sysfs user could start DAMON with a scheme, remove the sysfs
directory for the scheme, and then ask stats or schemes tried regions
update. The related logic were not aware of the already removed directory
situation, so it was able to results in invalid memory accesses. The fix
has made with commit 8468b48661 ("mm/damon/sysfs-schemes: skip stats
update if the scheme directory is removed"), though. Add a selftest to
prevent such kinds of bugs from being introduced again.
Link: https://lkml.kernel.org/r/20221201170834.62823-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add a test to measure performance of KSM breaking not triggered via
COW, but triggered by disabling KSM on an area filled with KSM pages via
MADV_UNMERGEABLE.
Link: https://lkml.kernel.org/r/20221021101141.84170-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/ksm: break_ksm() cleanups and fixes", v2.
This series cleans up and fixes break_ksm(). In summary, we no longer use
fake write faults to break COW but instead FAULT_FLAG_UNSHARE. Further,
we move away from using follow_page() --- that we can hopefully remove
completely at one point --- and use new walk_page_range_vma() instead.
Fortunately, we can get rid of VM_FAULT_WRITE and FOLL_MIGRATION in common
code now.
Extend the existing ksm tests by an unmerge benchmark, and a some new
unmerge tests.
Also, add a selftest to measure MADV_UNMERGEABLE performance. In my setup
(AMD Ryzen 9 3900X), running the KSM selftest to test unmerge performance
on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in a
performance degradation of ~6% -- 7% (old: ~5250 MiB/s, new: ~4900 MiB/s).
I don't think we particularly care for now, but it's good to be aware of
the implication.
This patch (of 9):
Let's add three unmerge tests (MADV_UNMERGEABLE unmerging all pages in the
range).
test_unmerge(): basic unmerge tests
test_unmerge_discarded(): have some pte_none() entries in the range
test_unmerge_uffd_wp(): protect the merged pages using uffd-wp
ksm_tests.c currently contains a mixture of benchmarks and tests, whereby
each test is carried out by executing the ksm_tests binary with specific
parameters. Let's add new ksm_functional_tests.c that performs multiple,
smaller functional tests all at once.
Link: https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221021101141.84170-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Our memory management kernel CI testing at Red Hat uses the VM
selftests and we have run into two problems:
First, our LTP tests overlap with the VM selftests.
We want to avoid unhelpful redundancy in our testing practices.
Second, we have observed the current run_vmtests.sh to report overall
failure/ambiguous results in the case that a machine lacks the necessary
hardware to perform one or more of the tests. E.g. ksm tests that
require more than one numa node.
We want to be able to run the vm selftests suitable to particular hardware.
Add the ability to run one or more groups of vm tests via run_vmtests.sh
instead of simply all-or-none in order to solve these problems.
Preserve existing default behavior of running all tests when the script
is invoked with no arguments.
Documentation of test groups is included in the patch as follows:
# ./run_vmtests.sh [ -h || --help ]
usage: ./tools/testing/selftests/vm/run_vmtests.sh [ -h | -t "<categories>"]
-t: specify specific categories to tests to run
-h: display this message
The default behavior is to run all tests.
Alternatively, specific groups tests can be run by passing a string
to the -t argument containing one or more of the following categories
separated by spaces:
- mmap
tests for mmap(2)
- gup_test
tests for gup using gup_test interface
- userfaultfd
tests for userfaultfd(2)
- compaction
a test for the patch "Allow compaction of unevictable pages"
- mlock
tests for mlock(2)
- mremap
tests for mremap(2)
- hugevm
tests for very large virtual address space
- vmalloc
vmalloc smoke tests
- hmm
hmm smoke tests
- madv_populate
test memadvise(2) MADV_POPULATE_{READ,WRITE} options
- memfd_secret
test memfd_secret(2)
- process_mrelease
test process_mrelease(2)
- ksm
ksm tests that do not require >=2 NUMA nodes
- ksm_numa
ksm tests that require >=2 NUMA nodes
- pkey
memory protection key tests
- soft_dirty
test soft dirty page bit semantics
- anon_cow
test anonymous copy-on-write semantics
example: ./run_vmtests.sh -t "hmm mmap ksm"
Link: https://lkml.kernel.org/r/20221018231222.1884715-1-jsavitz@redhat.com
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
post-6.0 issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5Ur2AAKCRDdBJ7gKXxA
jsGmAQDWSq6z9fVgk30XpMr/X7t5c6NTPw5GocVpdwG8iqch3gEAjEs5/Kcd/mx4
d1dLaJFu1u3syessp8nJrNr1HANIog8=
=L8zu
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
Check that verifier.c:states_equal() uses check_ids() to match
consistent active_lock/map_value configurations. This allows to prune
states with active spin locks even if numerical values of
active_lock ids do not match across compared states.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that when reg->id is not same for the same register of type
PTR_TO_MAP_VALUE between current and old explored state, we currently
return false from regsafe and continue exploring.
Without the fix in prior commit, the test case fails.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test case that would erroneously pass verification if
verifier.c:states_equal() maintains separate register ID mappings for
call frames.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Under certain conditions it was possible for verifier.c:regsafe() to
skip check_id() call. This commit adds negative test cases previously
errorneously accepted as safe.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The Makefile was assuming that headers_install was already done in
the top source directory, and was searching for installed uapi headers
there.
Unfortunately this is not the case and we need to manually call that step.
To do so, reorder the declaration of the variables, and reuses top_srcdir
provided by lib.mk
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202212060216.a6X8Py5H-lkp@intel.com/#t
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20221206145936.922196-6-benjamin.tissoires@redhat.com
DYNAMIC_FTRACE_WITH_DIRECT_CALLS is implicit on x86_64 but is still a
WIP on aarm64. Ensure we get it selected to not have any surprises.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/20221206145936.922196-5-benjamin.tissoires@redhat.com
1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[akpm@linux-foundation.org: update comments, per Roman]
Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
Fixes: 1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MEM_REGION_TEST_DATA is meant to hold data explicitly used by a
selftest, not implicit allocations due to the selftests infrastructure.
Allocate the ucall pool from MEM_REGION_DATA much like the rest of the
selftests library allocations.
Fixes: 426729b2cf ("KVM: selftests: Add ucall pool based implementation")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An interesting feature of the Arm architecture is that the stage-1 MMU
supports two distinct VA regions, controlled by TTBR{0,1}_EL1. As KVM
selftests on arm64 only uses TTBR0_EL1, the VA space is constrained to
[0, 2^(va_bits-1)). This is different from other architectures that
allow for addressing low and high regions of the VA space from a single
page table.
KVM selftests' VA space allocator presumes the valid address range is
split between low and high memory based the MSB, which of course is a
poor match for arm64's TTBR0 region.
Allow architectures to override the default VA space layout. Make use of
the override to align vpages_valid with the behavior of TTBR0 on arm64.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz
rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i
KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX
wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM
AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk
pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d
1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc
4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT
H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR
qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF
K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC
aWa6oGVC
=iIPT
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
Allow variables to execute shell commands. Note, these are processed when
they are first seen while parsing the config file. This is useful if you
have the same config file used for multiple hosts (as they may be in a git
repository).
HOSTNAME := ${shell hostname}
DEFAULTS IF "${HOSTNAME}" == "frodo"
Link: https://lkml.kernel.org/r/20221207212944.277ee850@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The BPF Makefile in net/bpf did incorrect path substitution for O=dir
builds, e.g.
make O=/tmp/kselftest headers
make O=/tmp/kselftest -C tools/testing/selftests
would fail in selftest builds [1] net/ with
clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
clang-16: error: no input files
Add a pattern prerequisite and an order-only-prerequisite (for
creating the directory), to resolve the issue.
[1] https://lore.kernel.org/all/202212060009.34CkQmCN-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 837a3d66d6 ("selftests: net: Add cross-compilation support for BPF programs")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221206102838.272584-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original support for bpf_user_ringbuf_drain callbacks simply
short-circuited checks for the dynptr state, allowing users to pass
PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a
dynptr. This bug would have also surfaced with other dynptr helpers in
the future that changed dynptr view or modified it in some way.
Include test cases for all cases, i.e. both bpf_dynptr_from_mem and
bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them.
Without the fix, both of these programs load and pass verification.
While at it, remove sys_nanosleep target from failure cases' SEC
definition, as there is no such tracepoint.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While check_func_arg_reg_off is the place which performs generic checks
needed by various candidates of reg->type, there is some handling for
special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and
ARG_PTR_TO_RINGBUF_MEM.
This commit aims to streamline these special cases and instead leave
other things up to argument type specific code to handle. The function
will be restrictive by default, and cover all possible cases when
OBJ_RELEASE is set, without having to update the function again (and
missing to do that being a bug).
This is done primarily for two reasons: associating back reg->type to
its argument leaves room for the list getting out of sync when a new
reg->type is supported by an arg_type.
The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is something
we already handle, whenever a release argument is expected, it should
be passed as the pointer that was received from the acquire function.
Hence zero fixed and variable offset.
There is nothing special about ARG_PTR_TO_RINGBUF_MEM, where technically
its target register type PTR_TO_MEM | MEM_RINGBUF can already be passed
with non-zero offset to other helper functions, which makes sense.
Hence, lift the arg_type_is_release check for reg->off and cover all
possible register types, instead of duplicating the same kind of check
twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id).
For the release argument, arg_type_is_dynptr is the special case, where
we go to actual object being freed through the dynptr, so the offset of
the pointer still needs to allow fixed and variable offset and
process_dynptr_func will verify them later for the release argument case
as well.
This is not specific to ARG_PTR_TO_DYNPTR though, we will need to make
this exception for any future object on the stack that needs to be
released. In this sense, PTR_TO_STACK as a candidate for object on stack
argument is a special case for release offset checks, and they need to
be done by the helper releasing the object on stack.
Since the check has been lifted above all register type checks, remove
the duplicated check that is being done for PTR_TO_BTF_ID.
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.
However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.
When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this
doesn't break.
Fixes: 2057156738 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where
the underlying register type is subjected to more special checks to
determine the type of object represented by the pointer and its state
consistency.
Move dynptr checks to their own 'process_dynptr_func' function so that
is consistent and in-line with existing code. This also makes it easier
to reuse this code for kfunc handling.
Then, reuse this consolidated function in kfunc dynptr handling too.
Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has
been lifted.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bpftool has new extra libbpf_det_bind probing map we need to exclude.
Also skip trying to load netdevsim modules if it's already loaded (builtin).
v2:
- drop iproute2->bpftool changes (Toke)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com
grub2 has submenus where to use grub-reboot, it requires:
grub-reboot X>Y
where X is the main index and Y is the submenu. Thus if you have:
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux ...
[...]
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option ...
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64 (recovery mode)' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux test' --class debian --class gnu-linux ...
[...]
}
And wanted to boot to the "Linux test" kernel, you need to run:
# grub-reboot 1>2
As 1 is the second top menu (the submenu) and 2 is the third of the sub
menu entries.
Have the grub.cfg parsing for grub2 handle such cases.
Cc: stable@vger.kernel.org
Fixes: a15ba91361 ("ktest: Add support for grub2")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
After a full run of a make_min_config test, I noticed there were a lot of
CONFIGs still enabled that really should not be. Looking at them, I
noticed they were all defined as "default y". The issue is that the test
simple removes the config and re-runs make oldconfig, which enables it
again because it is set to default 'y'. Instead, explicitly disable the
config with writing "# CONFIG_FOO is not set" to the file to keep it from
being set again.
With this change, one of my box's minconfigs went from 768 configs set,
down to 521 configs set.
Link: https://lkml.kernel.org/r/20221202115936.016fce23@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 0a05c769a9 ("ktest: Added config_bisect test type")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Convert big chunks of dynptr and map_kptr subtests to use generic
verification_tester. They are switched from using manually maintained
tables of test cases, specifying program name and expected error
verifier message, to btf_decl_tag-based annotations directly on
corresponding BPF programs: __failure to specify that BPF program is
expected to fail verification, and __msg() to specify expected log
message.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207201648.2990661-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's become a common pattern to have a collection of small BPF programs
in one BPF object file, each representing one test case. On user-space
side of such tests we maintain a table of program names and expected
failure or success, along with optional expected verifier log message.
This works, but each set of tests reimplement this mundane code over and
over again, which is a waste of time for anyone trying to add a new set
of tests. Furthermore, it's quite error prone as it's way too easy to miss
some entries in these manually maintained test tables (as evidences by
dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was
accidentally missed; this is fixed in next patch).
So this patch implements generic test_loader, which accepts skeleton
name and handles the rest of details: opens and loads BPF object file,
making sure each program is tested in isolation. Optionally each test
case can specify expected BPF verifier log message. In case of failure,
tester makes sure to report verifier log, but it also reports verifier
log in verbose mode unconditionally.
Now, the interesting deviation from existing custom implementations is
the use of btf_decl_tag attribute to specify expected-to-fail vs
expected-to-succeed markers and, optionally, expected log message
directly next to BPF program source code, eliminating the need to
manually create and update table of tests.
We define few macros wrapping btf_decl_tag with a convention that all
values of btf_decl_tag start with "comment:" prefix, and then utilizing
a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to
define things like expected success/failure, expected verifier message,
extra verifier log level (if necessary). This approach is demonstrated
by next patch in which two existing sets of failure tests are converted.
Tester supports both expected-to-fail and expected-to-succeed programs,
though this patch set didn't convert any existing expected-to-succeed
programs yet, as existing tests couple BPF program loading with their
further execution through attach or test_prog_run. One way to allow
testing scenarios like this would be ability to specify custom callback,
executed for each successfully loaded BPF program. This is left for
follow up patches, after some more analysis of existing test cases.
This test_loader is, hopefully, a start of a test_verifier-like runner,
but integrated into test_progs infrastructure. It will allow much better
"user experience" of defining low-level verification tests that can take
advantage of all the libbpf-provided nicety features on BPF side: global
variables, declarative maps, etc. All while having a choice of defining
it in C or as BPF assembly (through __attribute__((naked)) functions and
using embedded asm), depending on what makes most sense in each
particular case. This will be explored in follow up patches as well.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221207201648.2990661-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
properly initialize it when table ID 0 is used. This can lead to a route
in the default VRF with a preferred source address not being flushed
when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28
# ip address add dev dummy1 192.0.2.17/28
# ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100
# ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two
different FIB info structures. One with a metric of 100 and table ID of
254 (main) and one with a metric of 200 and table ID of 0. Therefore,
when the preferred source address is deleted from the default VRF,
the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration
structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8
Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A series of prior patches added some kfuncs that allow struct
task_struct * objects to be used as kptrs. These kfuncs leveraged the
'refcount_t rcu_users' field of the task for performing refcounting.
This field was used instead of 'refcount_t usage', as we wanted to
leverage the safety provided by RCU for ensuring a task's lifetime.
A struct task_struct is refcounted by two different refcount_t fields:
1. p->usage: The "true" refcount field which task lifetime. The
task is freed as soon as this refcount drops to 0.
2. p->rcu_users: An "RCU users" refcount field which is statically
initialized to 2, and is co-located in a union with
a struct rcu_head field (p->rcu). p->rcu_users
essentially encapsulates a single p->usage
refcount, and when p->rcu_users goes to 0, an RCU
callback is scheduled on the struct rcu_head which
decrements the p->usage refcount.
Our logic was that by using p->rcu_users, we would be able to use RCU to
safely issue refcount_inc_not_zero() a task's rcu_users field to
determine if a task could still be acquired, or was exiting.
Unfortunately, this does not work due to p->rcu_users and p->rcu sharing
a union. When p->rcu_users goes to 0, an RCU callback is scheduled to
drop a single p->usage refcount, and because the fields share a union,
the refcount immediately becomes nonzero again after the callback is
scheduled.
If we were to split the fields out of the union, this wouldn't be a
problem. Doing so should also be rather non-controversial, as there are
a number of places in struct task_struct that have padding which we
could use to avoid growing the structure by splitting up the fields.
For now, so as to fix the kfuncs to be correct, this patch instead
updates bpf_task_acquire() and bpf_task_release() to use the p->usage
field for refcounting via the get_task_struct() and put_task_struct()
functions. Because we can no longer rely on RCU, the change also guts
the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions
pending a resolution on the above problem.
In addition, the task fixes the kfunc and rcu_read_lock selftests to
expect this new behavior.
Fixes: 90660309b0 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
Fixes: fca1aa7551 ("bpf: Handle MEM_RCU type properly")
Reported-by: Matus Jokay <matus.jokay@stuba.sk>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When installing the selftests using
"make -C tools/testing/selftests install", we need to make sure
all the required files to run the selftests are installed. Let's
make sure this is the case.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
It is useful to use vmlinux.h in the xfrm_info test like other kfunc
tests do. In particular, it is common for kfunc bpf prog that requires
to use other core kernel structures in vmlinux.h
Although vmlinux.h is preferred, it needs a ___local flavor of
struct bpf_xfrm_info in order to build the bpf selftests
when CONFIG_XFRM_INTERFACE=[m|n].
Cc: Eyal Birger <eyal.birger@gmail.com>
Fixes: 90a3a05eb3 ("selftests/bpf: add xfrm_info tests")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* for-next/selftests:
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
kselftest/arm64: Set test names prior to starting children
kselftest/arm64: Use preferred form for predicate load/stores
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: Print ASCII version of unknown signal frame magic values
kselftest/arm64: Remove validation of extra_context from TODO
kselftest/arm64: Provide progress messages when signalling children
kselftest/arm64: Check that all children are producing output in fp-stress
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmOOO08ACgkQJNaLcl1U
h9BiIAf+O8zqcxtokSTS3wsoFHDp2zprTTUk3RZbxaYMFTmFrALbQVUx5EdJ179X
Dr9rL9T2iNBN7YGhDvmzoVvLrDncJEerMk7cgbc88a5tkB7ZipI7E9QliZt92QaP
ihrXAyD7ekLC6QLXrHFNmzwUeFspgtsBkT3MakcfcucuMN53UQbFwD1r0ziCJOMZ
gFnkZKBemUDsURxWR+LCveB/WYKNrZ6Bjokdg8HY1yenACMiXaoWedmIFJ2SxId3
NNS8vqUDUSEFhDaeJVBo1Ow9m8fe+dZBUeVk9Ej7bGEXxYptwtu3aODXknySa8Eb
YVlalR9qQPvaI1688WRleRWp+oyL4Q==
=x1AC
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v6.2
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
Test the xfrm_info kfunc helpers.
The test setup creates three name spaces - NS0, NS1, NS2.
XFRM tunnels are setup between NS0 and the two other NSs.
The kfunc helpers are used to steer traffic from NS0 to the other
NSs based on a userspace populated bpf global variable and validate
that the return traffic had arrived from the desired NS.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-5-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The typical environment where cxl_test is run, QEMU, does not support
cpu_cache_invalidate_memregion(). Add the 'test' bypass symbols to the
configuration check.
Reported-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167026948179.3527561.4535373655515827457.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window
structures. Fix up conflicts with the RCH emulation already pending in
cxl/next.
In an RCH topology a CXL host-bridge as Root Complex Integrated Endpoint
the represents the memory expander. Unlike a VH topology there is no
CXL/PCIE Root Port that host the endpoint. The CXL subsystem maps this
as the CXL root object (ACPI0017 on ACPI based systems) targeting the
host-bridge as a dport, per usual, but then that dport directly hosts
the endpoint port.
Mock up that configuration with a 4th host-bridge that has a 'cxl_rcd'
device instance as its immediate child.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/166993046170.1882361.12460762475782283638.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* kvm-arm64/dirty-ring:
: .
: Add support for the "per-vcpu dirty-ring tracking with a bitmap
: and sprinkles on top", courtesy of Gavin Shan.
:
: This branch drags the kvmarm-fixes-6.1-3 tag which was already
: merged in 6.1-rc4 so that the branch is in a working state.
: .
KVM: Push dirty information unconditionally to backup bitmap
KVM: selftests: Automate choosing dirty ring size in dirty_log_test
KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
KVM: selftests: Use host page size to map ring buffer in dirty_log_test
KVM: arm64: Enable ring-based dirty memory tracking
KVM: Support dirty ring in conjunction with bitmap
KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/access-tracking:
: .
: Small series to add support for arm64 to access_tracking_perf_test and
: correct a couple bugs along the way.
:
: Patches courtesy of Oliver Upton.
: .
KVM: selftests: Build access_tracking_perf_test for arm64
KVM: selftests: Have perf_test_util signal when to stop vCPUs
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/s2-faults:
: .
: New KVM/arm64 selftests exercising various sorts of S2 faults, courtesy
: of Ricardo Koller. From the cover letter:
:
: "This series adds a new aarch64 selftest for testing stage 2 fault handling
: for various combinations of guest accesses (e.g., write, S1PTW), backing
: sources (e.g., anon), and types of faults (e.g., read on hugetlbfs with a
: hole, write on a readonly memslot). Each test tries a different combination
: and then checks that the access results in the right behavior (e.g., uffd
: faults with the right address and write/read flag). [...]"
: .
KVM: selftests: aarch64: Add mix of tests into page_fault_test
KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test
KVM: selftests: aarch64: Add dirty logging tests into page_fault_test
KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test
KVM: selftests: aarch64: Add aarch64/page_fault_test
KVM: selftests: Use the right memslot for code, page-tables, and data allocations
KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc()
KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type
KVM: selftests: Stash backing_src_type in struct userspace_mem_region
tools: Copy bitfield.h from the kernel sources
KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros
KVM: selftests: Add missing close and munmap in __vm_mem_region_delete()
KVM: selftests: aarch64: Add virt_get_pte_hva() library function
KVM: selftests: Add a userfaultfd library
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/linked-bps:
: .
: Additional selftests for the arm64 breakpoints/watchpoints,
: courtesy of Reiji Watanabe. From the cover letter:
:
: "This series adds test cases for linked {break,watch}points to the
: debug-exceptions test, and expands {break,watch}point tests to
: use non-zero {break,watch}points (the current test always uses
: {break,watch}point#0)."
: .
KVM: arm64: selftests: Test with every breakpoint/watchpoint
KVM: arm64: selftests: Add a test case for a linked watchpoint
KVM: arm64: selftests: Add a test case for a linked breakpoint
KVM: arm64: selftests: Change debug_version() to take ID_AA64DFR0_EL1
KVM: arm64: selftests: Stop unnecessary test stage tracking of debug-exceptions
KVM: arm64: selftests: Add helpers to enable debug exceptions
KVM: arm64: selftests: Remove the hard-coded {b,w}pn#0 from debug-exceptions
KVM: arm64: selftests: Add write_dbg{b,w}{c,v}r helpers in debug-exceptions
KVM: arm64: selftests: Use FIELD_GET() to extract ID register fields
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/memslot-fixes:
: .
: KVM memslot selftest fixes for non-4kB page sizes, courtesy
: of Gavin Shan. From the cover letter:
:
: "kvm/selftests/memslots_perf_test doesn't work with 64KB-page-size-host
: and 4KB-page-size-guest on aarch64. In the implementation, the host and
: guest page size have been hardcoded to 4KB. It's ovbiously not working
: on aarch64 which supports 4KB, 16KB, 64KB individually on host and guest.
:
: This series tries to fix it. After the series is applied, the test runs
: successfully with 64KB-page-size-host and 4KB-page-size-guest."
: .
KVM: selftests: memslot_perf_test: Report optimal memory slots
KVM: selftests: memslot_perf_test: Consolidate memory
KVM: selftests: memslot_perf_test: Support variable guest page size
KVM: selftests: memslot_perf_test: Probe memory slots for once
KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()
KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()
Signed-off-by: Marc Zyngier <maz@kernel.org>
In check_all_cpu_dscr_defaults, opendir() opens the directory stream.
Add missing closedir() in the error path to release it.
In check_cpu_dscr_default, open() creates an open file descriptor.
Add missing close() in the error path to release it.
Fixes: ebd5858c90 ("selftests/powerpc: Add test for all DSCR sysfs interfaces")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221205084429.570654-1-linmq006@gmail.com
Both tolower and toupper are built in c functions, we should not
redefine them as this can result in a build error.
Fixes the following errors:
progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
10 | static inline char tolower(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
4 | #include <bpf/bpf_helpers.h>
+++ |+#include <ctype.h>
5 |
progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
17 | static inline char toupper(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'
See background on this sort of issue:
https://stackoverflow.com/a/20582607https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213
(C99, 7.1.3p1) "All identifiers with external linkage in any of the
following subclauses (including the future library directions) are
always reserved for use as identifiers with external linkage."
This is documented behavior in GCC:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three tests for cgrp local storage support for sleepable progs.
Two tests can load and run properly, one for cgroup_iter, another
for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get()
helper. One test has bpf_rcu_read_lock() and failed to load.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin mentioned that the verifier cannot assume arguments from
LSM hook sk_alloc_security being trusted since after the hook
is called, the sk ref_count is set to 1. This will overwrite
the ref_count changed by the bpf program and may cause ref_count
underflow later on.
I then further checked some other hooks. For example,
for bpf_lsm_file_alloc() hook in fs/file_table.c,
f->f_cred = get_cred(cred);
error = security_file_alloc(f);
if (unlikely(error)) {
file_free_rcu(&f->f_rcuhead);
return ERR_PTR(error);
}
atomic_long_set(&f->f_count, 1);
The input parameter 'f' to security_file_alloc() cannot be trusted
as well.
Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free
lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other
hooks should not be considered as trusted. This may not be a complete
list, but it covers common usage for sk and task.
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add MEM_RCU pointer null checking for related tests. Also
modified task_acquire test so it takes a rcu ptr 'ptr' where
'ptr = rcu_ptr->rcu_field'.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Expand the cxl_test topology to include CFMWS's that use XOR math
for interleave arithmetic, as defined in the CXL Specification 3.0.
With this expanded topology, cxl_test is useful for testing:
x1,x2,x4 ways with XOR interleave arithmetic.
Define the additional XOR CFMWS entries to appear only with the
module parameter interleave_arithmetic=1. The cxl_test default
continues to be modulo math.
modprobe cxl_test interleave_arithmetic=1
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/54670400cd48ba7fcc6d8ee0d6ae2276d3f51aad.1669847017.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
get the RCRB and add code to extract the component register block from
it.
RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1). The
RCRB base address is cached in the cxl_dport per-host bridge so that the
upstream port component registers can be retrieved later by an RCD
(RCIEP) associated with the host bridge.
Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/Y4dsGZ24aJlxSfI1@rric.localdomain
[djbw: introduce devm_cxl_add_rch_dport()]
Link: https://lore.kernel.org/r/166993044524.1882361.2539922887413208807.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Accept any cxl_test topology device as the first argument in
cxl_chbs_context.
This is in preparation for reworking the detection of the component
registers across VH and RCH topologies. Move
mock_acpi_table_parse_cedt() beneath the definition of is_mock_port()
and use is_mock_port() instead of the explicit mock cxl_acpi device
check.
Acked-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993043433.1882361.17651413716599606118.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging and endpoint to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/net`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1669864248-829-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When testing in kci_test_ipsec_offload, srcip is configured as $dstip,
it should add xfrm policy rule in instead of out.
The test result of this patch is as follows:
PASS: ipsec_offload
Fixes: 2766a11161 ("selftests: rtnetlink: add ipsec offload API test")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221201082246.14131-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d2825fa936 ("crypto: sm3,sm4 - move into crypto directory") moves
SM3 and SM4 algorithm implementations from stand-alone library to crypto
API. The corresponding configuration options for the API version (generic)
are CONFIG_CRYPTO_SM3_GENERIC and CONFIG_CRYPTO_SM4_GENERIC, respectively.
Replace option selected in selftests configuration from the library version
to the API version.
Fixes: d2825fa936 ("crypto: sm3,sm4 - move into crypto directory")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221201131852.38501-1-tianjia.zhang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpf_legacy.h header uses llvm specific load functions, add
GCC compatible variants as well to fix tests using these functions
under GCC.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221201190939.3230513-1-james.hilliard1@gmail.com
In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
This patch adds support to check if the calling process has the flag
CAP_SYS_ADMIN, and also if this flag has CAP_EFFECTIVE set.
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220731092529.28760-1-gautammenghani201@gmail.com
There is a spelling mistake in some help text. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20221201091354.1613652-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to
match the kernel nomenclature where test_and_set_bit() is atomic,
and __test_and_set_bit() provides the non-atomic variant.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.
Leave the usage in ucall_free() as-is, it's the one place in tools/ that
actually wants/needs atomic behavior.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new ucall hook, GUEST_UCALL_NONE(), to allow tests to make ucalls
without allocating a ucall struct, and use it to enable single-step
in ARM's debug-exceptions test. Like the disable single-step path, the
enabling path also needs to ensure that no exclusive access sequences are
attempted after enabling single-step, as the exclusive monitor is cleared
on ERET from the debug exception taken to EL2.
The test currently "works" because clear_bit() isn't actually an atomic
operation... yet.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmOJQesSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5IR4QAKGPbLRykY/2FohV2HDu5fDPxA2Fe9nu
5W7ZIptQu+tQtCTWKFEjcQdwYoNrLbr0hr1eGubVbIvBqJbwPQfH7G0765eOIcvy
s6Zn2N24IisIoUxdkJGOL3Tt1UR7wCFbwC+ms0i4FQvMcw+TbM0BTHgJDdwR5laX
mGN7ubz5iImwDFFE3Bd8Qy5I+FGL9CI60l+RzK6b7J8HYi1wOBMLU9QueF/dB7gR
g+navZJAAnvN6YIkjP5j8yPBuvhDzni379ue5ATDq1ALvyyI7xlYALsxpUjCnLuo
CkbvgmfmC94Vdm7pzFgsbazUN2oIjwoimjFQHP1bf8Jmd+770R282JpnwiD/ydCV
Tl2ArwXA2zxVxNZm9g/XqPBwWBWWgWfYIQbuuxc065MnXCnHkY5UGGf0JNx41CDl
hdtm9DHkft8+6kyBBmgkdKxd328Znljq02v3nLePUipfpDVaNj4VAUj9IpV6Lpuj
GJjs4Wx7oqFwH1Im/LqZgnOGwgkSj3ObHtkYx2RSrQAQultbjuplFz2qZWP8PF6A
FrJbcddKOmLINrdNOlvTd5WKCAjtV8vycjFkk+/7H67rpZdM8AI1StrzMP6gmwg4
ARozZJ2UF8nTriRYFQbFQyNm9bBTZ7YQ/HajqfhvCuZLi7i1EaImhC0F1xn7IU5S
6XhvQPvjRTgS
=i6OA
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD
Misc KVM x86 fixes and cleanups for 6.2:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
On 32bit the trigger-synthetic-eprobe.tc selftest fails with the error:
hist:syscalls:sys_exit_openat: error: Param type doesn't match synthetic event field type
Command: hist:keys=common_pid:filename=$__arg__1,ret=ret:onmatch(syscalls.sys_enter_openat).trace(synth_open,$filename,$ret)
^
This is because the synth_open synthetic event is created with:
echo "$SYNTH u64 filename; s64 ret;" > synthetic_events
Which works fine on 64 bit, as filename is a pointer and the return is
also a long. But for 32 bit architectures, it doesn't work.
Use "unsigned long" and "long" instead so that it works for both 64 bit
and 32 bit architectures.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- malloc() does not zero the buffer,
- fread() does not null-terminate it's output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
not inherently null-terminated
So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-3-bgray@linux.ibm.com
No need to write inline asm for mtspr/mfspr, we have macros for this
in reg.h
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-2-bgray@linux.ibm.com
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/powerpc`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1669862997-31335-1-git-send-email-yangtiezhu@loongson.cn
This test adds a basic HSRv0 network with 3 nodes. In its current shape
it sends and forwards packets, announcements and so merges nodes based
on MAC A/B information.
It is able to detect duplicate packets and packetloss should any occur.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by the
in-kernel path manager in mptcp_join.sh.
It adds the listener event checking in the existing "remove single
address with port" test. The output looks like this:
003 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ] - pt [ ok ]
syn[ ok ] - synack[ ok ] - ack[ ok ]
syn[ ok ] - ack [ ok ]
rm [ ok ] - rmsf [ ok ] invert
CREATE_LISTENER 10.0.2.1:10100[ ok ]
CLOSE_LISTENER 10.0.2.1:10100 [ ok ]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch moves evts_ns1 and evts_ns2 out of do_transfer() as two global
variables in mptcp_join.sh. Init them in init() and remove them in
cleanup().
Add a new helper reset_with_events() to save the outputs of 'pm_nl_ctl
events' command in them. And a new helper kill_events_pids() to kill
pids of 'pm_nl_ctl events' command. Use these helpers in userspace pm
tests.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by userspace
processes.
It adds a new test named test_listener() and a new verifying helper
verify_listener_events(). The new output looks like this:
CREATE_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
DESTROY_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
MP_PRIO TX [OK]
MP_PRIO RX [OK]
CREATE_LISTENER 10.0.2.2:37106 [OK]
CLOSE_LISTENER 10.0.2.2:37106 [OK]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch makes server_evts and client_evts global in userspace_pm.sh,
then these two variables could be used in test_announce(), test_remove()
and test_subflows(). The local variable 'evts' in these three functions
then could be dropped.
Also move local variable 'file' as a global one.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some userspace pm tests failed since pm listener events have been added.
Now MPTCP_EVENT_LISTENER_CREATED event becomes the first item in the
events list like this:
type:15,family:2,sport:10006,saddr4:0.0.0.0
type:1,token:3701282876,server_side:1,family:2,saddr4:10.0.1.1,...
And no token value in this MPTCP_EVENT_LISTENER_CREATED event.
This patch fixes this by specifying the type 1 item to search for token
values.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Just to avoid classical Bash pitfall where variables are accidentally
overridden by other functions because the proper scope has not been
defined.
That's also what is done in other MPTCP selftests scripts where all non
local variables are defined at the beginning of the script and the
others are defined with the "local" keyword.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is clearer to declare these global variables at the beginning of the
file as it is done in other MPTCP selftests rather than in functions in
the middle of the script.
So for uniformity reason, we can do the same here in mptcp_sockopt.sh.
Suggested-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The definition of 'rndh' was probably copied from one script to another
but some times, 'sec' was not defined, not used and/or not spelled
properly.
Here all the 'rndh' are now defined the same way.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some variables were set but never used.
This was not causing any issues except adding some confusion and having
shellcheck complaining about them.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new "sandbox" net namespace is available where no other netfilter
rules have been added.
Use this new netns instead of re-using "ns1" and clean it.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time.
Without the previous patch's fix, this block of code:
bpf_spin_lock(lock);
bpf_list_push_front(head, &f[i]->node);
bpf_list_push_front(head, &f[i + 1]->node);
bpf_spin_unlock(lock);
would fail check_reference_leak check as release_on_unlock logic would miss
a ref that should've been released.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Define and use kvm_static_assert() in the common KVM selftests headers to
provide deterministic behavior, and to allow creating static asserts
without dummy messages.
The kernel's static_assert() makes the message param optional, and on the
surface, tools/include/linux/build_bug.h appears to follow suit. However,
glibc may override static_assert() and redefine it as a direct alias of
_Static_assert(), which makes the message parameter mandatory. This leads
to non-deterministic behavior as KVM selftests code that utilizes
static_assert() without a custom message may or not compile depending on
the order of includes. E.g. recently added asserts in
x86_64/processor.h fail on some systems with errors like
In file included from lib/memstress.c:11:0:
include/x86_64/processor.h: In function ‘this_cpu_has_p’:
include/x86_64/processor.h:193:34: error: expected ‘,’ before ‘)’ token
static_assert(low_bit < high_bit); \
^
due to _Static_assert() expecting a comma before a message. The "message
optional" version of static_assert() uses macro magic to strip away the
comma when presented with empty an __VA_ARGS__
#ifndef static_assert
#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
#endif // static_assert
and effectively generates "_Static_assert(expr, #expr)".
The incompatible version of static_assert() gets defined by this snippet
in /usr/include/assert.h:
#if defined __USE_ISOC11 && !defined __cplusplus
# undef static_assert
# define static_assert _Static_assert
#endif
which yields "_Static_assert(expr)" and thus fails as above.
KVM selftests don't actually care about using C11, but __USE_ISOC11 gets
defined because of _GNU_SOURCE, which many tests do #define. _GNU_SOURCE
triggers a massive pile of defines in /usr/include/features.h, including
_ISOC11_SOURCE:
/* If _GNU_SOURCE was defined by the user, turn on all the other features. */
#ifdef _GNU_SOURCE
# undef _ISOC95_SOURCE
# define _ISOC95_SOURCE 1
# undef _ISOC99_SOURCE
# define _ISOC99_SOURCE 1
# undef _ISOC11_SOURCE
# define _ISOC11_SOURCE 1
# undef _POSIX_SOURCE
# define _POSIX_SOURCE 1
# undef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
# undef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
# undef _XOPEN_SOURCE_EXTENDED
# define _XOPEN_SOURCE_EXTENDED 1
# undef _LARGEFILE64_SOURCE
# define _LARGEFILE64_SOURCE 1
# undef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
# undef _ATFILE_SOURCE
# define _ATFILE_SOURCE 1
#endif
which further down in /usr/include/features.h leads to:
/* This is to enable the ISO C11 extension. */
#if (defined _ISOC11_SOURCE \
|| (defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L))
# define __USE_ISOC11 1
#endif
To make matters worse, /usr/include/assert.h doesn't guard against
multiple inclusion by turning itself into a nop, but instead #undefs a
few macros and continues on. As a result, it's all but impossible to
ensure the "message optional" version of static_assert() will actually be
used, e.g. explicitly including assert.h and #undef'ing static_assert()
doesn't work as a later inclusion of assert.h will again redefine its
version.
#ifdef _ASSERT_H
# undef _ASSERT_H
# undef assert
# undef __ASSERT_VOID_CAST
# ifdef __USE_GNU
# undef assert_perror
# endif
#endif /* assert.h */
#define _ASSERT_H 1
#include <features.h>
Fixes: fcba483e82 ("KVM: selftests: Sanity check input to ioctls() at build time")
Fixes: ee37955366 ("KVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*")
Fixes: 53a7dc0f21 ("KVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221122013309.1872347-1-seanjc@google.com
Move the AMX test's kvm_cpu_has() checks before creating the VM+vCPU,
there are no dependencies between the two operations. Opportunistically
add a comment to call out that enabling off-by-default XSAVE-managed
features must be done before KVM_GET_SUPPORTED_CPUID is cached.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-5-seanjc@google.com
Disallow using kvm_get_supported_cpuid() and thus caching KVM's supported
CPUID info before enabling XSAVE-managed features that are off-by-default
and must be enabled by ARCH_REQ_XCOMP_GUEST_PERM. Caching the supported
CPUID before all XSAVE features are enabled can result in false negatives
due to testing features that were cached before they were enabled.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-4-seanjc@google.com
Move __vm_xsave_require_permission() below the CPUID helpers so that a
future change can reference the cached result of KVM_GET_SUPPORTED_CPUID
while keeping the definition of the variable close to its intended user,
kvm_get_supported_cpuid().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-3-seanjc@google.com
Move the kvm_cpu_has() check on X86_FEATURE_XFD out of the helper to
enable off-by-default XSAVE-managed features and into the one test that
currenty requires XFD (XFeature Disable) support. kvm_cpu_has() uses
kvm_get_supported_cpuid() and thus caches KVM_GET_SUPPORTED_CPUID, and so
using kvm_cpu_has() before ARCH_REQ_XCOMP_GUEST_PERM effectively results
in the test caching stale values, e.g. subsequent checks on AMX_TILE will
get false negatives.
Although off-by-default features are nonsensical without XFD, checking
for XFD virtualization prior to enabling such features isn't strictly
required.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Fixes: 7fbb653e01 ("KVM: selftests: Check KVM's supported CPUID, not host CPUID, for XFD")
Link: https://lore.kernel.org/r/20221125023839.315207-1-lei4.wang@intel.com
[sean: add Fixes, reword changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-2-seanjc@google.com
Restore the assert (on x86-64) that <10% of pages are still idle when NOT
running as a nested VM in the access tracking test. The original assert
was converted to a "warning" to avoid false failures when running the
test in a VM, but the non-nested case does not suffer from the same
"infinite TLB size" issue.
Using the HYPERVISOR flag isn't infallible as VMMs aren't strictly
required to enumerate the "feature" in CPUID, but practically speaking
anyone that is running KVM selftests in VMs is going to be using a VMM
and hypervisor that sets the HYPERVISOR flag.
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-3-seanjc@google.com
Warn if the number of idle pages is greater than or equal to 10% of the
total number of pages, not if the percentage of idle pages is less than
10%. The original code asserted that less than 10% of pages were still
idle, but the check got inverted when the assert was converted to a
warning.
Opportunistically clean up the warning; selftests are 64-bit only, there
is no need to use "%PRIu64" instead of "%lu".
Fixes: 6336a810db ("KVM: selftests: replace assertion with warning in access_tracking_perf_test")
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-2-seanjc@google.com
nfit_test overrode the security_show() sysfs attribute function in nvdimm
dimm_devs in order to allow testing of security unlock. With the
introduction of CXL security commands, the trick to override
security_show() becomes significantly more complicated. By introdcing a
security flag CONFIG_NVDIMM_SECURITY_TEST, libnvdimm can just toggle the
check via a compile option. In addition the original override can can be
removed from tools/testing/nvdimm/.
The flag will also be used to bypass cpu_cache_invalidate_memregion() when
set in a different commit. This allows testing on QEMU with nfit_test or
cxl_test since cpu_cache_has_invalidate_memregion() checks whether
X86_FEATURE_HYPERVISOR cpu feature flag is set on x86.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983618758.2734609.18031639517065867138.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The mock cxl mem devs needs a way to go into "locked" status to simulate
when the platform is rebooted. Add a sysfs mechanism so the device security
state is set to "locked" and the frozen state bits are cleared.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983617602.2734609.7042497620931694717.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add support to emulate a CXL mem device support the "Disable Passphrase"
operation. The operation supports disabling of either a user or a master
passphrase. The emulation will provide support for both user and master
passphrase.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983612447.2734609.2767804273351656413.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add support to emulate a CXL mem device supporting the "Set Passphrase"
operation. The operation supports setting of either a user or a master
passphrase.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983611314.2734609.12996309794483934484.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add the emulation support for handling "Get Security State" opcode for a
CXL memory device for the cxl_test. The function will copy back device
security state bitmask to the output payload.
The security state data is added as platform_data for the mock mem device.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983610177.2734609.4953959949148428755.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that we can skip unsupported configurations add some more test cases
using that, cover 8kHz, 44.1kHz and 96kHz plus 8kHz mono and 48kHz 6
channel.
44.1kHz is a different clock base to the existing 48kHz tests and may
therefore show problems with the clock configuration if only 8kHz based
rates are really available (or help diagnose if bad clocking is due to
only 44.1kHz based rates being supported). 8kHz mono and 48Hz 6 channel
are real world formats and should show if clocking does not account for
channel count properly.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-7-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rather than just numbering the tests try to provide semi descriptive names
for what the tests are trying to cover. This also has the advantage of
meaning we can add more tests without having to keep the list of tests
ordered by existing number which should make it easier to understand what
we're testing and why.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-6-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The values in the one example configuration file we currently have are the
default values for the two tests we have so there's no need to actually set
them. Comment them out as examples, with a rename for the tests so that we
can update the tests in the code more easily.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-5-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If constraint selection gives us a number of channels other than the one
that we asked for that isn't a failure, that is the device implementing
constraints and advertising that it can't support whatever we asked
for. Report such cases as a test skip rather than failure so we don't have
false positives.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-4-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If constraint selection gives us a sample rate other than the one that we
asked for that isn't a failure, that is the device implementing sample
rate constraints and advertising that it can't support whatever we asked
for. Report such cases as a test skip rather than failure so we don't have
false positives.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-3-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In order to help make the list of tests a bit easier to maintain refactor
things so we pass the tests around as a struct with the parameters in,
enabling us to add new tests by adding to a table with comments saying
what each of the number are. We could also use named initializers if we get
more parameters.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-2-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When everything is starting up we are likely to have a lot of child
processes producing output at once. This means that we can reduce
overhead a bit by allowing epoll_wait() to return more than one
descriptor at once, it cuts down on the number of system calls we need
to do which on virtual platforms where the syscall overhead is a bit
more noticable and we're likely to have a lot more children active can
make a small but noticable difference.
On physical platforms the relatively small number of processes being run
and vastly improved speeds push the effects of this change into the
noise.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Now we hold execution of the stress test programs until all children are
started there is no need to drain output while that is happening.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
At present fp-stress has a bit of a thundering herd problem since the
children it spawns start running immediately, meaning that they can start
starving the parent process of CPU before it has even started all the
children. This is much more severe on virtual platforms since they tend to
support far more SVE and SME vector lengths, be slower in general and for
some have issues with performance when simulating multiple CPUs.
We can mitigate this problem by having all the child processes block before
starting the test program, meaning that we at least have all the child
processes started before we start heavily using CPU. We still have the same
load issues while waiting for the actual stress test programs to start up
and produce output but they're at least all ready to go before that kicks
in, resulting in substantial reductions in overall runtime on some of the
severely affected systems. One test was showing about 20% improvement.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The test prog dumps a single AF_UNIX socket's UID with and without
unshare(CLONE_NEWUSER) and checks if it matches the result of getuid().
Without the preceding patch, the test prog is killed by a NULL deref
in sk_diag_dump_uid().
# ./diag_uid
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN diag_uid.uid.1 ...
BUG: kernel NULL pointer dereference, address: 0000000000000270
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 105212067 P4D 105212067 PUD 1051fe067 PMD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
RIP: 0010:sk_diag_fill (./include/net/sock.h:920 net/unix/diag.c:119 net/unix/diag.c:170)
...
# 1: Test terminated unexpectedly by signal 9
# FAIL diag_uid.uid.1
not ok 1 diag_uid.uid.1
# RUN diag_uid.uid_unshare.1 ...
# 1: Test terminated by timeout
# FAIL diag_uid.uid_unshare.1
not ok 2 diag_uid.uid_unshare.1
# FAILED: 0 / 2 tests passed.
# Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
With the patch, the test succeeds.
# ./diag_uid
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN diag_uid.uid.1 ...
# OK diag_uid.uid.1
ok 1 diag_uid.uid.1
# RUN diag_uid.uid_unshare.1 ...
# OK diag_uid.uid_unshare.1
ok 2 diag_uid.uid_unshare.1
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add nvdimm_security_ops support for CXL memory device with the introduction
of the ->get_flags() callback function. This is part of the "Persistent
Memory Data-at-rest Security" command set for CXL memory device support.
The ->get_flags() function provides the security state of the persistent
memory device defined by the CXL 3.0 spec section 8.2.9.8.6.1.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983609611.2734609.13231854299523325319.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Verify the KVM allows userspace to set all supported bits in the
IA32_FEATURE_CONTROL MSR irrespective of the current guest CPUID, and
that all unsupported bits are rejected.
Throw the testcase into vmx_msrs_test even though it's not technically a
VMX MSR; it's close enough, and the most frequently feature controlled by
the MSR is VMX.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220607232353.3375324-4-seanjc@google.com
Cover the essential functionality of the iommufd with a directed test from
userspace. This aims to achieve reasonable functional coverage using the
in-kernel self test framework.
A second test does a failure injection sweep of the success paths to study
error unwind behaviors.
This allows achieving high coverage of the corner cases in pages.c.
The selftest requires CONFIG_IOMMUFD_TEST to be enabled, and several huge
pages which may require:
echo 4 > /proc/sys/vm/nr_hugepages
Link: https://lore.kernel.org/r/19-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This test was overlooked with a hard-coded mntpoint path in test when
we're removing the hugetlb mntpoint in commit 0796c7b8be. Fix it up so
the test can keep running.
Link: https://lkml.kernel.org/r/Y3aojfUC2nSwbCzB@x1n
Fixes: 0796c7b8be ("selftests/vm: drop mnt point for hugetlb in run_vmtests.sh")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's test whether R/O long-term pinning is reliable for non-anonymous
memory: when R/O long-term pinning a page, the expectation is that we
break COW early before pinning, such that actual write access via the
page tables won't break COW later and end up replacing the R/O-pinned
page in the page table.
Consequently, R/O long-term pinning in private mappings would only target
exclusive anonymous pages.
For now, all tests fail:
# [RUN] R/O longterm GUP pin ... with shared zeropage
not ok 151 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd
not ok 152 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with tmpfile
not ok 153 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with huge zeropage
not ok 154 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
not ok 155 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
not ok 156 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with shared zeropage
not ok 157 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd
not ok 158 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with tmpfile
not ok 159 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with huge zeropage
not ok 160 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
not ok 161 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
not ok 162 Longterm R/O pin is reliable
Link: https://lkml.kernel.org/r/20221116102659.70287-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add basic tests for COW with non-anonymous pages in private
mappings: write access should properly trigger COW and result in the
private changes not being visible through other page mappings.
Especially, add tests for:
* Zeropage
* Huge zeropage
* Ordinary pagecache pages via memfd and tmpfile()
* Hugetlb pages via memfd
Fortunately, all tests pass.
Link: https://lkml.kernel.org/r/20221116102659.70287-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O
long-term pinning)".
For now, we did not support reliable R/O long-term pinning in COW
mappings. That means, if we would trigger R/O long-term pinning in
MAP_PRIVATE mapping, we could end up pinning the (R/O-mapped) shared
zeropage or a pagecache page.
The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever
the process would write to that private page copy would not be visible by
the owner of the previous page pin: for example, RDMA could read stale
data. The end result is essentially an unexpected and hard-to-debug
memory corruption.
Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that working as well, just like
one would expect (no write access, but still triggering a write access to
break COW).
However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
would still run into that issue. For example, VFIO only uses
FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
limitation is unintuitive. FOLL_WRITE would, for example, mark the
page softdirty or trigger uffd-wp, even though, there actually isn't
going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without lack of
VMA permissions by arbitrarty drivers.
So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #8.
This patch (of 19):
Originally, the plan was to have a separate tests for testing COW of
non-anonymous (e.g., shared zeropage) pages.
Turns out, that we'd need a lot of similar functionality and that there
isn't a really good reason to separate it. So let's prepare for non-anon
tests by renaming to "cow".
Link: https://lkml.kernel.org/r/20221116102659.70287-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221116102659.70287-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nelson Escobar <neescoba@cisco.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When testing overflow and overread, there is no need to keep unnecessary
compilation warnings, we should simply ignore them.
The motivation for this patch is to eliminate the compilation warning,
maybe one day we will compile the kernel with "-Werror -Wall", at which
point this compilation warning will turn into a compilation error, we
should fix this error in advance.
How to reproduce the problem (with gcc-11.3.1):
$ make -C tools/testing/selftests/
...
warning: `write' reading 4294967295 bytes from a region of size 1
[-Wstringop-overread]
warning: `read' writing 4294967295 bytes into a region of size 25
overflows the destination [-Wstringop-overflow=]
"-Wno-stringop-overread" is supported at least in gcc-11.1.0.
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d14c547abd484d3540b692bb8048c4a6efe92c8b
Link: https://lkml.kernel.org/r/tencent_51C4ACA8CB3895C2D7F35178440283602107@qq.com
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's extend the test to cover the possible mprotect() optimization when
removing write-protection. mprotect() must not allow write-access to a
COW-shared page by accident.
Link: https://lkml.kernel.org/r/20221108174652.198904-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 8ebe0a5eaa ("mm,madvise,hugetlb: fix unexpected data loss with
MADV_DONTNEED on hugetlbfs") changed how the passed length was interpreted
for hugetlb mappings. It was changed from align up to align down. The
hugetlb-madvise test explicitly tests this behavior. Change test to
expect new behavior.
Link: https://lkml.kernel.org/r/20221104011632.357049-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/oe-lkp/202211040619.2ec447d7-oliver.sang@intel.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a simple test case for ensuring tried_regions directory existence.
Link: https://lkml.kernel.org/r/20221101220328.95765-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The networking programs typically don't require CAP_PERFMON, but through kfuncs
like bpf_cast_to_kern_ctx() they can access memory through PTR_TO_BTF_ID. In
such case enforce CAP_PERFMON.
Also make sure that only GPL programs can access kernel data structures.
All kfuncs require GPL already.
Also remove allow_ptr_to_map_access. It's the same as allow_ptr_leaks and
different name for the same check only causes confusion.
Fixes: fd264ca020 ("bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx")
Fixes: 50c6b8a9ae ("selftests/bpf: Add a test for btf_type_tag "percpu"")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221125220617.26846-1-alexei.starovoitov@gmail.com
BPF CI fails for arm64 and s390x each with the following result:
[...]
All error logs:
serial_test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec
serial_test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
libbpf: prog 'test_kprobe_empty': failed to attach: Operation not supported
serial_test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -95
#92 kprobe_multi_bench_attach:FAIL
[...]
Add the test to the deny list.
Fixes: 5b6c7e5c44 ("selftests/bpf: Add attach bench test")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add simple test cases for DAMON_LRU_SORT's 'enabled' parameter. Those
tests are focusing on the synchronous behavior of DAMON_RECLAIM enabling
and disabling.
Link: https://lkml.kernel.org/r/20221025173650.90624-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add simple test cases for DAMON_RECLAIM's 'enabled' parameter. Those
tests are focusing on the synchronous behavior of DAMON_RECLAIM enabling
and disabling.
Link: https://lkml.kernel.org/r/20221025173650.90624-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
icmp conntrack will set icmp redirects as RELATED, but icmpv6 will not
do this.
For icmpv6, only icmp errors (code <= 128) are examined for RELATED state.
ICMPV6 Redirects are part of neighbour discovery mechanism, those are
handled by marking a selected subset (e.g. neighbour solicitations) as
UNTRACKED, but not REDIRECT -- they will thus be flagged as INVALID.
Add minimal support for REDIRECTs. No parsing of neighbour options is
added for simplicity, so this will only check that we have the embeeded
original header (ND_OPT_REDIRECT_HDR), and then attempt to do a flow
lookup for this tuple.
Also extend the existing test case to cover redirects.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Eric Garver <eric@garver.life>
Link: https://github.com/firewalld/firewalld/issues/1046
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just a simple test to make sure we don't introduce unwanted compiler
warnings and API still supports passing enums as input argument.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221130200013.2997831-2-andrii@kernel.org
This patch removes the need to pin prog when attaching to tc ingress
in the btf_skc_cls_ingress test. Instead, directly use the
bpf_tc_hook_create() and bpf_tc_attach(). The qdisc clsact
will go away together with the netns, so no need to
bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-8-martin.lau@linux.dev
After removing the mount/umount dance from {open,close}_netns()
in the pervious patch, "serial_" can be removed from
the tests using {open,close}_netns().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-7-martin.lau@linux.dev
The previous patches have removed the need to do the mount and umount
dance when switching netns. In particular:
* Avoid remounting /sys/fs/bpf to have a clean start
* Avoid remounting /sys to get a ifindex of a particular netns
This patch can finally remove the mount and umount dance in
{open,close}_netns which is unnecessarily complicated and
error-prone.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-6-martin.lau@linux.dev
This patch removes the need to pin prog in the remaining tests in
tc_redirect.c by directly using the bpf_tc_hook_create() and
bpf_tc_attach(). The clsact qdisc will go away together with
the test netns, so no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-5-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_peer_l3
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-4-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_dtime
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-3-martin.lau@linux.dev
When switching netns, the setns_by_fd() is doing dances in mount/umounting
the /sys directories. One reason is the tc_redirect.c test is depending
on the /sys/net/class/*/ifindex instead of using the if_nametoindex().
if_nametoindex() uses ioctl() to get the ifindex.
This patch is to move all /sys/net/class/*/ifindex usages to
if_nametoindex(). The current code checks ifindex >= 0 which is
incorrect. ifindex > 0 should be checked instead. This patch also
stores ifindex_veth_src and ifindex_veth_dst since the latter patch
will need them.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-2-martin.lau@linux.dev
Torture test the cases where the runstate crosses a page boundary, and
and especially the case where it's configured in 32-bit mode and doesn't,
but then switching to 64-bit mode makes it go onto the second page.
To simplify this, make the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST ioctl
also update the guest runstate area. It already did so if the actual
runstate changed, as a side-effect of kvm_xen_update_runstate(). So
doing it in the plain adjustment case is making it more consistent, as
well as giving us a nice way to trigger the update without actually
running the vCPU again and changing the values.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.
I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest runstate area can be arbitrarily byte-aligned. In fact, even
when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
fields in the structure end up being unaligned due to the fact that the
32-bit ABI only aligns them to 32 bits.
So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
is buggy, because if it's unaligned then we can't update the whole field
atomically; the low bytes might be observable before the _UPDATE bit is.
Xen actually updates the *byte* containing that top bit, on its own. KVM
should do the same.
In addition, we cannot assume that the runstate area fits within a single
page. One option might be to make the gfn_to_pfn cache cope with regions
that cross a page — but getting a contiguous virtual kernel mapping of a
discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
and it seems this is the *only* current use case for the GPC which would
benefit from it.
An earlier version of the runstate code did use a gfn_to_hva cache for
this purpose, but it still had the single-page restriction because it
used the uhva directly — because it needs to be able to do so atomically
when the vCPU is being scheduled out, so it used pagefault_disable()
around the accesses and didn't just use kvm_write_guest_cached() which
has a fallback path.
So... use a pair of GPCs for the first and potential second page covering
the runstate area. We can get away with locking both at once because
nothing else takes more than one GPC lock at a time so we can invent
a trivial ordering rule.
The common case where it's all in the same page is kept as a fast path,
but in both cases, the actual guest structure (compat or not) is built
up from the fields in @vx, following preset pointers to the state and
times fields. The only difference is whether those pointers point to
the kernel stack (in the split case) or to guest memory directly via
the GPC. The fast path is also fixed to use a byte access for the
XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
memcpy.
Finally, Xen also does write the runstate area immediately when it's
configured. Flip the kvm_xen_update_runstate() and …_guest() functions
and call the latter directly when the runstate area is set. This means
that other ioctls which modify the runstate also write it immediately
to the guest when they do so, which is also intended.
Update the xen_shinfo_test to exercise the pathological case where the
XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
actually in a different page to the rest of the 64-bit word.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The systemwide perf hardware breakpoint test tries to open a perf event
on each cpu. On large systems, we run out of file descriptors and fail
the test. Instead, have the test set the file descriptor limit to an
arbitraty high value.
Reported-by: Rohan Deshpande <rohan_d@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/187fed5843cecc1e5066677b6296ee88337d7bef.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
This patch first adds TFO support in mptcp_connect.c.
This can be enabled via a new option: -o MPTFO.
Once enabled, the TCP_FASTOPEN socket option is enabled for the server
side and a sendto() with MSG_FASTOPEN is used instead of a connect() for
the client side.
Note that the first SYN has a limit of bytes it can carry. In other
words, it is allowed to send less data than the provided one. We then
need to track more status info to properly allow the next sendmsg()
starting from the next part of the data to send the rest.
Also in TFO scenarios, we need to completely spool the partially xmitted
buffer -- and account for that -- before starting sendfile/mmap xmit,
otherwise the relevant tests will fail.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the async test case was introduced, despite being a completely
independent test case, the command to run it was added to the same shell
script as the smoke test case. Since a shell script implicitly returns
the error code from the last run command, this effectively caused the
script to only return as error code the result from the async test case,
hiding the smoke test result (which could then only be seen from the
python unittest logs).
Move the async test case call to its own shell script runner to avoid
the aforementioned issue. This also makes the output clearer to read,
since each kselftest KTAP result now matches with one python unittest
report.
While at it, also make it so the async test case is skipped if
/dev/tpmrm0 doesn't exist, since commit 8335adb8f9 ("selftests: tpm:
add async space test with noneexisting handle") added a test that relies
on it.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
sysfs now supports splice_* operations with
commit f2d6c2708b ("kernfs: wire up ->splice_read and ->splice_write")
Update the selftests to expect success instead of failure.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Siddharth Gupta <sidgup@codeaurora.org
Reported-by: Dilip Kota <dilip.kota@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOGOdYACgkQMUZtbf5S
IrsknQ//SAoOyDOEu15YzOt8hAupLKoF6MM+D0dwwTEQZLf7IVXCjPpkKtVh7Si7
YCBoyrqrDs7vwaUrVoKY19Amwov+EYrHCpdC+c7wdZ7uxTaYfUbJJUGmxYOR179o
lV1+1Aiqg9F9C6CUsmZ5lDN2Yb7/uPDBICIV8LM+VzJAtXjurBVauyMwAxLxPOAr
cgvM+h5xzE7DXMF2z8R/mUq5MSIWoJo9hy2UwbV+f2liMTQuw9rwTbyw3d7+H/6p
xmJcBcVaABjoUEsEhld3NTlYbSEnlFgCQBfDWzf2e4y6jBxO0JepuIc7SZwJFRJY
XBqdsKcGw5RkgKbksKUgxe126XFX0SUUQEp0UkOIqe15k7eC2yO9uj1gRm6OuV4s
J94HKzHX9WNV5OQ790Ed2JyIJScztMZlNFVJ/cz2/+iKR42xJg6kaO6Rt2fobtmL
VC2cH+RfHzLl+2+7xnfzXEDgFePSBlA02Aq1wihU3zB3r7WCFHchEf9T7sGt1QF0
03R+8E3+N2tYqphPAXyDoy6kXQJTPxJHAe1FNHJlwgfieUDEWZi/Pm+uQrKIkDeo
oq9MAV2QBNSD1w4wl7cXfvicO5kBr/OP6YBqwkpsGao2jCSIgkWEX2DRrUaLczXl
5/Z+m/gCO5tAEcVRYfMivxUIon//9EIhbErVpHTlNWpRHk24eS4=
=0Lnw
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
Signal that a test run is complete through perf_test_args instead of
having tests open code a similar solution. Ensure that the field resets
to false at the beginning of a test run as the structure is reused
between test runs, eliminating a couple of bugs:
access_tracking_perf_test hangs indefinitely on a subsequent test run,
as 'done' remains true. The bug doesn't amount to much right now, as x86
supports a single guest mode. However, this is a precondition of
enabling the test for other architectures with >1 guest mode, like
arm64.
memslot_modification_stress_test has the exact opposite problem, where
subsequent test runs complete immediately as 'run_vcpus' remains false.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[oliver: added commit message, preserve spin_wait_for_next_iteration()]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221118211503.4049023-2-oliver.upton@linux.dev
The minimal alsa-lib configuration code is similar in both mixer
and pcm tests. Move this code to the shared conf.c source file.
Also, fix the build rules inspired by rseq tests. Build libatest.so
which is linked to the both test utilities dynamically.
Also, set the TEST_FILES variable for lib.mk.
Cc: linux-kselftest@vger.kernel.org
Cc: Shuah Khan <shuah@kernel.org>
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129085306.2345763-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4AC5QAKCRDbK58LschI
g1e0AQCfAqduTy7mYd02jDNCV0wLphNp9FbPiP9OrQT37ABpKAEA1ulj1X59bX3d
HnZdDKuatcPZT9MV5hDLM7MFJ9GjOA4=
=fNmM
-----END PGP SIGNATURE-----
Daniel Borkmann says:
====================
bpf-next 2022-11-25
We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).
The main changes are:
1) Support for user defined BPF objects: the use case is to allocate own
objects, build own object hierarchies and use the building blocks to
build own data structures flexibly, for example, linked lists in BPF,
from Kumar Kartikeya Dwivedi.
2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
from Yonghong Song.
3) Add support storing struct task_struct objects as kptrs in maps,
from David Vernet.
4) Batch of BPF map documentation improvements, from Maryam Tahhan
and Donald Hunter.
5) Improve BPF verifier to propagate nullness information for branches
of register to register comparisons, from Eduard Zingerman.
6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
from Hou Tao.
7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted
given it is not the case for them, from Alexei Starovoitov.
8) Improve BPF verifier's realloc handling to better play along with dynamic
runtime analysis tools like KASAN and friends, from Kees Cook.
9) Remove legacy libbpf mode support from bpftool,
from Sahid Orentino Ferdjaoui.
10) Rework zero-len skb redirection checks to avoid potentially breaking
existing BPF test infra users, from Stanislav Fomichev.
11) Two small refactorings which are independent and have been split out
of the XDP queueing RFC series, from Toke Høiland-Jørgensen.
12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.
13) Documentation on how to run BPF CI without patch submission,
from Daniel Müller.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4ACtQAKCRDbK58LschI
gyIcAP4nE/chC+gYDOleloK2tQlQawM5Sa4kwHjFzBdPD3tRrAEAs6y2fJjb5vZo
OXIFKhXv5Xo3knmynkTSVSVOvB48IAE=
=Cme8
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2022-11-25
We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 7 files changed, 48 insertions(+), 30 deletions(-).
The main changes are:
1) Several libbpf ringbuf fixes related to probing for its availability,
size overflows when mmaping a 2G ringbuf and rejection of invalid
reservationsizes, from Hou Tao.
2) Fix a buggy return pointer in libbpf for attach_raw_tp function,
from Jiri Olsa.
3) Fix a local storage BPF map bug where the value's spin lock field
can get initialized incorrectly, from Xu Kuohai.
4) Two follow-up fixes in kprobe_multi BPF selftests for BPF CI,
from Jiri Olsa.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Make test_bench_attach serial
selftests/bpf: Filter out default_idle from kprobe_multi bench
bpf: Set and check spin lock value in sk_storage_map_test
bpf: Do not copy spin lock field from user in bpf_selem_alloc
libbpf: Check the validity of size in user_ring_buffer__reserve()
libbpf: Handle size overflow for user ringbuf mmap
libbpf: Handle size overflow for ringbuf mmap
libbpf: Use page size as max_entries when probing ring buffer map
bpf, perf: Use subprog name when reporting subprog ksymbol
libbpf: Use correct return pointer in attach_raw_tp
====================
Link: https://lore.kernel.org/r/20221125001034.29473-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the kernel receives a route deletion request from user space it
tries to delete a route that matches the route attributes specified in
the request.
If only prefix information is specified in the request, the kernel
should delete the first matching FIB alias regardless of its associated
FIB info. However, an error is currently returned when the FIB info is
backed by a nexthop object:
# ip nexthop add id 1 via 192.0.2.2 dev dummy10
# ip route add 198.51.100.0/24 nhid 1
# ip route del 198.51.100.0/24
RTNETLINK answers: No such process
Fix by matching on such a FIB info when legacy nexthop attributes are
not specified in the request. An earlier check already covers the case
where a nexthop ID is specified in the request.
Add tests that cover these flows. Before the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [FAIL]
Tests passed: 11
Tests failed: 1
After the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [ OK ]
Tests passed: 12
Tests failed: 0
No regressions in other tests:
# ./fib_nexthops.sh
...
Tests passed: 228
Tests failed: 0
# ./fib_tests.sh
...
Tests passed: 186
Tests failed: 0
Cc: stable@vger.kernel.org
Reported-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Jonas Gorski <jonas.gorski@gmail.com>
Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e6 ("net: ipv4: fix route with nexthop object delete warning")
Fixes: 61b91eb33a ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221124210932.2470010-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fixes for Xen emulation. While nobody should be enabling it in
the kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
* Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a use-after-free
when it disables virtualization extensions. While downgrading the panic
to a WARN is quite easy, the full fix is a bit more laborious; there
are also tests. This is the bulk of the pull request.
* Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
* Obey changes to the kvm.halt_poll_ns module parameter in VMs
not using KVM_CAP_HALT_POLL, restoring behavior from before
the introduction of the capability
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmODI84UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVJwgAombWOBf549JiHGPtwejuQO20nTSj
Om9pzWQ9dR182P+ju/FdqSPXt/Lc8i+z5zSXDrV3HQ6/a3zIItA+bOAUiMFvHNAQ
w/7pEb1MzVOsEg2SXGOjZvW3WouB4Z4R0PosInYjrFrRGRAaw5iaTOZHGezE44t2
WBWk1PpdMap7J/8sjNT1ble72ig9JdSW4qeJUQ1GWxHCigI5sESCQVqF446KM0jF
gTYPGX5TqpbWiIejF0yNew9yNKMi/yO4Pz8I5j3vtopeHx24DCIqUAGaEg6ykErX
vnzYbVP7NaFrqtje49PsK6i1cu2u7uFPArj0dxo3DviQVZVHV1q6tNmI4A==
=Qgei
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for Xen emulation. While nobody should be enabling it in the
kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
- Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a
use-after-free when it disables virtualization extensions. While
downgrading the panic to a WARN is quite easy, the full fix is a
bit more laborious; there are also tests. This is the bulk of the
pull request.
- Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
- Obey changes to the kvm.halt_poll_ns module parameter in VMs not
using KVM_CAP_HALT_POLL, restoring behavior from before the
introduction of the capability"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Update gfn_to_pfn_cache khva when it moves within the same page
KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0
KVM: x86/xen: Validate port number in SCHEDOP_poll
KVM: x86/mmu: Fix race condition in direct_page_fault
KVM: x86: remove exit_int_info warning in svm_handle_exit
KVM: selftests: add svm part to triple_fault_test
KVM: x86: allow L1 to not intercept triple fault
kvm: selftests: add svm nested shutdown test
KVM: selftests: move idt_entry to header
KVM: x86: forcibly leave nested mode on vCPU reset
KVM: x86: add kvm_leave_nested
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
KVM: x86: nSVM: leave nested mode on vCPU free
KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL
KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling
KVM: Cap vcpu->halt_poll_ns before halting rather than after
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/ftrace`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/gpio`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The newly added PCM test produces a binary which is not ignored by git
when built in tree, fix that.
Fixes: aba51cd094 ("selftests: alsa - add PCM test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20221125153654.1037868-1-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since we now flush output immediately on starting children we should ensure
that the child name is set beforehand so that any output that does get
flushed from the newly created child has the name of the child attached.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221124120722.150988-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add a few positive/negative tests to test bpf_rcu_read_lock()
and its corresponding verifier support. The new test will fail
on s390x and aarch64, so an entry is added to each of their
respective deny lists.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221124053222.2374650-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify when a bond is configured with {up,down}delay and the link state
of slave members flaps if there are no remaining members up the bond
should immediately select a member to bring up. (from bonding.txt
section 13.1 paragraph 4)
Suggested-by: Liang Li <liali@redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add some selftest testcases that validate the expected behavior of the
bpf_task_from_pid() kfunc that was added in the prior patch.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122145300.251210-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_cgroup_ancestor() allows BPF programs to access the ancestor of a
struct cgroup *. This patch adds selftests that validate its expected
behavior.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122055458.173143-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a selftest suite to validate the cgroup kfuncs that were
added in the prior patch.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122055458.173143-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently LLVM fails to recognize .data.* as data section and defaults to .text
section. Later BPF backend tries to emit 4-byte NOP instruction which doesn't
exist in BPF ISA and aborts.
The fix for LLVM is pending:
https://reviews.llvm.org/D138477
While waiting for the fix lets workaround the linked_list test case
by using .bss.* prefix which is properly recognized by LLVM as BSS section.
Fix libbpf to support .bss. prefix and adjust tests.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Install a cleanup function using the trap command for signals EXIT,
SIGINT, SIGQUIT and SIGABRT. The cleanup function will perform:
1. Online the CPUs that were made offline during the test.
2. Removing the cgroups created.
3. Restoring the original /sys/kernel/debug/sched/verbose value,
currently it's left turned on, irrespective of the original
configuration value.
the test performs steps 1 and 2, on the successful runs, but not during
all of the failed runs. With the cleanup(), the system will perform all
three steps during failed/passed test runs.
Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add checking of the test return value, otherwise it will report success
forever for test_create_read().
Fixes: dff6d2ae56 ("selftests/efivarfs: clean up test files from test_create*()")
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The selftests/net does not have proper cross-compilation support, and
does not properly state libbpf as a dependency. Mimic/copy the BPF
build from selftests/bpf, which has the nice side-effect that libbpf
is built as well.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20221119171841.2014936-1-bjorn@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
LWT_XMIT to test L3 case, TC to test L2 case.
v2:
- s/veth_ifindex/ipip_ifindex/ in two places (Martin)
- add comment about which condition triggers the rejection (Martin)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221121180340.1983627-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Alexei hit another rcu warnings because of this test.
Making test_bench_attach serial so it does not disrupts
other tests during parallel tests run.
While this change is not the fix, it should be less likely
to hit it with this test being executed serially.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221116100228.2064612-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei hit following rcu warning when running prog_test -j.
[ 128.049567] WARNING: suspicious RCU usage
[ 128.049569] 6.1.0-rc2 #912 Tainted: G O
...
[ 128.050944] kprobe_multi_link_handler+0x6c/0x1d0
[ 128.050947] ? kprobe_multi_link_handler+0x42/0x1d0
[ 128.050950] ? __cpuidle_text_start+0x8/0x8
[ 128.050952] ? __cpuidle_text_start+0x8/0x8
[ 128.050958] fprobe_handler.part.1+0xac/0x150
[ 128.050964] 0xffffffffa02130c8
[ 128.050991] ? default_idle+0x5/0x20
[ 128.050998] default_idle+0x5/0x20
It's caused by bench test attaching kprobe_multi link to default_idle
function, which is not executed in rcu safe context so the kprobe
handler on top of it will trigger the rcu warning.
Filtering out default_idle function from the bench test.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221116100228.2064612-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update sk_storage_map_test to make sure kernel does not copy user
non-zero value spin lock to kernel, and does not copy kernel spin
lock value to user.
If user spin lock value is copied to kernel, this test case will
make kernel spin on the copied lock, resulting in rcu stall and
softlockup.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20221114134720.1057939-3-xukuohai@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test closes both iterator link fd and cgroup fd, and removes the
cgroup file to make a dead cgroup before reading from cgroup iterator.
It also uses kern_sync_rcu() and usleep() to wait for the release of
start cgroup. If the start cgroup is not pinned by cgroup iterator,
reading from iterator fd will trigger use-after-free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-4-houtao@huaweicloud.com
Add remove_cgroup() to remove a cgroup which doesn't have any children
or live processes. It will be used by the following patch to test cgroup
iterator on a dead cgroup.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-3-houtao@huaweicloud.com
The `nettest` binary, built from `selftests/net/nettest.c`,
was expected to be found in the path during test execution of
`fcnal-test.sh` and `pmtu.sh`, leading to tests getting
skipped when the binary is not installed in the system, as can
be seen in these logs found in the wild [1]:
# TEST: vti4: PMTU exceptions [SKIP]
[ 350.600250] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 350.607421] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm6udp not supported
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP]
[ 351.605102] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 351.612243] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm4udp not supported
The `unicast_extensions.sh` tests also rely on `nettest`, but
it runs fine there because it looks for the binary in the
current working directory [2]:
The same mechanism that works for the Unicast extensions tests
is here copied over to the PMTU and functional tests.
[1] https://lkft.validation.linaro.org/scheduler/job/5839508#L6221
[2] https://lkft.validation.linaro.org/scheduler/job/5839508#L7958
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conform to the rest of Hyper-V emulation selftests which have 'hyperv'
prefix. Get rid of '_test' suffix as well as the purpose of this code
is fairly obvious.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-49-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
assist page.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-48-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the
Partition assist page.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-47-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V MSR-Bitmap tests do RDMSR from L2 to exit to L1. While 'evmcs_test'
correctly clobbers all GPRs (which are not preserved), 'hyperv_svm_test'
does not. Introduce a more generic rdmsr_from_l2() to avoid code
duplication and remove hardcoding of MSRs. Do not put it in common code
because it is really just a selftests bug rather than a processor
feature that requires it.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-46-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmmcall()/vmcall() are used to exit from L2 to L1 and no concrete hypercall
ABI is currenty followed. With the introduction of Hyper-V L2 TLB flush
it becomes (theoretically) possible that L0 will take responsibility for
handling the call and no L1 exit will happen. Prevent this by stuffing RAX
(KVM ABI) and RCX (Hyper-V ABI) with 'safe' values.
While on it, convert vmmcall() to 'static inline', make it setup stack
frame and move to include/x86_64/svm_util.h.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-45-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There's no need to pollute VMX and SVM code with Hyper-V specific
stuff and allocate Hyper-V specific test pages for all test as only
few really need them. Create a dedicated struct and an allocation
helper.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-43-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In preparation to putting Hyper-V specific test pages to a dedicated
struct, move eVMCS load logic from load_vmcs(). Tests call load_vmcs()
directly and the only one which needs 'enlightened' version is
evmcs_test so there's not much gain in having this merged.
Temporary pass both GPA and HVA to load_evmcs().
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-42-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V VP assist page is not eVMCS specific, it is also used for
enlightened nSVM. Move the code to vendor neutral place.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-41-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'struct hv_vp_assist_page' definition doesn't match TLFS. Also, define
'struct hv_nested_enlightenments_control' and use it instead of opaque
'__u64'.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-40-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'struct hv_enlightened_vmcs' definition in selftests is not '__packed'
and so we rely on the compiler doing the right padding. This is not
obvious so it seems beneficial to use the same definition as in kernel.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-39-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a selftest for Hyper-V PV TLB flush hypercalls
(HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
The test creates one 'sender' vCPU and two 'worker' vCPU which do busy
loop reading from a certain GVA checking the observed value. Sender
vCPU swaos the data page with another page filled with a different value.
The expectation for workers is also altered. Without TLB flush on worker
vCPUs, they may continue to observe old value. To guard against accidental
TLB flushes for worker vCPUs the test is repeated 100 times.
Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
fast' modes.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-38-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extend the test to check the scenario when NCI core tries to send data
to already closed device to ensure that nothing bad happens.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Three tests are added. One is from John Fastabend ({1]) which tests
tracing style access for xdp program from the kernel ctx.
Another is a tc test to test both kernel ctx tracing style access
and explicit non-ctx type cast. The third one is for negative tests
including two tests, a tp_bpf test where the bpf_rdonly_cast()
returns a untrusted ptr which cannot be used as helper argument,
and a tracepoint test where the kernel ctx is a u64.
Also added the test to DENYLIST.s390x since s390 does not currently
support calling kernel functions in JIT mode.
[1] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195442.3114844-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A previous change added a series of kfuncs for storing struct
task_struct objects as referenced kptrs. This patch adds a new
task_kfunc test suite for validating their expected behavior.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.
The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.
There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:
SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
struct task_struct *task,
u64 clone_flags)
{
struct task_struct *acquired, *nested;
nested = task->last_wakee;
/* Would not be rejected by the verifier. */
acquired = bpf_task_acquire(nested);
if (!acquired)
return 0;
bpf_task_release(acquired);
return 0;
}
To address this, this patch defines a new type flag called PTR_TRUSTED
which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
passed directly from the kernel as a tracepoint or struct_ops callback
argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
pointer is no longer PTR_TRUSTED. From the example above, the struct
task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
obtained from 'task->last_wakee' is not PTR_TRUSTED.
A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will add selftests to validate.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
'size' is unsigned, it never less than zero.
Link: https://lkml.kernel.org/r/20221105110611.28920-1-yuehaibing@huawei.com
Fixes: 6c26df84e1 ("selftests: cgroup: return -errno from cg_read()/cg_write() on failure")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add local_config.h and local_config.mk to .gitignore to avoid accidentally
checking it in.
Link: https://lkml.kernel.org/r/20221103005754.205420-1-zhaogongyi@huawei.com
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syscall(3) returns -1 and sets errno on error, unlike "syscall"
instruction.
Systems which have <= 32/64 CPUs are unaffected. Test won't bounce
to all CPUs before completing if there are more of them.
Link: https://lkml.kernel.org/r/Y1bUiT7VRXlXPQa1@p183
Fixes: 1f5bd05476 ("proc: selftests: test /proc/uptime")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of adding the whole test to DENYLIST.s390x, which also has
success test cases that should be run, just skip over failure test
cases in case the JIT does not support kfuncs.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118185938.2139616-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, tests can only request a new vaddr range by using
vm_vaddr_alloc()/vm_vaddr_alloc_page()/vm_vaddr_alloc_pages() but
these functions allocate and map physical pages too. Make it possible
to request unmapped range too.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-36-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to vm_vaddr_alloc(), virt_map() needs to reflect the mapping
in vm->vpages_mapped.
While on it, remove unneeded code wrapping in vm_vaddr_alloc().
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-35-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a selftest for Hyper-V PV IPI hypercalls
(HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).
The test creates one 'sender' vCPU and two 'receiver' vCPU and then
issues various combinations of send IPI hypercalls in both 'normal'
and 'fast' (with XMM input where necessary) mode. Later, the test
checks whether IPIs were delivered to the expected destination vCPU[s].
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-34-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All Hyper-V specific tests issuing hypercalls need this.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-33-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HYPERV_LINUX_OS_ID needs to be written to HV_X64_MSR_GUEST_OS_ID by
each Hyper-V specific selftest.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-32-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from 128-bit registers. Moreover, these helpers are not used. Borrow
_kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM limiting them to
XMM0-XMM8 for now.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-31-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that KVM isn't littered with "struct hv_enlightenments" casts, rename
the struct to "hv_vmcb_enlightenments" to highlight the fact that the
struct is specifically for SVM's VMCB.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a union to provide hv_enlightenments side-by-side with the sw_reserved
bytes that Hyper-V's enlightenments overlay. Casting sw_reserved
everywhere is messy, confusing, and unnecessarily unsafe.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move Hyper-V's VMCB "struct hv_enlightenments" to the svm.h header so
that the struct can be referenced in "struct vmcb_control_area".
Alternatively, a dedicated header for SVM+Hyper-V could be added, a la
x86_64/evmcs.h, but it doesn't appear that Hyper-V will end up needing
a wholesale replacement for the VMCB.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the
definitions come directly from the TLFS[*], not from KVM.
No functional change intended.
[*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields
[vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of
hyperv-tlfs.h, keep svm/hyperv.h]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The preferred form of the str/ldr for predicate registers with an immediate
of zero is to omit the zero, and the clang built in assembler rejects the
zero immediate. Drop the immediate.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221117114130.687261-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
After commit afef88e655 ("selftests/bpf: Store BPF object files with
.bpf.o extension"), we should use xdp_dummy.bpf.o instade of xdp_dummy.o.
In addition, use the BPF_FILE variable to save the BPF object file name,
which can be better identified and modified.
Fixes: afef88e655 ("selftests/bpf: Store BPF object files with .bpf.o extension")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Daniel Müller <deso@posteo.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds 12 small test cases: 01-04 test for the sysctl
net.sctp.l3mdev_accept. 05-10 test for only binding to a right
l3mdev device, the connection can be created. 11-12 test for
two socks binding to different l3mdev devices at the same time,
each of them can process the packets from the corresponding
peer. The tests run for both IPv4 and IPv6 SCTP.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The estimated time was supposing the rate was expressed in mibit
(bit * 1024^2) but it is in mbit (bit * 1000^2).
This makes the threshold higher but in a more realistic way to avoid
false positives reported by CI instances.
Before this patch, the thresholds were at 7561/4005ms and now they are
at 7906/4178ms.
While at it, also fix a typo in the linked comment, spotted by Mat.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/310
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not running it from a new netns causes issues if some MPTCP settings are
modified, e.g. if MPTCP is disabled from the sysctl knob, if multiple
addresses are available and added to the MPTCP path-manager, etc.
In these cases, the created connection will not behave as expected, e.g.
unable to create an MPTCP socket, more than one subflow is seen, etc.
A new "sandbox" net namespace is now created and used to run
mptcp_sockopt from this controlled environment.
Fixes: ce9979129a ("selftests: mptcp: add mptcp getsockopt test cases")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On slow or busy VM, some test-cases still fail because the
data transfer completes before the endpoint manipulation
actually took effect.
Address the issue by artificially increasing the runtime for
the relevant test-cases.
Fixes: ef360019db ("selftests: mptcp: signal addresses testcases")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/309
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Preparing the metadata for bpf_list_head involves a complicated parsing
step and type resolution for the contained value. Ensure that corner
cases are tested against and invalid specifications in source are duly
rejected. Also include tests for incorrect ownership relationships in
the BTF.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-24-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-23-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
First, ensure that whenever a bpf_spin_lock is present in an allocation,
the reg->id is preserved. This won't be true for global variables
however, since they have a single map value per map, hence the verifier
harcodes it to 0 (so that multiple pseudo ldimm64 insns can yield the
same lock object per map at a given offset).
Next, add test cases for all possible combinations (kptr, global, map
value, inner map value). Since we lifted restriction on locking in inner
maps, also add test cases for them. Currently, each lookup into an inner
map gets a fresh reg->id, so even if the reg->map_ptr is same, they will
be treated as separate allocations and the incorrect unlock pairing will
be rejected.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-22-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make updates in preparation for adding more test cases to this selftest:
- Convert from CHECK_ to ASSERT macros.
- Use BPF skeleton
- Fix typo sping -> spin
- Rename spinlock.c -> spin_lock.c
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-21-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add user facing __contains macro which provides a convenient wrapper
over the verbose kernel specific BTF declaration tag required to
annotate BPF list head structs in user types.
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-20-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a linked list API for use in BPF programs, where it expects
protection from the bpf_spin_lock in the same allocation as the
bpf_list_head. For now, only one bpf_spin_lock can be present hence that
is assumed to be the one protecting the bpf_list_head.
The following functions are added to kick things off:
// Add node to beginning of list
void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node);
// Add node to end of list
void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node);
// Remove node at beginning of list and return it
struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head);
// Remove node at end of list and return it
struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head);
The lock protecting the bpf_list_head needs to be taken for all
operations. The verifier ensures that the lock that needs to be taken is
always held, and only the correct lock is taken for these operations.
These checks are made statically by relying on the reg->id preserved for
registers pointing into regions having both bpf_spin_lock and the
objects protected by it. The comment over check_reg_allocation_locked in
this change describes the logic in detail.
Note that bpf_list_push_front and bpf_list_push_back are meant to
consume the object containing the node in the 1st argument, however that
specific mechanism is intended to not release the ref_obj_id directly
until the bpf_spin_unlock is called. In this commit, nothing is done,
but the next commit will be introducing logic to handle this case, so it
has been left as is for now.
bpf_list_pop_front and bpf_list_pop_back delete the first or last item
of the list respectively, and return pointer to the element at the
list_node offset. The user can then use container_of style macro to get
the actual entry type. The verifier however statically knows the actual
type, so the safety properties are still preserved.
With these additions, programs can now manage their own linked lists and
store their objects in them.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-17-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_obj_drop, which is the kfunc used to free allocated
objects (allocated using bpf_obj_new). Pairing with bpf_obj_new, it
implicitly destructs the fields part of object automatically without
user intervention.
Just like the previous patch, btf_struct_meta that is needed to free up
the special fields is passed as a hidden argument to the kfunc.
For the user, a convenience macro hides over the kernel side kfunc which
is named bpf_obj_drop_impl.
Continuing the previous example:
void prog(void) {
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return;
bpf_obj_drop(f);
}
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-15-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce type safe memory allocator bpf_obj_new for BPF programs. The
kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
to kfuncs still requires having them in prototype, unlike BPF helpers
which always take 5 arguments and have them checked using bpf_func_proto
in verifier, ignoring unset argument types.
Introduce __ign suffix to ignore a specific kfunc argument during type
checks, then use this to introduce support for passing type metadata to
the bpf_obj_new_impl kfunc.
The user passes BTF ID of the type it wants to allocates in program BTF,
the verifier then rewrites the first argument as the size of this type,
after performing some sanity checks (to ensure it exists and it is a
struct type).
The second argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It would be needed
mostly for the offset array which is required for zero initializing
special fields while leaving the rest of storage in unitialized state.
It would also be needed in the next patch to perform proper destruction
of the object's special fields.
Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
using the any context BPF memory allocator introduced recently. To this
end, a global instance of the BPF memory allocator is initialized on
boot to be used for this purpose. This 'bpf_global_ma' serves all
allocations for bpf_obj_new. In the future, bpf_obj_new variants will
allow specifying a custom allocator.
Note that now that bpf_obj_new can be used to allocate objects that can
be linked to BPF linked list (when future linked list helpers are
available), we need to also free the elements using bpf_mem_free.
However, since the draining of elements is done outside the
bpf_spin_lock, we need to do migrate_disable around the call since
bpf_list_head_free can be called from map free path where migration is
enabled. Otherwise, when called from BPF programs migration is already
disabled.
A convenience macro is included in the bpf_experimental.h header to hide
over the ugly details of the implementation, leading to user code
looking similar to a language level extension which allocates and
constructs fields of a user type.
struct bar {
struct bpf_list_node node;
};
struct foo {
struct bpf_spin_lock lock;
struct bpf_list_head head __contains(bar, node);
};
void prog(void) {
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return;
...
}
A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-14-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype wrt the passed in registers has become ad-hoc and
ugly to read. To make life easier, and make a very clear split between
different stages of argument processing, move all the code into
verifier.c and refactor into easier to read helpers and functions.
This also makes sharing code within the verifier easier with kfunc
argument processing. This will be more and more useful in later patches
as we are now moving to implement very core BPF helpers as kfuncs, to
keep them experimental before baking into UAPI.
Remove all kfunc related bits now from btf_check_func_arg_match, as
users have been converted away to refactored kfunc argument handling.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's very unusual to have both a command line option and a compile time
option, and apparently that's confusing to people. Also, basically
everybody enables the compile time option now, which means people who
want to disable this wind up having to use the command line option to
ensure that anyway. So just reduce the number of moving pieces and nix
the compile time option in favor of the more versatile command line
option.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When cross-compiling [1], the get_sys_includes make macro should use
the target system include path, and not the build hosts system include
path.
Make clang honor the CROSS_COMPILE triple.
[1] e.g. "ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make"
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/bpf/20221115182051.582962-2-bjorn@kernel.org
When cross-compiling selftests/bpf, the resolve_btfids binary end up
in a different directory, than the regular resolve_btfids
builds. Populate RESOLVE_BTFIDS for sub-make, so it can find the
binary.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20221115182051.582962-1-bjorn@kernel.org
Attestation is used to verify the trustworthiness of a TDX guest.
During the guest bring-up, the Intel TDX module measures and records
the initial contents and configuration of the guest, and at runtime,
guest software uses runtime measurement registers (RMTRs) to measure
and record details related to kernel image, command line params, ACPI
tables, initrd, etc. At guest runtime, the attestation process is used
to attest to these measurements.
The first step in the TDX attestation process is to get the TDREPORT
data. It is a fixed size data structure generated by the TDX module
which includes the above mentioned measurements data, a MAC ID to
protect the integrity of the TDREPORT, and a 64-Byte of user specified
data passed during TDREPORT request which can uniquely identify the
TDREPORT.
Intel's TDX guest driver exposes TDX_CMD_GET_REPORT0 IOCTL interface to
enable guest userspace to get the TDREPORT subtype 0.
Add a kernel self test module to test this ABI and verify the validity
of the generated TDREPORT.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/all/20221116223820.819090-4-sathyanarayanan.kuppuswamy%40linux.intel.com
Current release - regressions:
- tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
Previous releases - regressions:
- bridge: fix memory leaks when changing VLAN protocol
- dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
- dsa: don't leak tagger-owned storage on switch driver unbind
- eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6 is removed
- eth: stmmac: ensure tx function is not running in stmmac_xdp_release()
- eth: hns3: fix return value check bug of rx copybreak
Previous releases - always broken:
- kcm: close race conditions on sk_receive_queue
- bpf: fix alignment problem in bpf_prog_test_run_skb()
- bpf: fix writing offset in case of fault in strncpy_from_kernel_nofault
- eth: macvlan: use built-in RCU list checking
- eth: marvell: add sleep time after enabling the loopback bit
- eth: octeon_ep: fix potential memory leak in octep_device_setup()
Misc:
- tcp: configurable source port perturb table size
- bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmN2FlMSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkWAwQAJcV7XEB7bEssgabFkEmC4uvS/sFlyHC
uSwFRn5ojaB2c56T1CnNYmitg9Wr4arC6Vca28iai6BgqB6t4qLRI/WWTsZiEPhi
mt/pjNN2u9JMyaafHFHYfXnbSDWRF7kPMpNw4l3uL0vkGyjSI7LGAOP4Qh8C1h/d
tNVSDZnj4Laj/3JtDf7Rk6ydCqPYnNdWxFfoZ/SQkjYZKD3Ze9tml7WJykAzCTLp
yUiPC6TvHOnWIZYbB04sVVOQD4V+95TSOgEhB6wzs/CXB7iBEY+N+oCedjP9Xrfw
n3ea4anBoTleDnJXJI57LhdJBkyoXncfbpbYLwXljyIgosr7XVTALvOG8XUhg/DW
FzN5DWQ54jzTsx2eXFJzjQQcDIgyxazk9EdoHdqF8byCasP+fofq1JvzyqtvNSyh
h8Ps6jdMZrWpXuFDVApXUhP32A/+9q+dFSYHJO681m6mf4CIaUXdm4aB1dkxDAvg
PSlk797U94RQCzJgqxhrgsq1PGQPBb+qadZrAiD3aQi26g0NWCTg7uFpCeCEK2ZF
fLwc2XxrwLQm1q7xQVoEg4UxPIIf0mUesvOD9sLDYop0rFIw8x0v7jdYM4kyhN3o
6FWAXKxBe3LJ9jTTsVTbZbfHYpTnS8Q2KSclBN+/dZNHwwsUPHjy17Z2Ct3o3Jlm
lNbiiD30BgsD
=vVJk
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Current release - regressions:
- tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
Previous releases - regressions:
- bridge: fix memory leaks when changing VLAN protocol
- dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
- dsa: don't leak tagger-owned storage on switch driver unbind
- eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6
is removed
- eth: stmmac: ensure tx function is not running in
stmmac_xdp_release()
- eth: hns3: fix return value check bug of rx copybreak
Previous releases - always broken:
- kcm: close race conditions on sk_receive_queue
- bpf: fix alignment problem in bpf_prog_test_run_skb()
- bpf: fix writing offset in case of fault in
strncpy_from_kernel_nofault
- eth: macvlan: use built-in RCU list checking
- eth: marvell: add sleep time after enabling the loopback bit
- eth: octeon_ep: fix potential memory leak in octep_device_setup()
Misc:
- tcp: configurable source port perturb table size
- bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"
* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
net: use struct_group to copy ip/ipv6 header addresses
net: usb: smsc95xx: fix external PHY reset
net: usb: qmi_wwan: add Telit 0x103a composition
netdevsim: Fix memory leak of nsim_dev->fa_cookie
tcp: configurable source port perturb table size
l2tp: Serialize access to sk_user_data with sk_callback_lock
net: thunderbolt: Fix error handling in tbnet_init()
net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
net: dsa: don't leak tagger-owned storage on switch driver unbind
net/x25: Fix skb leak in x25_lapb_receive_frame()
net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
bridge: switchdev: Fix memory leaks when changing VLAN protocol
net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
net: hns3: fix return value check bug of rx copybreak
net: hns3: fix incorrect hw rss hash type of rx packet
net: phy: marvell: add sleep time after enabling the loopback bit
net: ena: Fix error handling in ena_init()
kcm: close race conditions on sk_receive_queue
net: ionic: Fix error handling in ionic_init_module()
...
This fixes three issues in nested SVM:
1) in the shutdown_interception() vmexit handler we call kvm_vcpu_reset().
However, if running nested and L1 doesn't intercept shutdown, the function
resets vcpu->arch.hflags without properly leaving the nested state.
This leaves the vCPU in inconsistent state and later triggers a kernel
panic in SVM code. The same bug can likely be triggered by sending INIT
via local apic to a vCPU which runs a nested guest.
On VMX we are lucky that the issue can't happen because VMX always
intercepts triple faults, thus triple fault in L2 will always be
redirected to L1. Plus, handle_triple_fault() doesn't reset the vCPU.
INIT IPI can't happen on VMX either because INIT events are masked while
in VMX mode.
Secondarily, KVM doesn't honour SHUTDOWN intercept bit of L1 on SVM.
A normal hypervisor should always intercept SHUTDOWN, a unit test on
the other hand might want to not do so.
Finally, the guest can trigger a kernel non rate limited printk on SVM
from the guest, which is fixed as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a SVM implementation to triple_fault_test to test that
emulated/injected shutdown works.
Since instead of the VMX, the SVM allows the hypervisor to avoid
intercepting shutdown in guest, don't intercept shutdown to test that
KVM suports this correctly.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add test that tests that on SVM if L1 doesn't intercept SHUTDOWN,
then L2 crashes L1 and doesn't crash L2
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct idt_entry will be used for a test which will break IDT on purpose.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kmemleak reports this issue:
unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue occurs in the following scenarios:
unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak
The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed
Fixes: dca85aac88 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Now that a VM isn't needed to check for nEPT support, assert that KVM
supports nEPT in prepare_eptp() instead of skipping the test, and push
the TEST_REQUIRE() check out to individual tests. The require+assert are
somewhat redundant and will incur some amount of ongoing maintenance
burden, but placing the "require" logic in the test makes it easier to
find/understand a test's requirements and in this case, provides a very
strong hint that the test cares about nEPT.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
When checking for nEPT support in KVM, use kvm_get_feature_msr() instead
of vcpu_get_msr() to retrieve KVM's default TRUE_PROCBASED_CTLS and
PROCBASED_CTLS2 MSR values, i.e. don't require a VM+vCPU to query nEPT
support.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop kvm_get_supported_cpuid_entry() and its inner helper now that all
known usage can use X86_FEATURE_*, X86_PROPERTY_*, X86_PMU_FEATURE_*, or
the dedicated Family/Model helpers. Providing "raw" access to CPUID
leafs is undesirable as it encourages open coding CPUID checks, which is
often error prone and not self-documenting.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-13-seanjc@google.com
Add KVM variants of the x86 Family and Model helpers, and use them in the
PMU event filter test. Open code the retrieval of KVM's supported CPUID
entry 0x1.0 in anticipation of dropping kvm_get_supported_cpuid_entry().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-12-seanjc@google.com
Add dedicated helpers for getting x86's Family and Model, which are the
last holdouts that "need" raw access to CPUID information. FMS info is
a mess and requires not only splicing together multiple values, but
requires doing so conditional in the Family case.
Provide wrappers to reduce the odds of copy+paste errors, but mostly to
allow for the eventual removal of kvm_get_supported_cpuid_entry().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-11-seanjc@google.com
Add an X86_PMU_FEATURE_* framework to simplify probing architectural
events on Intel PMUs, which require checking the length of a bit vector
and the _absence_ of a "feature" bit. Add helpers for both KVM and
"this CPU", and use the newfangled magic (along with X86_PROPERTY_*)
to clean up pmu_event_filter_test.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-10-seanjc@google.com
Add X86_PROPERTY_PMU_VERSION and use it in vmx_pmu_caps_test to replace
open coded versions of the same functionality.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-9-seanjc@google.com
Add and use x86 "properties" for the myriad AMX CPUID values that are
validated by the AMX test. Drop most of the test's single-usage
helpers so that the asserts more precisely capture what check failed.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-8-seanjc@google.com
Extent X86_PROPERTY_* support to KVM, i.e. add kvm_cpu_property() and
kvm_cpu_has_p(), and use the new helpers in kvm_get_cpu_address_width().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-7-seanjc@google.com
Use X86_PROPERTY_MAX_KVM_LEAF to replace the equivalent open coded check
on KVM's maximum paravirt CPUID leaf.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-5-seanjc@google.com
Introduce X86_PROPERTY_* to allow retrieving values/properties from CPUID
leafs, e.g. MAXPHYADDR from CPUID.0x80000008. Use the same core code as
X86_FEATURE_*, the primary difference is that properties are multi-bit
values, whereas features enumerate a single bit.
Add this_cpu_has_p() to allow querying whether or not a property exists
based on the maximum leaf associated with the property, e.g. MAXPHYADDR
doesn't exist if the max leaf for 0x8000_xxxx is less than 0x8000_0008.
Use the new property infrastructure in vm_compute_max_gfn() to prove
that the code works as intended. Future patches will convert additional
selftests code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-4-seanjc@google.com
Refactor the X86_FEATURE_* framework to prepare for extending the core
logic to support "properties". The "feature" framework allows querying a
single CPUID bit to detect the presence of a feature; the "property"
framework will extend the idea to allow querying a value, i.e. to get a
value that is a set of contiguous bits in a CPUID leaf.
Opportunistically add static asserts to ensure features are fully defined
at compile time, and to try and catch mistakes in the definition of
features.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-3-seanjc@google.com
Add X86_FEATURE_PAE and use it to guesstimate the MAXPHYADDR when the
MAXPHYADDR CPUID entry isn't supported.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-2-seanjc@google.com
Add a selftest to exercise the KVM_CAP_EXIT_ON_EMULATION_FAILURE
capability.
This capability is also exercised through
smaller_maxphyaddr_emulation_test, but that test requires
allow_smaller_maxphyaddr=Y, which is off by default on Intel when ept=Y
and unconditionally disabled on AMD when npt=Y. This new test ensures
that KVM_CAP_EXIT_ON_EMULATION_FAILURE is exercised independent of
allow_smaller_maxphyaddr.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-11-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Change smaller_maxphyaddr_emulation_test to expect a #PF(RSVD), rather
than an emulation failure, when TDP is disabled. KVM only needs to
emulate instructions to emulate a smaller guest.MAXPHYADDR when TDP is
enabled.
Fixes: 39bbcc3a4e ("selftests: kvm: Allows userspace to handle emulation errors.")
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-10-dmatlack@google.com
[sean: massage comment to talk about having to emulate due to MAXPHYADDR]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Provide the error code on a fault in KVM_ASM_SAFE(), e.g. to allow tests
to assert that #PF generates the correct error code without needing to
manually install a #PF handler. Use r10 as the scratch register for the
error code, as it's already clobbered by the asm blob (loaded with the
RIP of the to-be-executed instruction). Deliberately load the output
"error_code" even in the non-faulting path so that error_code is always
initialized with deterministic data (the aforementioned RIP), i.e to
ensure a selftest won't end up with uninitialized consumption regardless
of how KVM_ASM_SAFE() is used.
Don't clear r10 in the non-faulting case and instead load error code with
the RIP (see above). The error code is valid if and only if an exception
occurs, and '0' isn't necessarily a better "invalid" value, e.g. '0'
could result in false passes for a buggy test.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-9-dmatlack@google.com
Clear R9 in the non-faulting path of KVM_ASM_SAFE() and fall through to
to a common load of "vector" to effectively load "vector" with '0' to
reduce the code footprint of the asm blob, to reduce the runtime overhead
of the non-faulting path (when "vector" is stored in a register), and so
that additional output constraints that are valid if and only if a fault
occur are loaded even in the non-faulting case.
A future patch will add a 64-bit output for the error code, and if its
output is not explicitly loaded with _something_, the user of the asm
blob can end up technically consuming uninitialized data. Using a
common path to load the output constraints will allow using an existing
scratch register, e.g. r10, to hold the error code in the faulting path,
while also guaranteeing the error code is initialized with deterministic
data in the non-faulting patch (r10 is loaded with the RIP of
to-be-executed instruction).
Consuming the error code when a fault doesn't occur would obviously be a
test bug, but there's no guarantee the compiler will detect uninitialized
consumption. And conversely, it's theoretically possible that the
compiler might throw a false positive on uninitialized data, e.g. if the
compiler can't determine that the non-faulting path won't touch the error
code.
Alternatively, the error code could be explicitly loaded in the
non-faulting path, but loading a 64-bit memory|register output operand
with an explicitl value requires a sign-extended "MOV imm32, r/m64",
which isn't exactly straightforward and has a largish code footprint.
And loading the error code with what is effectively garbage (from a
scratch register) avoids having to choose an arbitrary value for the
non-faulting case.
Opportunistically remove a rogue asterisk in the block comment.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-8-dmatlack@google.com
Copy KVM's macros for page fault error masks into processor.h so they
can be used in selftests.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-7-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move the flds instruction emulation failure handling code to a header
so it can be re-used in an upcoming test.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-5-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Delete a bunch of code related to ucall handling from
smaller_maxphyaddr_emulation_test. The only thing
smaller_maxphyaddr_emulation_test needs to check is that the vCPU exits
with UCALL_DONE after the second vcpu_run().
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Hard-code the flds instruction and assert the exact instruction bytes
are present in run->emulation_failure. The test already requires the
instruction bytes to be present because that's the only way the test
will advance the RIP past the flds and get to GUEST_DONE().
Note that KVM does not necessarily return exactly 2 bytes in
run->emulation_failure since it may not know the exact instruction
length in all cases. So just assert that
run->emulation_failure.insn_size is at least 2.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename emulator_error_test to smaller_maxphyaddr_emulation_test and
update the comment at the top of the file to document that this is
explicitly a test to validate that KVM emulates instructions in response
to an EPT violation when emulating a smaller MAXPHYADDR.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In xapic_state_test's test_icr(), explicitly skip iterations that would
match vcpu->id instead of assuming vcpu->id is '0', so that IPIs are
are correctly sent to non-existent vCPUs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/YyoZr9rXSSMEtdh5@google.com
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Link: https://lore.kernel.org/r/20221017175819.12672-1-gautammenghani201@gmail.com
[sean: massage shortlog and changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add arch specific API kvm_arch_vm_post_create to perform any required setup
after VM creation.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-4-vannapurve@google.com
[sean: place x86's implementation by vm_arch_vcpu_add()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Introduce arch specific API: kvm_selftest_arch_init to allow each arch to
handle initialization before running any selftest logic.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-3-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Consolidate common startup logic in one place by implementing a single
setup function with __attribute((constructor)) for all selftests within
kvm_util.c.
This allows moving logic like:
/* Tell stdout not to buffer its content */
setbuf(stdout, NULL);
to a single file for all selftests.
This will also allow any required setup at entry in future to be done in
common main function.
Link: https://lore.kernel.org/lkml/Ywa9T+jKUpaHLu%2Fl@google.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-2-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Play nice with huge pages when getting PTEs and translating GVAs to GPAs,
there's no reason to disallow using huge pages in selftests. Use
PG_LEVEL_NONE to indicate that the caller doesn't care about the mapping
level and just wants to get the pte+level.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-8-seanjc@google.com
Verify the parent PTE is PRESENT when getting a child via virt_get_pte()
so that the helper can be used for getting PTEs/GPAs without losing
sanity checks that the walker isn't wandering into the weeds.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-5-seanjc@google.com
Remove the pointless shift from GPA=>GFN and immediately back to
GFN=>GPA when creating guest page tables. Ignore the other walkers
that have a similar pattern for the moment, they will be converted
to use virt_get_pte() in the near future.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-4-seanjc@google.com
Drop the reserved bit checks from the helper to retrieve a PTE, there's
very little value in sanity checking the constructed page tables as any
will quickly be noticed in the form of an unexpected #PF. The checks
also place unnecessary restrictions on the usage of the helpers, e.g. if
a test _wanted_ to set reserved bits for whatever reason.
Removing the NX check in particular allows for the removal of the @vcpu
param, which will in turn allow the helper to be reused nearly verbatim
for addr_gva2gpa().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-3-seanjc@google.com
Drop vm_{g,s}et_page_table_entry() and instead expose the "inner"
helper (was _vm_get_page_table_entry()) that returns a _pointer_ to the
PTE, i.e. let tests directly modify PTEs instead of bouncing through
helpers that just make life difficult.
Opportunsitically use BIT_ULL() in emulator_error_test, and use the
MAXPHYADDR define to set the "rogue" GPA bit instead of open coding the
same value.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-2-seanjc@google.com
There is a spelling mistake in an assert message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220928213458.64089-1-colin.i.king@gmail.com
[sean: fix an ironic typo in the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
To play nice with guests whose stack memory is encrypted, e.g. AMD SEV,
introduce a new "ucall pool" implementation that passes the ucall struct
via dedicated memory (which can be mapped shared, a.k.a. as plain text).
Because not all architectures have access to the vCPU index in the guest,
use a bitmap with atomic accesses to track which entries in the pool are
free/used. A list+lock could also work in theory, but synchronizing the
individual pointers to the guest would be a mess.
Note, there's no need to rewalk the bitmap to ensure success. If all
vCPUs are simply allocating, success is guaranteed because there are
enough entries for all vCPUs. If one or more vCPUs are freeing and then
reallocating, success is guaranteed because vCPUs _always_ walk the
bitmap from 0=>N; if vCPU frees an entry and then wins a race to
re-allocate, then either it will consume the entry it just freed (bit is
the first free bit), or the losing vCPU is guaranteed to see the freed
bit (winner consumes an earlier bit, which the loser hasn't yet visited).
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Peter Gonda <pgonda@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-8-seanjc@google.com
Drop ucall_uninit() and ucall_arch_uninit() now that ARM doesn't modify
the host's copy of ucall_exit_mmio_addr, i.e. now that there's no need to
reset the pointer before potentially creating a new VM. The few calls to
ucall_uninit() are all immediately followed by kvm_vm_free(), and that is
likely always going to hold true, i.e. it's extremely unlikely a test
will want to effectively disable ucall in the middle of a test.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-7-seanjc@google.com
Fix a mostly-theoretical bug where ARM's ucall MMIO setup could result in
different VMs stomping on each other by cloberring the global pointer.
Fix the most obvious issue by saving the MMIO gpa into the VM.
A more subtle bug is that creating VMs in parallel (on multiple tasks)
could result in a VM using the wrong address. Synchronizing a global to
a guest effectively snapshots the value on a per-VM basis, i.e. the
"global" is already prepped to work with multiple VMs, but setting the
global in the host is not thread-safe. To fix that bug, add
write_guest_global() to allow stuffing a VM's copy of a "global" without
modifying the host value.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-6-seanjc@google.com
Do init_ucall() automatically during VM creation to kill two (three?)
birds with one stone.
First, initializing ucall immediately after VM creations allows forcing
aarch64's MMIO ucall address to immediately follow memslot0. This is
still somewhat fragile as tests could clobber the MMIO address with a
new memslot, but it's safe-ish since tests have to be conversative when
accounting for memslot0. And this can be hardened in the future by
creating a read-only memslot for the MMIO page (KVM ARM exits with MMIO
if the guest writes to a read-only memslot). Add a TODO to document that
selftests can and should use a memslot for the ucall MMIO (doing so
requires yet more rework because tests assumes thay can use all memslots
except memslot0).
Second, initializing ucall for all VMs prepares for making ucall
initialization meaningful on all architectures. aarch64 is currently the
only arch that needs to do any setup, but that will change in the future
by switching to a pool-based implementation (instead of the current
stack-based approach).
Lastly, defining the ucall MMIO address from common code will simplify
switching all architectures (except s390) to a common MMIO-based ucall
implementation (if there's ever sufficient motivation to do so).
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-4-seanjc@google.com
Consolidate the actual copying of a ucall struct from guest=>host into
the common get_ucall(). Return a host virtual address instead of a guest
virtual address even though the addr_gva2hva() part could be moved to
get_ucall() too. Conceptually, get_ucall() is invoked from the host and
should return a host virtual address (and returning NULL for "nothing to
see here" is far superior to returning 0).
Use pointer shenanigans instead of an unnecessary bounce buffer when the
caller of get_ucall() provides a valid pointer.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-3-seanjc@google.com
Make ucall() a common helper that populates struct ucall, and only calls
into arch code to make the actually call out to userspace.
Rename all arch-specific helpers to make it clear they're arch-specific,
and to avoid collisions with common helpers (one more on its way...)
Add WRITE_ONCE() to stores in ucall() code (as already done to aarch64
code in commit 9e2f6498ef ("selftests: KVM: Handle compiler
optimizations in ucall")) to prevent clang optimizations breaking ucalls.
Cc: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-2-seanjc@google.com
Automatically disable single-step when the guest reaches the end of the
verified section instead of using an explicit ucall() to ask userspace to
disable single-step. An upcoming change to implement a pool-based scheme
for ucall() will add an atomic operation (bit test and set) in the guest
ucall code, and if the compiler generate "old school" atomics, e.g.
40e57c: c85f7c20 ldxr x0, [x1]
40e580: aa100011 orr x17, x0, x16
40e584: c80ffc31 stlxr w15, x17, [x1]
40e588: 35ffffaf cbnz w15, 40e57c <__aarch64_ldset8_sync+0x1c>
the guest will hang as the local exclusive monitor is reset by eret,
i.e. the stlxr will always fail due to the debug exception taken to EL2.
Link: https://lore.kernel.org/all/20221006003409.649993-8-seanjc@google.com
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221117002350.2178351-3-seanjc@google.com
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Disable single-step by setting debug.control to KVM_GUESTDBG_ENABLE,
not to SINGLE_STEP_DISABLE. The latter is an arbitrary test enum that
just happens to have the same value as KVM_GUESTDBG_ENABLE, and so
effectively disables single-step debug.
No functional change intended.
Cc: Reiji Watanabe <reijiw@google.com>
Fixes: b18e4d4aeb ("KVM: arm64: selftests: Add a test case for KVM_GUESTDBG_SINGLESTEP")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221117002350.2178351-2-seanjc@google.com
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
removing the PKRU bit from XSTATE_BV.
Signed-off-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20221115230932.7126-7-khuey%40kylehuey.com
Replace the perf_test_ prefix on symbol names with memstress_ to match
the new file name.
"memstress" better describes the functionality proveded by this library,
which is to provide functionality for creating and running a VM that
stresses VM memory by reading and writing to guest memory on all vCPUs
in parallel.
"memstress" also contains the same number of chracters as "perf_test",
making it a drop-in replacement in symbols, e.g. function names, without
impacting line lengths. Also the lack of underscore between "mem" and
"stress" makes it clear "memstress" is a noun.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the local variables "pta" (which is short for perf_test_args) for
args. "pta" is not an obvious acronym and using "args" mirrors
"vcpu_args".
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the perf_test_util.[ch] files to memstress.[ch]. Symbols are
renamed in the following commit to reduce the amount of churn here in
hopes of playiing nice with git's file rename detection.
The name "memstress" was chosen to better describe the functionality
proveded by this library, which is to create and run a VM that
reads/writes to guest memory on all vCPUs in parallel.
"memstress" also contains the same number of chracters as "perf_test",
making it a drop-in replacement in symbols, e.g. function names, without
impacting line lengths. Also the lack of underscore between "mem" and
"stress" makes it clear "memstress" is a noun.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Create the ability to randomize page access order with the -a
argument. This includes the possibility that the same pages may be hit
multiple times during an iteration or not at all.
Population has random access as false to ensure all pages will be
touched by population and avoid page faults in late dirty memory that
would pollute the test results.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-5-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Randomize which pages are written vs read using the random number
generator.
Change the variable wr_fract and associated function calls to
write_percent that now operates as a percentage from 0 to 100 where X
means each page has an X% chance of being written. Change the -f
argument to -w to reflect the new variable semantics. Keep the same
default of 100% writes.
Population always uses 100% writes to ensure all memory is actually
populated and not just mapped to the zero page. The prevents expensive
copy-on-write faults from occurring during the dirty memory iterations
below, which would pollute the performance results.
Each vCPU calculates its own random seed by adding its index to the
seed provided.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-4-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Create a -r argument to specify a random seed. If no argument is
provided, the seed defaults to 1. The random seed is set with
perf_test_set_random_seed() and must be set before guest_code runs to
apply.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-3-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Implement random number generator for guest code to randomize parts
of the test, making it less predictable and a more accurate reflection
of reality.
The random number generator chosen is the Park-Miller Linear
Congruential Generator, a fancy name for a basic and well-understood
random number generator entirely sufficient for this purpose.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-2-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a command line option, -c, to pin vCPUs to physical CPUs (pCPUs),
i.e. to force vCPUs to run on specific pCPUs.
Requirement to implement this feature came in discussion on the patch
"Make page tables for eager page splitting NUMA aware"
https://lore.kernel.org/lkml/YuhPT2drgqL+osLl@google.com/
This feature is useful as it provides a way to analyze performance based
on the vCPUs and dirty log worker locations, like on the different NUMA
nodes or on the same NUMA nodes.
To keep things simple, implementation is intentionally very limited,
either all of the vCPUs will be pinned followed by an optional main
thread or nothing will be pinned.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-8-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Many KVM selftests take command line arguments which are supposed to be
positive (>0) or non-negative (>=0). Some tests do these validation and
some missed adding the check.
Add atoi_positive() and atoi_non_negative() to validate inputs in
selftests before proceeding to use those values.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-7-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Change test args memslot_modification_delay and nr_memslot_modifications
to delay and nr_iterations for simplicity.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-6-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace size_1gb defined in max_guest_memory_test.c with the SZ_1G,
SZ_2G and SZ_4G from linux/sizes.h header file.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-5-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
atoi() doesn't detect errors. There is no way to know that a 0 return
is correct conversion or due to an error.
Introduce atoi_paranoid() to detect errors and provide correct
conversion. Replace all atoi() calls with atoi_paranoid().
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-4-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
There are 13 command line options and they are not in any order. Put
them in alphabetical order to make it easy to add new options.
No functional change intended.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-3-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Passing -e option (Run VCPUs while dirty logging is being disabled) in
dirty_log_perf_test also unintentionally enables -g (Do not enable
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2). Add break between two switch case
logic.
Fixes: cfe12e64b0 ("KVM: selftests: Add an option to run vCPUs while disabling dirty logging")
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-2-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
This initial code does a simple sample transfer tests. By default,
all PCM devices are detected and tested with short and long
buffering parameters for 4 seconds. If the sample transfer timing
is not in a +-100ms boundary, the test fails. Only the interleaved
buffering scheme is supported in this version.
The configuration may be modified with the configuration files.
A specific hardware configuration is detected and activated
using the sysfs regex matching. This allows to use the DMI string
(/sys/class/dmi/id/* tree) or any other system parameters
exposed in sysfs for the matching for the CI automation.
The configuration file may also specify the PCM device list to detect
the missing PCM devices.
v1..v2:
- added missing alsa-local.h header file
Cc: Mark Brown <broonie@kernel.org>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@intel.com>
Cc: Liam Girdwood <liam.r.girdwood@intel.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Jimmy Cheng-Yi Chiang <cychiang@google.com>
Cc: Curtis Malainey <cujomalainey@google.com>
Cc: Brian Norris <briannorris@chromium.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221108115914.3751090-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Enable unprivileged bpf for selftests kernel by default.
This forces CI to run test_verifier tests in both privileged
and unprivileged modes.
The test_verifier.c:do_test uses sysctl kernel.unprivileged_bpf_disabled
to decide whether to run or to skip test cases in unprivileged mode.
The CONFIG_BPF_UNPRIV_DEFAULT_OFF controls the default value of the
kernel.unprivileged_bpf_disabled.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221116015456.2461135-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify that nullness information is porpagated in the branches of
register to register JEQ and JNE operations.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221115224859.2452988-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is no point in failing the tests when RTC is not present.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Daniel Diaz <daniel.diaz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Insert 3 programs to check that we are doing the correct thing:
'2', '1', '3' are inserted, but '1' is supposed to be executed first.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Simple report descriptor override in HID: replace part of the report
descriptor from a static definition in the bpf kernel program.
Note that this test should be run last because we disconnect/reconnect
the device, meaning that it changes the overall uhid device.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add tests for the newly implemented function.
We test here only the GET_REPORT part because the other calls are pure
HID protocol and won't infer the result of the test of the bpf hook.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Use a different report with a bigger size and ensures we are doing
things properly.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The tests are pretty basic:
- create a virtual uhid device that no userspace will like (to not mess
up the running system)
- attach a BPF prog to it
- open the matching hidraw node
- inject one event and check:
* that the BPF program can do something on the event stream
* can modify the event stream
- add another test where we attach/detach BPF programs to see if we get
errors
Note: the Makefile is extracted from selftests/bpf so we can rebuild
the libbpf and bpftool components from the current kernel tree without
relying on system installed components.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently, verifier uses MEM_ALLOC type tag to specially tag memory
returned from bpf_ringbuf_reserve helper. However, this is currently
only used for this purpose and there is an implicit assumption that it
only refers to ringbuf memory (e.g. the check for ARG_PTR_TO_ALLOC_MEM
in check_func_arg_reg_off).
Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special
relationship and instead open the use of MEM_ALLOC for more generic
allocations made for user types.
Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it.
Finally, update selftests using 'alloc_' verifier string to 'ringbuf_'.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_cpuset_prs.sh is failing with the following error:
test_cpuset_prs.sh: line 29: [[: 8
57%: syntax error in expression (error token is "57%")
This is happening because `lscpu | grep "^CPU(s)"` returns two lines in
some systems (such as Debian unstable):
# lscpu | grep "^CPU(s)"
CPU(s): 8
CPU(s) scaling MHz: 55%
This is a simple fix that discard the second line.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In preparation for cxl_acpi walking pci_root->bus->bridge, add that
association to the mock pci_root instances.
Note that the missing 3rd entry in mock_pci_root[] was not noticed until
now given that the test version of to_cxl_host_bridge()
(tools/testing/cxl/mock_acpi.c), obviated the need for that entry.
However, "cxl/acpi: Improve debug messages in cxl_acpi_probe()" [1]
needs pci_root->bus->bridge to be populated.
Link: https://lore.kernel.org/r/20221018132341.76259-6-rrichter@amd.com [1]
Cc: Robert Richter <rrichter@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL dports are added in a couple of code paths using
devm_cxl_add_dport(). Debug messages are individually generated, but are
incomplete and inconsistent. Change this by moving its generation to
devm_cxl_add_dport(). This unifies the messages and reduces code
duplication. Also, generate messages on failure. Use a
__devm_cxl_add_dport() wrapper to keep the readability of the error
exits.
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20221018132341.76259-5-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In some platform, the schedule event may came slowly, delay 100ms can't
cover it.
I was notice that on my board which running in low cpu_freq,and this
selftests allways gose fail.
So maybe we can check more times here to wait longer.
Fixes: 43bb45da82 ("selftests: ftrace: Add a selftest to test event enable/disable func trigger")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix following coccicheck warning:
tools/testing/selftests/arm64/mte/check_mmap_options.c:64:24-25:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:66:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:135:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:96:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:190:24-25:
WARNING: Use ARRAY_SIZE
Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
Link: https://lore.kernel.org/r/777ce8ba.12e.184705d4211.Coremail.wangkailong@jari.cn
Signed-off-by: Will Deacon <will@kernel.org>
'time' is the local variable of run_test() function, while 'max_time' is
the local variable of do_transfer() function. So in do_transfer(),
$max_time should be used, not $time.
Please note that here $time == $max_time so the behaviour is not changed
but the right variable is used.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEET63h6RnJhTJHuKTjXOwUVIRcSScFAmNu2EkACgkQXOwUVIRc
SSebKhAA0ffmp5jJgEJpQYNABGLYIJcwKkBrGClDbMJLtwCjevGZJajT9fpbCLb1
eK6EIhdfR0NTO+0KtUVkZ8WMa81OmLEJYdTNtJfNE23ENMpssiAWhlhDF8AoXeKv
Bo3j719gn3Cw9PWXQoircH3wpj+5RMDnjxy4iYlA5yNrvzC7XVmssMF+WALvQnuK
CGrfR57hxdgmphmasRqeCzEoriwihwPsG3k6eQN8rf7ZytLhs90tMVgT9L3Cd2u9
DafA0Xl8mZdz2mHhThcJhQVq4MUymZj44ufuHDiOs1j6nhUlWToyQuvegPOqxKti
uLGtZul0ls+3UP0Lbrv1oEGU/MWMxyDz4IBc0EVs0k3ItQbmSKs6r9WuPFGd96Sb
GHk68qFVySeLGN0LfKe3rCHJ9ZoIOPYJg9qT8Rd5bOhetgGwSsxZTxUI39BxkFup
CEqwIDnts1TMU37GDjj+vssKW91k4jEzMZVtRfsL3J36aJs28k/Ez4AqLXg6WU6u
ADqFaejVPcXbN9rX90onIYxxiL28gZSeT+i8qOPELZtqTQmNWz+tC/ySVuWnD8Mn
Nbs7PZ1IWiNZpsKS8pZnpd6j4mlBeJnwXkPKiFy+xHGuwRSRdYl6G9e5CtlRely/
rwQ8DtaOpRYMrGhnmBEdAOCa9t/iqzrzHzjoigjJ7iAST4ToJ5s=
=Y+/e
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
====================
bpf-next 2022-11-11
We've added 49 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 3592 insertions(+), 1371 deletions(-).
The main changes are:
1) Veristat tool improvements to support custom filtering, sorting, and replay
of results, from Andrii Nakryiko.
2) BPF verifier precision tracking fixes and improvements,
from Andrii Nakryiko.
3) Lots of new BPF documentation for various BPF maps, from Dave Tucker,
Donald Hunter, Maryam Tahhan, Bagas Sanjaya.
4) BTF dedup improvements and libbpf's hashmap interface clean ups, from
Eduard Zingerman.
5) Fix veth driver panic if XDP program is attached before veth_open, from
John Fastabend.
6) BPF verifier clean ups and fixes in preparation for follow up features,
from Kumar Kartikeya Dwivedi.
7) Add access to hwtstamp field from BPF sockops programs,
from Martin KaFai Lau.
8) Various fixes for BPF selftests and samples, from Artem Savkov,
Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong.
9) Fix redirection to tunneling device logic, preventing skb->len == 0, from
Stanislav Fomichev.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
selftests/bpf: fix veristat's singular file-or-prog filter
selftests/bpf: Test skops->skb_hwtstamp
selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test
bpf: Add hwtstamp field for the sockops prog
selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch
bpf, docs: Document BPF_MAP_TYPE_ARRAY
docs/bpf: Document BPF map types QUEUE and STACK
docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS
docs/bpf: Document BPF_MAP_TYPE_CPUMAP map
docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
libbpf: Hashmap.h update to fix build issues using LLVM14
bpf: veth driver panics when xdp prog attached before veth_open
selftests: Fix test group SKIPPED result
selftests/bpf: Tests for btf_dedup_resolve_fwds
libbpf: Resolve unambigous forward declarations
libbpf: Hashmap interface update to allow both long and void* keys/values
samples/bpf: Fix sockex3 error: Missing BPF prog type
selftests/bpf: Fix u32 variable compared with less than zero
Documentation: bpf: Escape underscore in BPF type name prefix
selftests/bpf: Use consistent build-id type for liburandom_read.so
...
====================
Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko says:
====================
bpf 2022-11-11
We've added 11 non-merge commits during the last 8 day(s) which contain
a total of 11 files changed, 83 insertions(+), 74 deletions(-).
The main changes are:
1) Fix strncpy_from_kernel_nofault() to prevent out-of-bounds writes,
from Alban Crequy.
2) Fix for bpf_prog_test_run_skb() to prevent wrong alignment,
from Baisong Zhong.
3) Switch BPF_DISPATCHER to static_call() instead of ftrace infra, with
a small build fix on top, from Peter Zijlstra and Nathan Chancellor.
4) Fix memory leak in BPF verifier in some error cases, from Wang Yufen.
5) 32-bit compilation error fixes for BPF selftests, from Pu Lehui and
Yang Jihong.
6) Ensure even distribution of per-CPU free list elements, from Xu Kuohai.
7) Fix copy_map_value() to track special zeroed out areas properly,
from Xu Kuohai.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix offset calculation error in __copy_map_value and zero_map_value
bpf: Initialize same number of free nodes for each pcpu_freelist
selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT
maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault()
selftests/bpf: Fix test_progs compilation failure in 32-bit arch
selftests/bpf: Fix casting error when cross-compiling test_verifier for 32-bit platforms
bpf: Fix memory leaks in __check_func_call
bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE()
bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
====================
Link: https://lore.kernel.org/r/20221111231624.938829-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
introduced post-6.0 or which aren't considered serious enough to justify a
-stable backport.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY27xPAAKCRDdBJ7gKXxA
juFXAP4tSmfNDrT6khFhV0l4cS43bluErVNLh32RfXBqse8GYgEA5EPvZkOssLqY
86ejRXFgAArxYC4caiNURUQL+IASvQo=
=YVOx
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes.
Eight are cc:stable and the remainder address issues which were
introduced post-6.0 or which aren't considered serious enough to
justify a -stable backport"
* tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
docs: kmsan: fix formatting of "Example report"
mm/damon/dbgfs: check if rm_contexts input is for a real context
maple_tree: don't set a new maximum on the node when not reusing nodes
maple_tree: fix depth tracking in maple_state
arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
fs: fix leaked psi pressure state
nilfs2: fix use-after-free bug of ns_writer on remount
x86/traps: avoid KMSAN bugs originating from handle_bug()
kmsan: make sure PREEMPT_RT is off
Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN
x86/uaccess: instrument copy_from_user_nmi()
kmsan: core: kmsan_in_runtime() should return true in NMI context
mm: hugetlb_vmemmap: include missing linux/moduleparam.h
mm/shmem: use page_mapping() to detect page cache for uffd continue
mm/memremap.c: map FS_DAX device memory as decrypted
Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"
nilfs2: fix deadlock in nilfs_count_free_blocks()
mm/mmap: fix memory leak in mmap_region()
hugetlbfs: don't delete error page from pagecache
maple_tree: reorganize testing to restore module testing
...
Fix the bug of filtering out filename too early, before we know the
program name, if using unified file-or-prog filter (i.e., -f
<any-glob>). Because we try to filter BPF object file early without
opening and parsing it, if any_glob (file-or-prog) filter is used we
have to accept any filename just to get program name, which might match
any_glob.
Fixes: 10b1b3f3e5 ("selftests/bpf: consolidate and improve file/prog filtering in veristat")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221111181242.2101192-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmNuf4IQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppvMD/9K2kFcAiD85QmRoIgwlIRM604KZ6aGXqk3
BjTavxfB+3DJcb82FHywBF5DC0sUtrBOTn7+DJpf13lb4L2DZY1lfLkRL7SKHSs5
o1z+1uLcBtZtGCq5M+yhpxbAzJ2kNWdRe+FutSA6wiz03ATXTwo2qE1MLaw1jxap
DowK08DUtLNaFNoEGdpW8iub9ql1OVWWZdOaxZmVJkdPWeWMD6Zaqwi/MeyNv0aY
KbVpYHa2AGxGY6+2krLpL09kqYlW++UvFsofM6RJrHTlLyBdYKvM2Z+Tv9I6w81s
ZerVl5srC2pVj1K0isO7A25GTVIVzI9im/GCzStNTasFtlzW85CwLEcDS8T679bY
I0P+Wl3ZoLJztChrcSufiAaOfJIichML7H3h/iEkSE51+9cBr42fqJO64dc+s/Bi
OGmaFowYgJgOClzpAJ2upd2aNu4sLiR2DUb3qdHDpcio9bfpIme1Do1yB94kRR//
yIFrs47PW+JumE90iKJPnDRHWrl3dVUK27MqkAWSBuvOkBjKxLBSVHIARs1lGWy1
25y4atEMaEYnvjC3ATwM0WX0LY+5jCVqOXyfMPAMmEZ7WDbER7FfGxnnmw/pwka7
D4CiSWn5H2Jp9Lq7HiblgYucXXNCPYgSx9JiXnY/KBpARaKUIXuTOq2PuJ/FW4UG
dsJap0W2rw==
=s8Z1
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Nothing major, just a few minor tweaks:
- Tweak for the TCP zero-copy io_uring self test (Pavel)
- Rather than use our internal cached value of number of CQ events
available, use what the user can see (Dylan)
- Fix a typo in a comment, added in this release (me)
- Don't allow wrapping while adding provided buffers (me)
- Fix a double poll race, and add a lockdep assertion for it too
(Pavel)"
* tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux:
io_uring/poll: lockdep annote io_poll_req_insert_locked
io_uring/poll: fix double poll req->flags races
io_uring: check for rollover of buffer ID when providing buffers
io_uring: calculate CQEs from the user visible value
io_uring: fix typo in io_uring.h comment
selftests/net: don't tests batched TCP io_uring zc
This patch tests reading the skops->skb_hwtstamp field.
A local test was also done such that the shinfo hwtstamp was temporary
set to a non zero value in the kernel bpf_skops_parse_hdr()
and the same value can be read by the skops test.
An adjustment is needed to the btf_dump selftest because
the changes in the 'struct bpf_sock_ops'.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-4-martin.lau@linux.dev
This patch fixes the incorrect ASSERT test in tcp_hdr_options during
the CHECK to ASSERT macro cleanup.
Fixes: 3082f8cd4b ("selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macros")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-3-martin.lau@linux.dev
xdp_synproxy fails to be compiled in the 32-bit arch, log is as follows:
xdp_synproxy.c: In function 'parse_options':
xdp_synproxy.c:175:36: error: left shift count >= width of type [-Werror=shift-count-overflow]
175 | *tcpipopts = (mss6 << 32) | (ttl << 24) | (wscale << 16) | mss4;
| ^~
xdp_synproxy.c: In function 'syncookie_open_bpf_maps':
xdp_synproxy.c:289:28: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
289 | .map_ids = (__u64)map_ids,
| ^
Fix it.
Fixes: fb5cd0ce70 ("selftests/bpf: Add selftests for raw syncookie helpers")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221111030836.37632-1-yangjihong1@huawei.com
This commit tests previous fix of bpf_probe_read_kernel_str().
The BPF helper bpf_probe_read_kernel_str should return -EFAULT when
given a bad source pointer and the target buffer should only be modified
to make the string NULL terminated.
bpf_probe_read_kernel_str() was previously inserting a NULL before the
beginning of the dst buffer. This test should ensure that the
implementation stays correct for now on.
Without the fix, this test will fail as follows:
$ cd tools/testing/selftests/bpf
$ make
$ sudo ./test_progs --name=varlen
...
test_varlen:FAIL:check got 0 != exp 66
Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221110085614.111213-3-albancrequy@linux.microsoft.com
Changes v1 to v2:
- add ack tag
- fix my email
- rebase on bpf tree and tag for bpf tree
Current release - new code bugs:
- can: af_can: can_exit(): add missing dev_remove_pack() of canxl_packet
Previous releases - regressions:
- bpf, sockmap: fix the sk->sk_forward_alloc warning
- wifi: mac80211: fix general-protection-fault in
ieee80211_subif_start_xmit()
- can: af_can: fix NULL pointer dereference in can_rx_register()
- can: dev: fix skb drop check, avoid o-o-b access
- nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()
Previous releases - always broken:
- bpf: fix wrong reg type conversion in release_reference()
- gso: fix panic on frag_list with mixed head alloc types
- wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker()
- wifi: mac80211: set TWT Information Frame Disabled bit as 1
- eth: macsec offload related fixes, make sure to clear the keys
from memory
- tun: fix memory leaks in the use of napi_get_frags
- tun: call napi_schedule_prep() to ensure we own a napi
- tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
- ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg
to network
- tipc: fix a msg->req tlv length check
- sctp: clear out_curr if all frag chunks of current msg are pruned,
avoid list corruption
- mctp: fix an error handling path in mctp_init(), avoid leaks
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNtnlEACgkQMUZtbf5S
IrvSfg//axNePPwFiAdbYUmSNmnnv2Zpyz1l9a2/WvKKMeyAH3d4zuQGyTz7VgoJ
at4k1fr14vm+3qBhlL0UFdd+h/wBewwuuWLiogIfhgqDO7KavZsbTJWQ59DSHH08
ujihvt7dF9ByVd3hOpUDjrYGd2rPghqXk8l/2gpPp/KIrbj1jSW0DdF7Y48/0RRw
PYzNYZ9tqICw1crBT52ZilNEebGaUuWpPLzV2owlhJpzqyRLcgd9GWN9DkKieiiw
wF0Wi7A8b/+cR/Wo93RAXtvEayN9vp/t6iyiI1opv3Yg6bhAMlzDUX/v79ccnAM6
wJ3b8bKyLgph5ZTNmbL8GwC2pwl/20hOgCVLb/Haykqrk4oO2+xD39fjKniFP/71
IBYuLCethi0zmiSyR8yO4iyrfJCnkJffoxtcG8O5x+FuCfMI1xQWx44bSc34KlqT
vDw/VmnIfXH9K3F+QdWtlZfLiM0F6vd7RNGIxX0cC2wQCwaubCo0LOs5vl2+jpR8
Xclo+OquQtX5XRqGGQDtA7kCM9jfuc/DWla1v10wy7ZagiKkdfrV7Zu7r431Dtwn
BWeKZAA38o9WNRb4FD5GGUN0dK5R5V25LmbpvYuerq5Ub3pGJgHMsdA15LqsqTnW
MGIokGFhu7ToAQEnaRkF96jh3c3yoMU/sWXsqh7x/G6Tir7JGUw=
=WPta
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi, can and bpf.
Current release - new code bugs:
- can: af_can: can_exit(): add missing dev_remove_pack() of
canxl_packet
Previous releases - regressions:
- bpf, sockmap: fix the sk->sk_forward_alloc warning
- wifi: mac80211: fix general-protection-fault in
ieee80211_subif_start_xmit()
- can: af_can: fix NULL pointer dereference in can_rx_register()
- can: dev: fix skb drop check, avoid o-o-b access
- nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()
Previous releases - always broken:
- bpf: fix wrong reg type conversion in release_reference()
- gso: fix panic on frag_list with mixed head alloc types
- wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker()
- wifi: mac80211: set TWT Information Frame Disabled bit as 1
- eth: macsec offload related fixes, make sure to clear the keys from
memory
- tun: fix memory leaks in the use of napi_get_frags
- tun: call napi_schedule_prep() to ensure we own a napi
- tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
- ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to
network
- tipc: fix a msg->req tlv length check
- sctp: clear out_curr if all frag chunks of current msg are pruned,
avoid list corruption
- mctp: fix an error handling path in mctp_init(), avoid leaks"
* tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
eth: sp7021: drop free_netdev() from spl2sw_init_netdev()
MAINTAINERS: Move Vivien to CREDITS
net: macvlan: fix memory leaks of macvlan_common_newlink
ethernet: tundra: free irq when alloc ring failed in tsi108_open()
net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
ethernet: s2io: disable napi when start nic failed in s2io_card_up()
net: atlantic: macsec: clear encryption keys from the stack
net: phy: mscc: macsec: clear encryption keys when freeing a flow
stmmac: dwmac-loongson: fix missing of_node_put() while module exiting
stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe()
stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting
cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()
mctp: Fix an error handling path in mctp_init()
stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz
net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
net: cpsw: disable napi in cpsw_ndo_open()
iavf: Fix VF driver counting VLAN 0 filters
ice: Fix spurious interrupt during removal of trusted VF
net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions
net/mlx5e: E-Switch, Fix comparing termination table instance
...
Add some mix of tests into page_fault_test: memory regions with all the
pairwise combinations of read-only, userfaultfd, and dirty-logging. For
example, writing into a read-only region which has a hole handled with
userfaultfd.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-15-ricarkol@google.com
Add some readonly memslot tests into page_fault_test. Mark the data and/or
page-table memory regions as readonly, perform some accesses, and check
that the right fault is triggered when expected (e.g., a store with no
write-back should lead to an mmio exit).
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-14-ricarkol@google.com
Add some dirty logging tests into page_fault_test. Mark the data and/or
page-table memory regions for dirty logging, perform some accesses, and
check that the dirty log bits are set or clean when expected.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-13-ricarkol@google.com
Add some userfaultfd tests into page_fault_test. Punch holes into the
data and/or page-table memslots, perform some accesses, and check that
the faults are taken (or not taken) when expected.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-12-ricarkol@google.com
Add a new test for stage 2 faults when using different combinations of
guest accesses (e.g., write, S1PTW), backing source type (e.g., anon)
and types of faults (e.g., read on hugetlbfs with a hole). The next
commits will add different handling methods and more faults (e.g., uffd
and dirty logging). This first commit starts by adding two sanity checks
for all types of accesses: AF setting by the hw, and accessing memslots
with holes.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-11-ricarkol@google.com
Now that kvm_vm allows specifying different memslots for code, page tables,
and data, use the appropriate memslot when making allocations in
common/libraty code. Change them accordingly:
- code (allocated by lib/elf) use the CODE memslot
- stacks, exception tables, and other core data pages (like the TSS in x86)
use the DATA memslot
- page tables and the PGD use the PT memslot
- test data (anything allocated with vm_vaddr_alloc()) uses the TEST_DATA
memslot
No functional change intended. All allocators keep using memslot #0.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-10-ricarkol@google.com
Refactor virt_arch_pgd_alloc() and vm_vaddr_alloc() in both RISC-V and
aarch64 to fix the alignment of parameters in a couple of calls. This will
make it easier to fix the alignment in a future commit that adds an extra
parameter (that happens to be very long).
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-9-ricarkol@google.com
The vm_create() helpers are hardcoded to place most page types (code,
page-tables, stacks, etc) in the same memslot #0, and always backed with
anonymous 4K. There are a couple of issues with that. First, tests
willing to differ a bit, like placing page-tables in a different backing
source type must replicate much of what's already done by the vm_create()
functions. Second, the hardcoded assumption of memslot #0 holding most
things is spread everywhere; this makes it very hard to change.
Fix the above issues by having selftests specify how they want memory to be
laid out. Start by changing ____vm_create() to not create memslot #0; a
test (to come) will specify all memslots used by the VM. Then, add the
vm->memslots[] array to specify the right memslot for different memory
allocators, e.g.,: lib/elf should use the vm->[MEM_REGION_CODE] memslot.
This will be used as a way to specify the page-tables memslots (to be
backed by huge pages for example).
There is no functional change intended. The current commit lays out memory
exactly as before. A future commit will change the allocators to get the
region they should be using, e.g.,: like the page table allocators using
the pt memslot.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-8-ricarkol@google.com
Add the backing_src_type into struct userspace_mem_region. This struct
already stores a lot of info about memory regions, except the backing
source type. This info will be used by a future commit in order to
determine the method for punching a hole.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-7-ricarkol@google.com
Define macros for memory type indexes and construct DEFAULT_MAIR_EL1
with macros from asm/sysreg.h. The index macros can then be used when
constructing PTEs (instead of using raw numbers).
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-5-ricarkol@google.com
Deleting a memslot (when freeing a VM) is not closing the backing fd,
nor it's unmapping the alias mapping. Fix by adding the missing close
and munmap.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-4-ricarkol@google.com
Add a library function to get the PTE (a host virtual address) of a
given GVA. This will be used in a future commit by a test to clear and
check the access flag of a particular page.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-3-ricarkol@google.com
Move the generic userfaultfd code out of demand_paging_test.c into a
common library, userfaultfd_util. This library consists of a setup and a
stop function. The setup function starts a thread for handling page
faults using the handler callback function. This setup returns a
uffd_desc object which is then used in the stop function (to wait and
destroy the threads).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-2-ricarkol@google.com
Currently, the debug-exceptions test always uses only
{break,watch}point#0 and the highest numbered context-aware
breakpoint. Modify the test to use all {break,watch}points and
context-aware breakpoints supported on the system.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-10-reijiw@google.com
Currently, the debug-exceptions test doesn't have a test case for
a linked watchpoint. Add a test case for the linked watchpoint to
the test. The new test case uses the highest numbered context-aware
breakpoint (for Context ID match), and the watchpoint#0, which is
linked to the context-aware breakpoint.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-9-reijiw@google.com
Currently, the debug-exceptions test doesn't have a test case for
a linked breakpoint. Add a test case for the linked breakpoint to
the test. The new test case uses a pair of breakpoints. One is the
higiest numbered context-aware breakpoint (for Context ID match),
and the other one is the breakpoint#0 (for Address Match), which
is linked to the context-aware breakpoint.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-8-reijiw@google.com
Change debug_version() to take the ID_AA64DFR0_EL1 value instead of
vcpu as an argument, and change its callsite to read ID_AA64DFR0_EL1
(and pass it to debug_version()).
Subsequent patches will reuse the register value in the callsite.
No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-7-reijiw@google.com
Currently, debug-exceptions test unnecessarily tracks some test stages
using GUEST_SYNC(). The code for it needs to be updated as test cases
are added or removed. Stop doing the unnecessary stage tracking,
as they are not so useful and are a bit pain to maintain.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-6-reijiw@google.com
Remove the hard-coded {break,watch}point #0 from the guest_code() in
debug-exceptions to allow {break,watch}point number to be specified.
Change reset_debug_state() to zeroing all dbg{b,w}{c,v}r_el0 registers
so that guest_code() can use the function to reset those registers
even when non-zero {break,watch}points are specified for guest_code().
Subsequent patches will add test cases for non-zero {break,watch}points.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-4-reijiw@google.com
Introduce helpers in the debug-exceptions test to write to
dbg{b,w}{c,v}r registers. Those helpers will be useful for
test cases that will be added to the test in subsequent patches.
No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-3-reijiw@google.com
Use FIELD_GET() macro to extract ID register fields for existing
aarch64 selftests code. No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-2-reijiw@google.com
The memory area in each slot should be aligned to host page size.
Otherwise, the test will fail. For example, the following command
fails with the following messages with 64KB-page-size-host and
4KB-pae-size-guest. It's not user friendly to abort the test.
Lets do something to report the optimal memory slots, instead of
failing the test.
# ./memslot_perf_test -v -s 1000
Number of memory slots: 999
Testing map performance with 1 runs, 5 seconds each
Adding slots 1..999, each slot with 8 pages + 216 extra pages last
==== Test Assertion Failure ====
lib/kvm_util.c:824: vm_adjust_num_guest_pages(vm->mode, npages) == npages
pid=19872 tid=19872 errno=0 - Success
1 0x00000000004065b3: vm_userspace_mem_region_add at kvm_util.c:822
2 0x0000000000401d6b: prepare_vm at memslot_perf_test.c:273
3 (inlined by) test_execute at memslot_perf_test.c:756
4 (inlined by) test_loop at memslot_perf_test.c:994
5 (inlined by) main at memslot_perf_test.c:1073
6 0x0000ffff7ebb4383: ?? ??:0
7 0x00000000004021ff: _start at :?
Number of guest pages is not compatible with the host. Try npages=16
Report the optimal memory slots instead of failing the test when
the memory area in each slot isn't aligned to host page size. With
this applied, the optimal memory slots is reported.
# ./memslot_perf_test -v -s 1000
Number of memory slots: 999
Testing map performance with 1 runs, 5 seconds each
Memslot count too high for this test, decrease the cap (max is 514)
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-7-gshan@redhat.com
The addresses and sizes passed to vm_userspace_mem_region_add() and
madvise() should be aligned to host page size, which can be 64KB on
aarch64. So it's wrong by passing additional fixed 4KB memory area
to various tests.
Fix it by passing additional fixed 64KB memory area to various tests.
We also add checks to ensure that none of host/guest page size exceeds
64KB. MEM_TEST_MOVE_SIZE is fixed up to 192KB either.
With this, the following command works fine on 64KB-page-size-host and
4KB-page-size-guest.
# ./memslot_perf_test -v -s 512
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-6-gshan@redhat.com
The test case is obviously broken on aarch64 because non-4KB guest
page size is supported. The guest page size on aarch64 could be 4KB,
16KB or 64KB.
This supports variable guest page size, mostly for aarch64.
- The host determines the guest page size when virtual machine is
created. The value is also passed to guest through the synchronization
area.
- The number of guest pages are unknown until the virtual machine
is to be created. So all the related macros are dropped. Instead,
their values are dynamically calculated based on the guest page
size.
- The static checks on memory sizes and pages becomes dependent
on guest page size, which is unknown until the virtual machine
is about to be created. So all the static checks are converted
to dynamic checks, done in check_memory_sizes().
- As the address passed to madvise() should be aligned to host page,
the size of page chunk is automatically selected, other than one
page.
- MEM_TEST_MOVE_SIZE has fixed and non-working 64KB. It will be
consolidated in next patch. However, the comments about how
it's calculated has been correct.
- All other changes included in this patch are almost mechanical
replacing '4096' with 'guest_page_size'.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-5-gshan@redhat.com
prepare_vm() is called in every iteration and run. The allowed memory
slots (KVM_CAP_NR_MEMSLOTS) are probed for multiple times. It's not
free and unnecessary.
Move the probing logic for the allowed memory slots to parse_args()
for once, which is upper layer of prepare_vm().
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-4-gshan@redhat.com
There are two loops in prepare_vm(), which have different conditions.
'slot' is treated as meory slot index in the first loop, but index of
the host virtual address array in the second loop. It makes it a bit
hard to understand the code.
Change the usage of 'slot' in the second loop, to treat it as the
memory slot index either.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-3-gshan@redhat.com
In prepare_vm(), 'data->nslots' is assigned with 'max_mem_slots - 1'
at the beginning, meaning they are interchangeable.
Use 'data->nslots' isntead of 'max_mem_slots - 1'. With this, it
becomes easier to move the logic of probing number of slots into
upper layer in subsequent patches.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-2-gshan@redhat.com
In the dirty ring case, we rely on vcpu exit due to full dirty ring
state. On ARM64 system, there are 4096 host pages when the host
page size is 64KB. In this case, the vcpu never exits due to the
full dirty ring state. The similar case is 4KB page size on host
and 64KB page size on guest. The vcpu corrupts same set of host
pages, but the dirty page information isn't collected in the main
thread. This leads to infinite loop as the following log shows.
# ./dirty_log_test -M dirty-ring -c 65536 -m 5
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory offset: 0xffbffe0000
vcpu stops because vcpu is kicked out...
Notifying vcpu to continue
vcpu continues now.
Iteration 1 collected 576 pages
<No more output afterwards>
Fix the issue by automatically choosing the best dirty ring size,
to ensure vcpu exit due to full dirty ring state. The option '-c'
becomes a hint to the dirty ring count, instead of the value of it.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-8-gshan@redhat.com
There are two states, which need to be cleared before next mode
is executed. Otherwise, we will hit failure as the following messages
indicate.
- The variable 'dirty_ring_vcpu_ring_full' shared by main and vcpu
thread. It's indicating if the vcpu exit due to full ring buffer.
The value can be carried from previous mode (VM_MODE_P40V48_4K) to
current one (VM_MODE_P40V48_64K) when VM_MODE_P40V48_16K isn't
supported.
- The current ring buffer index needs to be reset before next mode
(VM_MODE_P40V48_64K) is executed. Otherwise, the stale value is
carried from previous mode (VM_MODE_P40V48_4K).
# ./dirty_log_test -M dirty-ring
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory offset: 0xffbfffc000
:
Dirtied 995328 pages
Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
Testing guest mode: PA-bits:40, VA-bits:48, 64K pages
guest physical test memory offset: 0xffbffc0000
vcpu stops because vcpu is kicked out...
vcpu continues now.
Notifying vcpu to continue
Iteration 1 collected 0 pages
vcpu stops because dirty ring is full...
vcpu continues now.
vcpu stops because dirty ring is full...
vcpu continues now.
vcpu stops because dirty ring is full...
==== Test Assertion Failure ====
dirty_log_test.c:369: cleared == count
pid=10541 tid=10541 errno=22 - Invalid argument
1 0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
2 0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
3 (inlined by) run_test at dirty_log_test.c:795
4 (inlined by) run_test at dirty_log_test.c:705
5 0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
6 0x0000000000401ccf: main at dirty_log_test.c:938
7 0x0000ffff9ecd279b: ?? ??:0
8 0x0000ffff9ecd286b: ?? ??:0
9 0x0000000000401def: _start at ??:?
Reset dirty pages (0) mismatch with collected (35566)
Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before next new mode is to be executed.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-7-gshan@redhat.com
In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. It works fine when we have same page
sizes on host and guest. However, it fails when the page sizes on host
and guest are different on arm64, like below error messages indicates.
# ./dirty_log_test -M dirty-ring -m 7
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 64K pages
guest physical test memory offset: 0xffbffc0000
vcpu stops because vcpu is kicked out...
Notifying vcpu to continue
vcpu continues now.
==== Test Assertion Failure ====
lib/kvm_util.c:1477: addr == MAP_FAILED
pid=9000 tid=9000 errno=0 - Success
1 0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
2 0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
3 0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
4 (inlined by) run_test at dirty_log_test.c:778
5 (inlined by) run_test at dirty_log_test.c:691
6 0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
7 0x0000000000401ccf: main at dirty_log_test.c:921
8 0x0000ffffb06ec79b: ?? ??:0
9 0x0000ffffb06ec86b: ?? ??:0
10 0x0000000000401def: _start at ??:?
Dirty ring mapped private
Fix the issue by using host's page size to map the ring buffer.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-6-gshan@redhat.com
When showing the result of a test group, if one
of the subtests was skipped, while still having
passing subtests, the group result was marked as
SKIP. E.g.:
223/1 usdt/basic:SKIP
223/2 usdt/multispec:OK
223/3 usdt/urand_auto_attach:OK
223/4 usdt/urand_pid_attach:OK
223 usdt:SKIP
The test result of usdt in the example above
should be OK instead of SKIP, because the test
group did have passing tests and it would be
considered in "normal" state.
With this change, only if all of the subtests
were skipped, the group test is marked as SKIP.
When only some of the subtests are skipped, a
more detailed result is given, stating how
many of the subtests were skipped. E.g:
223/1 usdt/basic:SKIP
223/2 usdt/multispec:OK
223/3 usdt/urand_auto_attach:OK
223/4 usdt/urand_pid_attach:OK
223 usdt:OK (SKIP: 1/4)
Signed-off-by: Domenico Cerasuolo <dceras@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221109184039.3514033-1-cerasuolodomenico@gmail.com
Tests to verify the following behavior of `btf_dedup_resolve_fwds`:
- remapping for struct forward declarations;
- remapping for union forward declarations;
- no remapping if forward declaration kind does not match similarly
named struct or union declaration;
- no remapping if forward declaration name is ambiguous;
- base ids are considered for fwd resolution in split btf scenario.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20221109142611.879983-4-eddyz87@gmail.com
An update for libbpf's hashmap interface from void* -> void* to a
polymorphic one, allowing both long and void* keys and values.
This simplifies many use cases in libbpf as hashmaps there are mostly
integer to integer.
Perf copies hashmap implementation from libbpf and has to be
updated as well.
Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.
Polymorphic interface is acheived by hiding hashmap interface
functions behind auxiliary macros that take care of necessary
type casts, for example:
#define hashmap_cast_ptr(p) \
({ \
_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})
bool hashmap_find(const struct hashmap *map, long key, long *value);
#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
- hashmap__find macro casts key and value parameters to long
and long* respectively
- hashmap_cast_ptr ensures that value pointer points to a memory
of appropriate size.
This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].
[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
Test that locked bridge port configurations that are not supported by
mlxsw are rejected.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets received via a locked bridge port whose {SMAC, VID}
does not appear in the bridge's FDB or appears with a different port,
trigger the "locked_port" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets with a destination MAC of 01:80:C2:00:00:03 trigger
the "eapol" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merely checking whether a trap counter incremented or not without
logging a test result is useful on its own. Split this functionality to
a helper which will be used by subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
test_progs fails to be compiled in the 32-bit arch, log is as follows:
test_progs.c:1013:52: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
1013 | sprintf(buf, "MSG_TEST_LOG (cnt: %ld, last: %d)",
| ~~^
| |
| long int
| %d
1014 | strlen(msg->test_log.log_buf),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
Fix it.
Fixes: 91b2c0afd0 ("selftests/bpf: Add parallelism to test_progs")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221108015857.132457-1-yangjihong1@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When cross-compiling test_verifier for 32-bit platforms, the casting error is shown below:
test_verifier.c:1263:27: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
1263 | info.xlated_prog_insns = (__u64)*buf;
| ^
cc1: all warnings being treated as errors
Fix it by adding zero-extension for it.
Fixes: 933ff53191 ("selftests/bpf: specify expected instructions in test_verifier tests")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221108121945.4104644-1-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since the newly added instruction is in the HINT space we can't reasonably
test for it actually being present.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add FEAT_CSSC to the set of features checked by the hwcap selftest.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When using the flags in KVM_X86_SET_MSR_FILTER and
KVM_CAP_X86_USER_SPACE_MSR it is expected that an attempt to write to
any of the unused bits will fail. Add testing to walk over every bit
in each of the flag fields in MSR filtering and MSR exiting to verify
that unused bits return and error and used bits, i.e. valid bits,
succeed.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220921151525.904162-6-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some users of KVM implement the UEFI variable store through a paravirtual device
that does not require the "SMM lockbox" component of edk2; allow them to
compile out system management mode, which is not a full implementation
especially in how it interacts with nested virtualization.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220929172016.319443-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Address a few problems with the initial test script version:
* On systems with ip6tables but no ip6tables-legacy, testing for
ip6tables was disabled by accident.
* Firewall setup phase did not respect possibly unavailable tools.
* Consistently call nft via '$nft'.
Fixes: 6e31ce831c ("selftests: netfilter: Test reverse path filtering")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's trigger a R/O longterm pin on three cases of R/O mapped anonymous
pages:
* exclusive (never shared)
* shared (child still alive)
* previously shared (child no longer alive)
... and make sure that the pin is reliable: whatever we write via the page
tables has to be observable via the pin.
Link: https://lkml.kernel.org/r/20220927110120.106906-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
io_uring provides a simple mechanism to test long-term, R/W GUP pins
-- via fixed buffers -- and can be used to verify that GUP pins stay
in sync with the pages in the page table even if a page would
temporarily get mapped R/O or concurrent fork() could accidentially
end up sharing pinned pages with the child.
Note that this essentially re-introduces local_config support that was
removed recently in commit 6f83d6c74e ("Kselftests: remove support of
libhugetlbfs from kselftests").
[david@redhat.com: s/size_t/ssize_t/ on `cur', `total'.]
Link: https://lkml.kernel.org/r/445fe1ae-9e22-0d1d-4d09-272231d2f84a@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's run all existing test cases with all hugetlb sizes we're able to
detect.
Note that some tests cases still fail. This will, for example, be fixed
once vmsplice properly uses FOLL_PIN instead of FOLL_GET for pinning.
With 2 MiB and 1 GiB hugetlb on x86_64, the expected failures are:
# [RUN] vmsplice() + unmap in child ... with hugetlb (2048 kB)
not ok 23 No leak from parent into child
# [RUN] vmsplice() + unmap in child ... with hugetlb (1048576 kB)
not ok 24 No leak from parent into child
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 35 No leak from child into parent
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 36 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 47 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 48 No leak from child into parent
Link: https://lkml.kernel.org/r/20220927110120.106906-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add various THP variants that we'll run with our existing test
cases.
Link: https://lkml.kernel.org/r/20220927110120.106906-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We'll reuse it in the anon_cow test next.
Link: https://lkml.kernel.org/r/20220927110120.106906-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/vm: test COW handling of anonymous memory".
This is my current set of tests for testing COW handling of anonymous
memory, especially when interacting with GUP. I developed these tests
while working on PageAnonExclusive and managed to clean them up just now.
On current upstream Linux, all tests pass except the hugetlb tests that
rely on vmsplice -- these tests should pass as soon as vmsplice properly
uses FOLL_PIN instead of FOLL_GET.
I'm working on additional tests for COW handling in private mappings,
focusing on long-term R/O pinning e.g., of the shared zeropage, pagecache
pages and KSM pages. These tests, however, will go into a different file.
So this is everything I have regarding tests for anonymous memory.
This patch (of 7):
Let's start adding tests for our COW handling of anonymous memory. We'll
focus on basic tests that we can achieve without additional libraries or
gup_test extensions.
We'll add THP and hugetlb tests separately.
[david@redhat.com: s/size_t/ssize_t/ on `cur', `total', `transferred';]
Link: https://lkml.kernel.org/r/51302b9e-dc69-d709-3214-f23868028555@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After converting all the three relevant testcases (uffd, madvise, mremap)
to use memfd, no test will need the hugetlb mount point anymore. Drop the
code.
Link: https://lkml.kernel.org/r/20221014144015.94039-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For dropping the hugetlb mountpoint in run_vmtests.sh. Cleaned it up a
little bit around the changed codes.
Link: https://lkml.kernel.org/r/20221014144013.94027-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For dropping the hugetlb mountpoint in run_vmtests.sh. Since no parameter
is needed, drop USAGE too.
Link: https://lkml.kernel.org/r/20221014143921.93887-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/vm: Drop hugetlb mntpoint in run_vmtests.sh", v2.
Clean the code up so we can use the same memfd for both hugetlb and shmem
which is cleaner.
This patch (of 4):
We already used memfd for shmem test, move it forward with hugetlb too so
that we don't need user to specify the hugetlb file path explicitly when
running hugetlb shared tests.
Link: https://lkml.kernel.org/r/20221014143921.93887-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221014143921.93887-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Along the development cycle, the testing code support for module/in-kernel
compiles was removed. Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests. Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection. Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.
[akpm@linux-foundation.org: fix module export]
[akpm@linux-foundation.org: fix it some more]
[liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use ARRAY_SIZE to fix the following coccicheck warnings:
tools/testing/selftests/arm64/mte/check_buffer_fill.c:341:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:35:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:168:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:72:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:369:25-26:
WARNING: Use ARRAY_SIZE
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Link: https://lore.kernel.org/r/20221105073143.78521-1-tegongkang@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
The signal magic values are supposed to be allocated as somewhat meaningful
ASCII so if we encounter a bad magic value print the any alphanumeric
characters we find in it as well as the hex value to aid debuggability.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221102140543.98193-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When fixing up support for extra_context in the signal handling tests I
didn't notice that there is a TODO file in the directory which lists this
as a thing to be done. Since it's been done remove it from the list.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221027110324.33802-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Especially when the test is configured to run for a longer time it can be
reassuring to users to see that the supervising program is running OK so
provide a message every second when the output timer expires.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Currently we don't have an explicit check that when it's been a second
since we have seen output produced from the test programs starting up that
means all of them are running and we should start both sending signals and
timing out. This is not reliable, especially on very heavily loaded systems
where the test programs might take longer than a second to run.
We do skip sending signals to children that have not produced output yet
so we won't cause them to exit unexpectedly by sending a signal but this
can create confusion when interpreting output, for example appearing to
show the tests running for less time than expected or appearing to show
missed signal deliveries. Avoid issues by explicitly checking that we have
seen output from all the child processes before we start sending signals
or counting test run time.
This is especially likely on virtual platforms with large numbers of vector
lengths supported since the platforms are slow and there will be a lot of
tasks per CPU.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Two selftests drivers exist under the copyleft-next license.
These drivers were added prior to SPDX practice taking full swing
in the kernel. Now that we have an SPDX tag for copyleft-next-0.3.1
documented, embrace it and remove the boiler plate.
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Kuno Woudt <kuno@frob.nl>
Cc: Richard Fontana <fontana@sharpeleven.org>
Cc: copyleft-next@lists.fedorahosted.org
Cc: Ciaran Farrell <Ciaran.Farrell@suse.com>
Cc: Christopher De Nicolo <Christopher.DeNicolo@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thorsten Leemhuis <linux@leemhuis.info>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all but one of
these tests set nid != NUMA_NO_NODE. All tests are run for both top-down
and bottom-up allocation directions.
The tested scenarios are:
Range unrestricted:
- region cannot be allocated:
+ there are no previously reserved regions, but requested node is
too small
+ the requested node is fully reserved
+ the requested node is partially reserved and does not have
enough space
+ none of the nodes have enough memory to allocate the region
Range restricted:
- region can be allocated in the specific node requested without
dropping min_addr:
+ the range fully overlaps with the node, and there are adjacent
reserved regions
- region cannot be allocated:
+ range partially overlaps with two different nodes, where the
second node is the requested node
+ range overlaps with multiple nodes along node boundaries, and
the requested node starts after max_addr
+ nid is set to NUMA_NO_NODE and the total range can fit the
region, but the range is split between two nodes and everything
else is reserved
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/51b14da46e6591428df3aefc5acc7dca9341a541.1667802195.git.remckee0@gmail.com
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all of these
tests set nid != NUMA_NO_NODE. These tests are run with a bottom-up
allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the
first node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and
the requested node ends before min_addr
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/935f0eed5e06fd44dc67d9f49b277923d7896bd3.1667802195.git.remckee0@gmail.com
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all of these
tests set nid != NUMA_NO_NODE. These tests are run with a top-down
allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the
first node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and
the requested node ends before min_addr
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/2cc0883243d68ddc3faf833d2d9e86f48534c1d7.1667802195.git.remckee0@gmail.com
Add TEST_F_EXACT flag, which specifies that tests should run
memblock_alloc_exact_nid_raw(). Introduce range tests for
memblock_alloc_exact_nid_raw() by using the TEST_F_EXACT flag to run the
range tests in alloc_nid_api.c, since memblock_alloc_exact_nid_raw() and
memblock_alloc_try_nid_raw() behave the same way when nid = NUMA_NO_NODE.
Rename tests and other functions in alloc_nid_api.c by removing "_try".
Since the test names will be displayed in verbose output, they need to
be general enough to refer to any of the memblock functions that the
tests may run.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5a4b6d1b6130ab7375314e1c45a6d5813dfdabbd.1667802195.git.remckee0@gmail.com
Variable ret is compared with less than zero even though it was set as u32.
So u32 to int conversion is needed.
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20221105183656.86077-1-tegongkang@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
- Fix region creation crash with pass-through decoders
- Fix region creation crash when no decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream of
a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY2f0dwAKCRDfioYZHlFs
Z93zAQCHzy4qbEdw95SPQ/BpUJ2rxcWzruFZkaUTU1RHM5lApwEApP9Fjvdkgo9I
dlQTRON1nSqqoEXqSxbt8RU0I9Z11ws=
=pBN4
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.
This is mainly fallout from the original implementation of dynamic CXL
region creation (instantiate new physical memory pools) that arrived
in v6.0-rc1.
Given the theme of "failures in the presence of pass-through decoders"
this also includes new regression test infrastructure for that case.
Summary:
- Fix region creation crash with pass-through decoders
- Fix region creation crash when no decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream
of a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases"
* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Recycle region ids
cxl/region: Fix 'distance' calculation with passthrough ports
tools/testing/cxl: Add a single-port host-bridge regression config
tools/testing/cxl: Fix some error exits
cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
cxl/region: Fix cxl_region leak, cleanup targets at region delete
cxl/region: Fix region HPA ordering validation
cxl/pmem: Use size_add() against integer overflow
cxl/region: Fix decoder allocation crash
ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
cxl/region: Fix null pointer dereference due to pass through decoder commit
cxl/mbox: Add a check on input payload size
lld produces "fast" style build-ids by default, which is inconsistent
with ld's "sha1" style. Explicitly specify build-id style to be "sha1"
when linking liburandom_read.so the same way it is already done for
urandom_read.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20221104094016.102049-1-asavkov@redhat.com
Copy libbpf_strlcpy() from libbpf_internal.h to bpf_util.h, and rename it
to bpf_strlcpy(), then replace selftests strncpy()/libbpf_strlcpy() with
bpf_strlcpy(), fix compile warning.
The libbpf_internal.h header cannot be used directly here, because
references to cgroup_helpers.c in samples/bpf will generate compilation
errors. We also can't add libbpf_strlcpy() directly to bpf_util.h,
because the definition of libbpf_strlcpy() in libbpf_internal.h is
duplicated. In order not to modify the libbpf code, add a new function
bpf_strlcpy() to selftests bpf_util.h.
How to reproduce this compilation warning:
$ make -C samples/bpf
cgroup_helpers.c: In function ‘__enable_controllers’:
cgroup_helpers.c:80:17: warning: ‘strncpy’ specified bound 4097 equals destination size [-Wstringop-truncation]
80 | strncpy(enable, controllers, sizeof(enable));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_469D8AF32BD56816A29981BED06E96D22506@qq.com
-----BEGIN PGP SIGNATURE-----
iIYEABYIAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCY2VLRRAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSejwBAMOIza4vHY0nL5a+C6jQGBjEdueSic8m4K8P
cOOHEn3AAP44Bf1skIMIrgOL7VE+aGCJoIYWiN+rs/tBKYyPnO0gBQ==
=gW7y
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
"Fix the test build for some distros"
* tag 'landlock-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Build without static libraries
A set of test cases to verify enum fwd resolution logic:
- verify that enum fwd can be resolved as full enum64;
- verify that enum64 fwd can be resolved as full enum;
- verify that enum size is considered when enums are compared for
equivalence.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221101235413.1824260-2-eddyz87@gmail.com
test_align selftest relies on BPF verifier log emitting register states
for specific instructions in expected format. Unfortunately, BPF
verifier precision backtracking log interferes with such expectations.
And instruction on which precision propagation happens sometimes don't
output full expected register states. This does indeed look like
something to be improved in BPF verifier, but is beyond the scope of
this patch set.
So to make test_align a bit more robust, inject few dummy R4 = R5
instructions which capture desired state of R5 and won't have precision
tracking logs on them. This fixes tests until we can improve BPF
verifier output in the presence of precision tracking.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In some conditions, background processes in udpgro don't have enough
time to set up the sockets. When foreground processes start, this
results in the test failing with "./udpgso_bench_tx: sendmsg: Connection
refused". For instance, this happens from time to time on a Qualcomm
SA8540P SoC running CentOS Stream 9.
To fix this, increase the time given to background processes to
complete the startup before foreground processes start.
Signed-off-by: Adrien Thierry <athierry@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally add support for filtering stats values, similar to
non-comparison mode filtering. For comparison mode 4 variants of stats
are important for filtering, as they allow to filter either A or B side,
but even more importantly they allow to filter based on value
difference, and for verdict stat value difference is MATCH/MISMATCH
classification. So with these changes it's finally possible to easily
check if there were any mismatches between failure/success outcomes on
two separate data sets. Like in an example below:
$ ./veristat -e file,prog,verdict,insns -C ~/baseline-results.csv ~/shortest-results.csv -f verdict_diff=mismatch
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
------------------------------------- --------------------- ----------- ----------- -------------- --------- --------- -------------------
dynptr_success.bpf.linked1.o test_data_slice success failure MISMATCH 85 0 -85 (-100.00%)
dynptr_success.bpf.linked1.o test_read_write success failure MISMATCH 1992 0 -1992 (-100.00%)
dynptr_success.bpf.linked1.o test_ringbuf success failure MISMATCH 74 0 -74 (-100.00%)
kprobe_multi.bpf.linked1.o test_kprobe failure success MISMATCH 0 246 +246 (+100.00%)
kprobe_multi.bpf.linked1.o test_kprobe_manual failure success MISMATCH 0 246 +246 (+100.00%)
kprobe_multi.bpf.linked1.o test_kretprobe failure success MISMATCH 0 248 +248 (+100.00%)
kprobe_multi.bpf.linked1.o test_kretprobe_manual failure success MISMATCH 0 248 +248 (+100.00%)
kprobe_multi.bpf.linked1.o trigger failure success MISMATCH 0 2 +2 (+100.00%)
netcnt_prog.bpf.linked1.o bpf_nextcnt failure success MISMATCH 0 56 +56 (+100.00%)
pyperf600_nounroll.bpf.linked1.o on_event success failure MISMATCH 568128 1000001 +431873 (+76.02%)
ringbuf_bench.bpf.linked1.o bench_ringbuf success failure MISMATCH 8 0 -8 (-100.00%)
strobemeta.bpf.linked1.o on_event success failure MISMATCH 557149 1000001 +442852 (+79.49%)
strobemeta_nounroll1.bpf.linked1.o on_event success failure MISMATCH 57240 1000001 +942761 (+1647.03%)
strobemeta_nounroll2.bpf.linked1.o on_event success failure MISMATCH 501725 1000001 +498276 (+99.31%)
strobemeta_subprogs.bpf.linked1.o on_event success failure MISMATCH 65420 1000001 +934581 (+1428.59%)
test_map_in_map_invalid.bpf.linked1.o xdp_noop0 success failure MISMATCH 2 0 -2 (-100.00%)
test_mmap.bpf.linked1.o test_mmap success failure MISMATCH 46 0 -46 (-100.00%)
test_verif_scale3.bpf.linked1.o balancer_ingress success failure MISMATCH 845499 1000001 +154502 (+18.27%)
------------------------------------- --------------------- ----------- ----------- -------------- --------- --------- -------------------
Note that by filtering on verdict_diff=mismatch, it's now extremely easy and
fast to see any changes in verdict. Example above showcases both failure ->
success transitions (which are generally surprising) and success -> failure
transitions (which are expected if bugs are present).
Given veristat allows to query relative percent difference values, internal
logic for comparison mode is based on floating point numbers, so requires a bit
of epsilon precision logic, deviating from typical integer simple handling
rules.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce the concept of "stat variant", by which it's possible to
specify whether to use the value from A (baseline) side, B (comparison
or control) side, the absolute difference value or relative (percentage)
difference value.
To support specifying this, veristat recognizes `_a`, `_b`, `_diff`,
`_pct` suffixes, which can be appended to stat name(s). In
non-comparison mode variants are ignored (there is only `_a` variant
effectively), if no variant suffix is provided, `_b` is assumed, as
control group is of primary interest in comparison mode.
These stat variants can be flexibly combined with asc/desc orders.
Here's an example of ordering results first by verdict match/mismatch (or n/a
if one of the sides is missing; n/a is always considered to be the lowest
value), and within each match/mismatch/n/a group further sort by number of
instructions in B side. In this case we don't have MISMATCH cases, but N/A are
split from MATCH, demonstrating this custom ordering.
$ ./veristat -e file,prog,verdict,insns -s verdict_diff,insns_b_ -C ~/base.csv ~/comp.csv
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
------------------ ------------------------------ ----------- ----------- -------------- --------- --------- --------------
bpf_xdp.o tail_lb_ipv6 N/A success N/A N/A 151895 N/A
bpf_xdp.o tail_nodeport_nat_egress_ipv4 N/A success N/A N/A 15619 N/A
bpf_xdp.o tail_nodeport_ipv6_dsr N/A success N/A N/A 1206 N/A
bpf_xdp.o tail_nodeport_ipv4_dsr N/A success N/A N/A 1162 N/A
bpf_alignchecker.o tail_icmp6_send_echo_reply N/A failure N/A N/A 74 N/A
bpf_alignchecker.o __send_drop_notify success N/A N/A 53 N/A N/A
bpf_host.o __send_drop_notify success N/A N/A 53 N/A N/A
bpf_host.o cil_from_host success N/A N/A 762 N/A N/A
bpf_xdp.o tail_lb_ipv4 success success MATCH 71736 73430 +1694 (+2.36%)
bpf_xdp.o tail_handle_nat_fwd_ipv4 success success MATCH 21547 20920 -627 (-2.91%)
bpf_xdp.o tail_rev_nodeport_lb6 success success MATCH 17954 17905 -49 (-0.27%)
bpf_xdp.o tail_handle_nat_fwd_ipv6 success success MATCH 16974 17039 +65 (+0.38%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv4 success success MATCH 7658 7713 +55 (+0.72%)
bpf_xdp.o tail_rev_nodeport_lb4 success success MATCH 7126 6934 -192 (-2.69%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv6 success success MATCH 6405 6397 -8 (-0.12%)
bpf_xdp.o tail_nodeport_nat_ipv6_egress failure failure MATCH 752 752 +0 (+0.00%)
bpf_xdp.o cil_xdp_entry success success MATCH 423 423 +0 (+0.00%)
bpf_xdp.o __send_drop_notify success success MATCH 151 151 +0 (+0.00%)
bpf_alignchecker.o tail_icmp6_handle_ns failure failure MATCH 33 33 +0 (+0.00%)
------------------ ------------------------------ ----------- ----------- -------------- --------- --------- --------------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When comparing two datasets, if either side is missing corresponding
record with the same file and prog name, currently veristat emits
misleading zeros/failures, and even tried to calculate a difference,
even though there is no data to compare against.
This patch improves internal logic of handling such situations. Now
we'll emit "N/A" in places where data is missing and comparison is
non-sensical.
As an example, in an artificially truncated and mismatched Cilium
results, the output looks like below:
$ ./veristat -e file,prog,verdict,insns -C ~/base.csv ~/comp.csv
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
------------------ ------------------------------ ----------- ----------- -------------- --------- --------- --------------
bpf_alignchecker.o __send_drop_notify success N/A N/A 53 N/A N/A
bpf_alignchecker.o tail_icmp6_handle_ns failure failure MATCH 33 33 +0 (+0.00%)
bpf_alignchecker.o tail_icmp6_send_echo_reply N/A failure N/A N/A 74 N/A
bpf_host.o __send_drop_notify success N/A N/A 53 N/A N/A
bpf_host.o cil_from_host success N/A N/A 762 N/A N/A
bpf_xdp.o __send_drop_notify success success MATCH 151 151 +0 (+0.00%)
bpf_xdp.o cil_xdp_entry success success MATCH 423 423 +0 (+0.00%)
bpf_xdp.o tail_handle_nat_fwd_ipv4 success success MATCH 21547 20920 -627 (-2.91%)
bpf_xdp.o tail_handle_nat_fwd_ipv6 success success MATCH 16974 17039 +65 (+0.38%)
bpf_xdp.o tail_lb_ipv4 success success MATCH 71736 73430 +1694 (+2.36%)
bpf_xdp.o tail_lb_ipv6 N/A success N/A N/A 151895 N/A
bpf_xdp.o tail_nodeport_ipv4_dsr N/A success N/A N/A 1162 N/A
bpf_xdp.o tail_nodeport_ipv6_dsr N/A success N/A N/A 1206 N/A
bpf_xdp.o tail_nodeport_nat_egress_ipv4 N/A success N/A N/A 15619 N/A
bpf_xdp.o tail_nodeport_nat_ingress_ipv4 success success MATCH 7658 7713 +55 (+0.72%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv6 success success MATCH 6405 6397 -8 (-0.12%)
bpf_xdp.o tail_nodeport_nat_ipv6_egress failure failure MATCH 752 752 +0 (+0.00%)
bpf_xdp.o tail_rev_nodeport_lb4 success success MATCH 7126 6934 -192 (-2.69%)
bpf_xdp.o tail_rev_nodeport_lb6 success success MATCH 17954 17905 -49 (-0.27%)
------------------ ------------------------------ ----------- ----------- -------------- --------- --------- --------------
Internally veristat now separates joining two datasets and remembering the
join, and actually emitting a comparison view. This will come handy when we add
support for filtering and custom ordering in comparison mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make veristat distinguish between table and CSV output formats and use
different default set of stats (columns) that are emitted. While for
human-readable table output it doesn't make sense to output all known
stats, it is very useful for CSV mode to record all possible data, so
that it can later be queried and filtered in replay or comparison mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Define simple expressions to filter not just by file and program name,
but also by resulting values of collected stats. Support usual
equality and inequality operators. Verdict, which is a boolean-like
field can be also filtered either as 0/1, failure/success (with f/s,
fail/succ, and failure/success aliases) symbols, or as false/true (f/t).
Aliases are case insensitive.
Currently this filtering is honored only in verification and replay
modes. Comparison mode support will be added in next patch.
Here's an example of verifying a bunch of BPF object files and emitting
only results for successfully validated programs that have more than 100
total instructions processed by BPF verifier, sorted by number of
instructions in ascending order:
$ sudo ./veristat *.bpf.o -s insns^ -f 'insns>100'
There can be many filters (both allow and deny flavors), all of them are
combined.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow to specify '^' at the end of stat name to designate that it should
be sorted in ascending order. Similarly, allow any of 'v', 'V', '.',
'!', or '_' suffix "symbols" to designate descending order. It's such
a zoo for descending order because there is no single intuitive symbol
that could be used (using 'v' looks pretty weird in practice), so few
symbols that are "downwards leaning or pointing" were chosen. Either
way, it shouldn't cause any troubles in practice.
This new feature allows to customize sortering order to match user's
needs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Always fall back to unique file/prog comparison if user's custom order
specs are ambiguous. This ensures stable output no matter what.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Slightly change rules of specifying file/prog glob filters. In practice
it's quite often inconvenient to do `*/<prog-glob>` if that program glob
is unique enough and won't accidentally match any file names.
This patch changes the rules so that `-f <glob>` will apply specified
glob to both file and program names. User still has all the control by
doing '*/<prog-only-glob>' or '<file-only-glob/*'. We also now allow
'/<prog-glob>' and '<file-glob/' (all matching wildcard is assumed if
missing).
Also, internally unify file-only and file+prog checks
(should_process_file and should_process_prog are now
should_process_file_prog that can handle prog name as optional). This
makes maintaining and extending this code easier.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In comparison mode the "Total " part is pretty useless, but takes
a considerable amount of horizontal space. Drop the "Total " parts.
Also make sure that table headers for numerical columns are aligned in
the same fashion as integer values in those columns. This looks better
and is now more obvious with shorter "Insns" and "States" column
headers.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replay mode allow to parse previously stored CSV file with verification
results and present it in desired output (presumable human-readable
table, but CSV to CSV convertion is supported as well). While doing
that, it's possible to use veristat's sorting rules, specify subset of
columns, and filter by file and program name.
In subsequent patches veristat's filtering capabilities will just grow
making replay mode even more useful in practice for post-processing
results.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221103055304.2904589-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add four test cases to verify MAB functionality:
* Verify that a locked FDB entry can be generated by the bridge,
preventing a host from communicating via the bridge. Test that user
space can clear the "locked" flag by replacing the entry, thereby
authenticating the host and allowing it to communicate via the bridge.
* Test that an entry cannot roam to a locked port, but that it can roam
to an unlocked port.
* Test that MAB can only be enabled on a port that is both locked and
has learning enabled.
* Test that locked FDB entries are flushed from a port when MAB is
disabled.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY2RS7QAKCRDbK58LschI
g6RVAQC1FdSXMrhn369NGCG1Vox1QYn2/5P32LSIV1BKqiQsywEAsxgYNrdCPTua
ie91Q5IJGT9pFl1UR50UrgL11DI5BgI=
=sdhO
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2022-11-04
We've added 8 non-merge commits during the last 3 day(s) which contain
a total of 10 files changed, 113 insertions(+), 16 deletions(-).
The main changes are:
1) Fix memory leak upon allocation failure in BPF verifier's stack state
tracking, from Kees Cook.
2) Fix address leakage when BPF progs release reference to an object,
from Youlin Li.
3) Fix BPF CI breakage from buggy in.h uapi header dependency,
from Andrii Nakryiko.
4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui.
5) Fix BPF sockmap lockdep warning by cancelling psock work outside
of socket lock, from Cong Wang.
6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting,
from Wang Yufen.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add verifier test for release_reference()
bpf: Fix wrong reg type conversion in release_reference()
bpf, sock_map: Move cancel_work_sync() out of sock lock
tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI
net/ipv4: Fix linux/in.h header dependencies
bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE
bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues
bpf, verifier: Fix memory leak in array reallocation for stack state
====================
Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test case to ensure that released pointer registers will not be
leaked into the map.
Before fix:
./test_verifier 984
984/u reference tracking: try to leak released ptr reg FAIL
Unexpected success to load!
verification time 67 usec
stack depth 4
processed 23 insns (limit 1000000) max_states_per_insn 0 total_states 2
peak_states 2 mark_read 1
984/p reference tracking: try to leak released ptr reg OK
Summary: 1 PASSED, 0 SKIPPED, 1 FAILED
After fix:
./test_verifier 984
984/u reference tracking: try to leak released ptr reg OK
984/p reference tracking: try to leak released ptr reg OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Youlin Li <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221103093440.3161-2-liulin063@gmail.com
This Kselftest fixes update for Linux 6.1-rc4 consists of fixes to
pidfd test.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNjwB0ACgkQCwJExA0N
QxxRmBAA3+k/OJuQU6dFGrYfUi1UOOXsGWfNBIz6C1BHnXqXxbdgd76cxN6bYGGd
uE10fVZObHi8LXaFhVadKuT0Y17nLa+OQEbHY8FvHgSm0jbF+uO4YDjHmkv7L1ba
tKISeVie7Wg9FcioCjaUnuwzmcK9lgeEW0KDkb8MBfng6V6GniASbLpEE8heJMZj
Uca/amX/vMOUyHCoI1ioZSfiEPwQIppD8X9RURWsKjT72lsKdASZ6eM6/WyIkZ8g
tQ8GiSq5LXbk827ec3buH//vvCZSHZpFmU4zG8fBDIzEOTTcapDWzRjyt1VI8Qep
ZTs2xMhfMpbWVOB/ZO7FHglEjHRgNIwvKPZVMNc+K5QjOlGN6TepFgy546tjHWdm
FmjSwRnj4JMvpvuCeinvLNX1uwVBNUZ0jsHSSjX/6BfUMnlvwqRrlSe5fM81NpcM
m7aB548JpBDWDl66376THvEAJ3cLGioLMthQIpSw7VOVx2fystxEfRuGwBouEWYV
IMe83G8ZPIMs19Za1jIofEpbY6UblNyN45o2NV69El9mWRo6calepTB7tuCPqOGy
sYz4rx3stpMRJbMy4VATbsNFkZnQTv4lwU4HVgWarDcx1gxIvvhCButYhBJlGGFA
lAZDaCulr+aGGgamzUlZVpO+6T3zaGbDyW8PeWJR8k6j+Z2qFO0=
=nB5s
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Fixes to the pidfd test"
* tag 'linux-kselftest-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/pidfd_test: Remove the erroneous ','
selftests: pidfd: Fix compling warnings
ksefltests: pidfd: Fix wait_states: Test terminated by timeout
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY2GuKgAKCRDbK58LschI
gy32AP9PI0e/bUGDExKJ8g97PeeEtnpj4TTI6g+XKILtYnyXlgD/Rk4j2D/f3IBF
Ha9TmqYvAUim+U/g50vUrNuoNLNJ5w8=
=OKC1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2022-11-02
We've added 70 non-merge commits during the last 14 day(s) which contain
a total of 96 files changed, 3203 insertions(+), 640 deletions(-).
The main changes are:
1) Make cgroup local storage available to non-cgroup attached BPF programs
such as tc BPF ones, from Yonghong Song.
2) Avoid unnecessary deadlock detection and failures wrt BPF task storage
helpers, from Martin KaFai Lau.
3) Add LLVM disassembler as default library for dumping JITed code
in bpftool, from Quentin Monnet.
4) Various kprobe_multi_link fixes related to kernel modules,
from Jiri Olsa.
5) Optimize x86-64 JIT with emitting BMI2-based shift instructions,
from Jie Meng.
6) Improve BPF verifier's memory type compatibility for map key/value
arguments, from Dave Marchevsky.
7) Only create mmap-able data section maps in libbpf when data is exposed
via skeletons, from Andrii Nakryiko.
8) Add an autoattach option for bpftool to load all object assets,
from Wang Yufen.
9) Various memory handling fixes for libbpf and BPF selftests,
from Xu Kuohai.
10) Initial support for BPF selftest's vmtest.sh on arm64,
from Manu Bretelle.
11) Improve libbpf's BTF handling to dedup identical structs,
from Alan Maguire.
12) Add BPF CI and denylist documentation for BPF selftests,
from Daniel Müller.
13) Check BPF cpumap max_entries before doing allocation work,
from Florian Lehner.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits)
samples/bpf: Fix typo in README
bpf: Remove the obsolte u64_stats_fetch_*_irq() users.
bpf: check max_entries before allocating memory
bpf: Fix a typo in comment for DFS algorithm
bpftool: Fix spelling mistake "disasembler" -> "disassembler"
selftests/bpf: Fix bpftool synctypes checking failure
selftests/bpf: Panic on hard/soft lockup
docs/bpf: Add documentation for new cgroup local storage
selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x
selftests/bpf: Add selftests for new cgroup local storage
selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str
bpftool: Support new cgroup local storage
libbpf: Support new cgroup local storage
bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
bpf: Refactor some inode/task/sk storage functions for reuse
bpf: Make struct cgroup btf id global
selftests/bpf: Tracing prog can still do lookup under busy lock
selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection
bpf: Add new bpf_task_storage_delete proto with no deadlock detection
bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check
...
====================
Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It doesn't make sense batch submitting io_uring requests to a single TCP
socket without linking or some other kind of ordering. Moreover, it
causes spurious -EINTR fails due to interaction with task_work. Disable
it for now and keep queue depth=1.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b547698d5938b1b1a898af1c260188d8546ded9a.1666700897.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- fix lock initialization race in gfn-to-pfn cache (+selftests)
- fix two refcounting errors
- emulator fixes
- mask off reserved bits in CPUID
- fix bug with disabling SGX
RISC-V:
- update MAINTAINERS
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNcYawUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPMzwf/Uh1lO2Op6IJ3/no2gPfShc9bdqgM
LiCkzfJp9PQAuDl/hs44CiQHUlPEfeIsI/ns0euNj37TlnB3zKmm46mtiWhEefIH
rwcm/ngKgw3283pZEf8FeMTDfNexOaBg2ZNoODR7JQsU50tbToY4TNE2nNRgbdL5
SNmzOwox1rZIQHxEa2r/k2B/HdRbeCFUU82EjwFqaNzH1yhzBXMcokdSCmGCBMsE
3xfCzQ7uMkXw/rlkkG0be65+5dTNmhfiKQYGAQe4s7PycVPMD79D2EhCfbpvbK7t
EmgOXStmvtW6+ukqPATHbRVCDwW0VmiQv5IWOGbLB1Qdy5/REynJ5ObC8g==
=Hvro
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- fix lock initialization race in gfn-to-pfn cache (+selftests)
- fix two refcounting errors
- emulator fixes
- mask off reserved bits in CPUID
- fix bug with disabling SGX
RISC-V:
- update MAINTAINERS"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
KVM: x86: emulator: update the emulation mode after CR0 write
KVM: x86: emulator: update the emulation mode after rsm
KVM: x86: emulator: introduce emulator_recalc_and_set_mode
KVM: x86: emulator: em_sysexit should update ctxt->mode
KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test
KVM: selftests: Add tests in xen_shinfo_test to detect lock races
KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
KVM: x86: Exempt pending triple fault from event injection sanity check
MAINTAINERS: git://github -> https://github.com for kvm-riscv
KVM: debugfs: Return retval of simple_attr_open() if it fails
KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM: x86: Mask off reserved bits in CPUID.8000001AH
KVM: x86: Mask off reserved bits in CPUID.80000008H
KVM: x86: Mask off reserved bits in CPUID.80000006H
KVM: x86: Mask off reserved bits in CPUID.80000001H
Add gitsource.sh trigger the gitsource testing and monitor the cpu desire
performance, frequency, load, power consumption and throughput etc.
1) Download and tar gitsource codes.
2) Run gitsource benchmark on specific governors, ondemand or schedutil.
3) Run tbench benchmark comparative test on acpi-cpufreq kernel driver.
4) Get desire performance, frequency, load by perf.
5) Get power consumption and throughput by amd_pstate_trace.py.
6) Get run time by /usr/bin/time.
7) Analyse test results and save it in file selftest.gitsource.csv.
8) Plot png images about time, energy and performance per watt
for each test.
Fixed whitespace error during commit:
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add tbench.sh trigger the tbench testing and monitor the cpu desire
performance, frequency, load, power consumption and throughput etc.
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Split basic.sh into run.sh and basic.sh.
The modification makes basic.sh more pure, just for test basic kernel
functions. The file of run.sh mainly contains functions such as test
entry, parameter check, prerequisite and log clearing etc.
Then you can specify test case in kselftest/amd-pstate, for example:
sudo ./run.sh -c basic. The detail please run the below script.
./run.sh --help
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Rename amd-pstate-ut.sh to basic.sh.
The purpose of this modification is to facilitate the subsequent
addition of gitsource, tbench and other tests.
Signed-off-by: Meng Li <li.meng@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
E.g. all the hw_breakpoint tests are failing right now.
So if I run `kunit.py run --altests --arch=x86_64`, then I see
> Testing complete. Ran 408 tests: passed: 392, failed: 9, skipped: 7
Seeing which 9 tests failed out of the hundreds is annoying.
If my terminal doesn't have scrollback support, I have to resort to
looking at `.kunit/test.log` for the `not ok` lines.
Teach kunit.py to print a summarized list of failures if the # of tests
reachs an arbitrary threshold (>=100 tests).
To try and keep the output from being too long/noisy, this new logic
a) just reports "parent_test failed" if every child test failed
b) won't print anything if there are >10 failures (also arbitrary).
With this patch, we get an extra line of output showing:
> Testing complete. Ran 408 tests: passed: 392, failed: 9, skipped: 7
> Failures: hw_breakpoint
This also works with parameterized tests, e.g. if I add a fake failure
> Failures: kcsan.test_atomic_builtins_missing_barrier.threads=6
Note: we didn't have enough tests for this to be a problem before.
But with commit 980ac3ad05 ("kunit: tool: rename all_test_uml.config,
use it for --alltests"), --alltests works and thus running >100 tests
will probably become more common.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, if you run
$ ./tools/testing/kunit/kunit_tool_test.py
you'll see a lot of output from the parser as we feed it testdata.
This makes the output hard to read and fairly confusing, esp. since our
testdata includes example failures, which get printed out in red.
Silence that output so real failures are easier to see.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Reserve 129th region in the memblock, and this will trigger the
memblock_double_array() function, this needs valid memory regions. So
using dummy_physical_memory_init() to allocate a valid memory region.
At the same time, reserve 128 faked memory region, and make sure these
reserved region not intersect with the valid memory region. So
memblock_double_array() will choose the valid memory region, and it will
success.
Also need to restore the reserved.regions after memblock_double_array(),
to make sure the subsequent tests can run as normal.
Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20221011062128.49359-3-shaoqin.huang@intel.com
Add 129th region into the memblock, and this will trigger the
memblock_double_array() function, this needs valid memory regions. So
using dummy_physical_memory_init() to allocate a large enough memory
region, and split it into a large enough memory which can be choosed by
memblock_double_array(), and the left memory will be split into small
memory region, and add them into the memblock. It make sure the
memblock_double_array() will always choose the valid memory region that
is allocated by the dummy_physical_memory_init().
So memblock_double_array() must success.
Another thing should be done is to restore the memory.regions after
memblock_double_array(), due to now the memory.regions is pointing to a
memory region allocated by dummy_physical_memory_init(). And it will
affect the subsequent tests if we don't restore the memory region. So
simply record the origin region, and restore it after the test.
Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20221011062128.49359-2-shaoqin.huang@intel.com
Fix warnings and enable Wall.
pidfd_wait.c: In function ‘wait_nonblock’:
pidfd_wait.c:150:13: warning: unused variable ‘status’ [-Wunused-variable]
150 | int pidfd, status = 0;
| ^~~~~~
...
pidfd_test.c: In function ‘child_poll_exec_test’:
pidfd_test.c:438:1: warning: no return statement in function returning non-void [-Wreturn-type]
438 | }
| ^
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
v2: fix mistake assignment to pidfd
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
0Day/LKP observed that the kselftest blocks forever since one of the
pidfd_wait doesn't terminate in 1 of 30 runs. After digging into
the source, we found that it blocks at:
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
wait_states has below testing flow:
CHILD PARENT
---------------+--------------
1 STOP itself
2 WAIT for CHILD STOPPED
3 SIGNAL CHILD to CONT
4 CONT
5 STOP itself
5' WAIT for CHILD CONT
6 WAIT for CHILD STOPPED
The problem is that the kernel cannot ensure the order of 5 and 5', once
5 goes first, the test will fail.
we can reproduce it by:
$ while true; do make run_tests -C pidfd; done
Introduce a blocking read in child process to make sure the parent can
check its WCONTINUED.
CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Paul and I got trapped a few times by not seeing the effects of applying
a patch to the nolibc source code until a "make clean" was issued in
the nolibc directory. It's particularly annoying when trying to confirm
that a proposed patch really solves a problem (or that reverting it
reintroduces the problem).
The reason for the sysroot not being rebuilt was that it can be quite
slow. But in fact it's only slow after a "make clean" issued at the
kernel's topdir, because it's the main "make headers" that can take a
tens of seconds; as long as "usr/include" still contains headers, the
"headers_install" phase is only a quick "rsync", and rebuilding the
whole nolibc sysroot takes a bit less than one second, which is perfectly
acceptable for a test, even more once the time lost caused by misleading
results is factored in.
This patch marks the sysroot target as phony and starts by clearing
the previous sysroot for the current architecture before reinstalling
it. Thanks to this, applying a patch to nolibc makes the effect
immediately visible to "make nolibc-test":
$ time make -j -C tools/testing/selftests/nolibc nolibc-test
make: Entering directory '/k/tools/testing/selftests/nolibc'
MKDIR sysroot/x86/include
make[1]: Entering directory '/k/tools/include/nolibc'
make[2]: Entering directory '/k'
make[2]: Leaving directory '/k'
make[2]: Entering directory '/k'
INSTALL /k/tools/testing/selftests/nolibc/sysroot/sysroot/include
make[2]: Leaving directory '/k'
make[1]: Leaving directory '/k/tools/include/nolibc'
CC nolibc-test
make: Leaving directory '/k/tools/testing/selftests/nolibc'
real 0m0.869s
user 0m0.716s
sys 0m0.149s
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Link: https://lore.kernel.org/all/20221021155645.GK5600@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This adds 7 combinations of input values for memcmp() using signed and
unsigned bytes, which will trigger on the original code before Rasmus'
fix. This is mostly aimed at helping backporters verify their work, and
showing how tests for corner cases can be added to the selftests suite.
Before the fix it reports:
12 memcmp_20_20 = 0 [OK]
13 memcmp_20_60 = -64 [OK]
14 memcmp_60_20 = 64 [OK]
15 memcmp_20_e0 = 64 [FAIL]
16 memcmp_e0_20 = -64 [FAIL]
17 memcmp_80_e0 = -96 [OK]
18 memcmp_e0_80 = 96 [OK]
And after:
12 memcmp_20_20 = 0 [OK]
13 memcmp_20_60 = -64 [OK]
14 memcmp_60_20 = 64 [OK]
15 memcmp_20_e0 = -192 [OK]
16 memcmp_e0_20 = 192 [OK]
17 memcmp_80_e0 = -96 [OK]
18 memcmp_e0_80 = 96 [OK]
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tag "guest_saw_irq" as "volatile" to ensure that the compiler will never
optimize away lookups. Relying on the compiler thinking that the flag
is global and thus might change also works, but it's subtle, less robust,
and looks like a bug at first glance, e.g. risks being "fixed" and
breaking the test.
Make the flag "static" as well since convincing the compiler it's global
is no longer necessary.
Alternatively, the flag could be accessed with {READ,WRITE}_ONCE(), but
literally every access would need the wrappers, and eking out performance
isn't exactly top priority for selftests.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013211234.1318131-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tests for races between shinfo_cache (de)activation and hypercall+ioctl()
processing. KVM has had bugs where activating the shared info cache
multiple times and/or with concurrent users results in lock corruption,
NULL pointer dereferences, and other fun.
For the timer injection testcase (#22), re-arm the timer until the IRQ
is successfully injected. If the timer expires while the shared info
is deactivated (invalid), KVM will drop the event.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221013211234.1318131-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have a good way to specify dependency of tests on programs,
convert some of the tracer tests to use this method for specifying
dependency on 'chrt'.
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
All these tests depend on the ping command and will fail if it is not
found. Allow tests to specify dependencies on programs through the
'requires' field. Add dependency on 'ping' for some of the trigger
tests.
Link: https://lore.kernel.org/all/20221017104312.16af5467@gandalf.local.home/
Reported-by: Akanksha J N <akanksha@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
2871edb32f ("can: kvaser_usb: Fix possible completions during init_completion")
abb8670938 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
8d21f5927a ("can: kvaser_usb_leaf: Fix improved state not being reported")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previous commit resolves a WARN splat that can be difficult to reproduce,
but with the ovs-dpctl.py utility, it can be trivial. Introduce a test
case which creates a DP, and then downgrades the feature set. This will
include a utility 'ovs-dpctl.py' that can be extended to do additional
tests and diagnostics.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is a spelling mistake in a print statement. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This allows the use of a matchJSON field in tests to match
against JSON output from the command under test, if that
command outputs JSON.
You specify what you want to match against as a JSON array
or object in the test's matchJSON field. You can leave out
any fields you don't want to match against that are present
in the output and they will be skipped.
An example matchJSON value would look like this:
"matchJSON": [
{
"Value": {
"neighIP": {
"family": 4,
"addr": "AQIDBA==",
"width": 32
},
"nsflags": 142,
"ncflags": 0,
"LLADDR": "ESIzRFVm"
}
}
]
The real output from the command under test might have some
extra fields that we don't care about for matching, and
since we didn't include them in our matchJSON value, those
fields will not be attempted to be matched. If everything
we included above has the same values as the real command
output, the test will pass.
The matchJSON field's type must be the same as the command
output's type, otherwise the test will fail. So if the
command outputs an array, then the value of matchJSON must
also be an array.
If matchJSON is an array, it must not contain more elements
than the command output's array, otherwise the test will
fail.
Signed-off-by: Jeremy Carter <jeremy@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20221024111603.2185410-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kernel-patches/bpf failed with error:
Running bpftool checks...
Comparing /data/users/ast/net-next/tools/include/uapi/linux/bpf.h (bpf_map_type) and
/data/users/ast/net-next/tools/bpf/bpftool/map.c (do_help() TYPE):
{'cgroup_storage_deprecated', 'cgroup_storage'}
Comparing /data/users/ast/net-next/tools/include/uapi/linux/bpf.h (bpf_map_type) and
/data/users/ast/net-next/tools/bpf/bpftool/Documentation/bpftool-map.rst (TYPE):
{'cgroup_storage_deprecated', 'cgroup_storage'}
The selftests/bpf/test_bpftool_synctypes.py runs checking in the above.
The failure is introduced by Commit c4bcfb38a95e("bpf: Implement cgroup storage available
to non-cgroup-attached bpf progs"). The commit introduced BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED
which has the same enum value as BPF_MAP_TYPE_CGROUP_STORAGE.
In test_bpftool_synctypes.py, one test is to compare uapi bpf.h map types and
bpftool supported maps. The tool picks 'cgroup_storage_deprecated' from bpf.h
while bpftool supported map is displayed as 'cgroup_storage'. The test failure
can be fixed by explicitly replacing 'cgroup_storage_deprecated' with 'cgroup_storage'
in uapi bpf.h map types.
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20221026163014.470732-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When running tests, we should probably accept any help we can get when
it comes to detecting issues early or making them more debuggable. We
have seen a few cases where a test_progs_noalu32 run, for example,
encountered a soft lockup and stopped making progress. It was only
interrupted once we hit the overall test timeout [0]. We can not and do
not want to necessarily rely on test timeouts, because those rely on
infrastructure provided by the environment we run in (and which is not
present in tools/testing/selftests/bpf/vmtest.sh, for example).
To that end, let's enable panics on soft as well as hard lockups to fail
fast should we encounter one. That's happening in the configuration
indented to be used for selftests (including when using vmtest.sh or
when running in BPF CI).
[0] https://github.com/kernel-patches/bpf/runs/7844499997
Signed-off-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20221025231546.811766-1-deso@posteo.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test cgrp_local_storage have some programs utilizing trampoline.
Arch s390x does not support trampoline so add the test to
the corresponding DENYLIST file.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042917.675685-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add four tests for new cgroup local storage, (1) testing bpf program helpers
and user space map APIs, (2) testing recursive fentry triggering won't deadlock,
(3) testing progs attached to cgroups, and (4) a negative test if the
bpf_cgrp_storage_get() helper key is not a cgroup btf id.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042911.675546-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previous bpf patch made a change to uapi bpf.h like
@@ -922,7 +922,14 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH,
- BPF_MAP_TYPE_CGROUP_STORAGE,
+ BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
+ BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
where BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED and BPF_MAP_TYPE_CGROUP_STORAGE
have the same enum value. This will cause selftest test_libbpf_str/bpf_map_type_str
failing. This patch fixed the issue by avoid the check for
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED in the test.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042906.674830-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch modifies the task_ls_recursion test to check that
the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update)
can still do the lockless lookup even it cannot acquire the percpu
busy lock. If the lookup succeeds, it will increment the value
by 1 and the value in the task storage map_a will become 200+1=201.
After that, BPF_PROG(on_update) tries to delete from map_a and
should get -EBUSY because it cannot acquire the percpu busy lock
after finding the data.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a test to check for deadlock failure
in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog.
It also checks if the prog_info.recursion_misses is non zero.
The test starts with 32 threads and they are affinitized to one cpu.
In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within
one second if it is run without the previous patches of this set.
Here is the test error message before adding the no deadlock detection
version of the bpf_task_storage_{get,delete}:
test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0
test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test for WDIOC_GETTEMP and this ioctl might not be supported by some
devices and if it is this test will print the following message:
Inappropriate ioctl for device
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Change show watchdog_info output to print option strings instead
of hex values to make it user friendly and human readable.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The test prints reset reason when WDIOC_GETBOOTSTATUS returns
a 0 flag value and all other cases simply say watchdog.
Parse the flag values returned by WDIOC_GETBOOTSTATUS and print
corresponding reset reason for each WDIOF_* boot status.
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
* check that a child process is in parent's time namespace after vfork.
* check that a child process is in the target namespace after exec.
Output on success:
1..4
ok 1 parent before vfork
ok 2 child after exec
ok 3 wait for child
ok 4 parent after vfork
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221013173154.291597-1-avagin@google.com
Adding kprobe_multi kmod attach api tests that attach bpf_testmod
functions via bpf_program__attach_kprobe_multi_opts.
Running it as serial test, because we don't want other tests to
reload bpf_testmod while it's running.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding test that makes sure the kernel module won't be removed
if there's kprobe multi link defined on top of it.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding 3 bpf_testmod_fentry_* functions to have a way to test
kprobe multi link on kernel module. They follow bpf_fentry_test*
functions prototypes/code.
Adding equivalent functions to all bpf_fentry_test* does not
seems necessary at the moment, could be added later.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding load_kallsyms_refresh function to re-read symbols from
/proc/kallsyms file.
This will be needed to get proper functions addresses from
bpf_testmod.ko module, which is loaded/unloaded several times
during the tests run, so symbols might be already old when
we need to use them.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some highly optimised applications use SO_INCOMING_CPU to make them
efficient, but they didn't test if it's working correctly by getsockopt()
to avoid slowing down. As a result, no one noticed it had been broken
for years, so it's a good time to add a test to catch future regression.
The test does
1) Create $(nproc) TCP listeners associated with each CPU.
2) Create 32 child sockets for each listener by calling
sched_setaffinity() for each CPU.
3) Check if accept()ed sockets' sk_incoming_cpu matches
listener's one.
If we see -EAGAIN, SO_INCOMING_CPU is broken. However, we might not see
any error even if broken; the kernel could miraculously distribute all SYN
to correct listeners. Not to let that happen, we must increase the number
of clients and CPUs to some extent, so the test requires $(nproc) >= 2 and
creates 64 sockets at least.
Test:
$ nproc
96
$ ./so_incoming_cpu
Before the previous patch:
# Starting 12 tests from 5 test cases.
# RUN so_incoming_cpu.before_reuseport.test1 ...
# so_incoming_cpu.c:191:test1:Expected cpu (5) == i (0)
# test1: Test terminated by assertion
# FAIL so_incoming_cpu.before_reuseport.test1
not ok 1 so_incoming_cpu.before_reuseport.test1
...
# FAILED: 0 / 12 tests passed.
# Totals: pass:0 fail:12 xfail:0 xpass:0 skip:0 error:0
After:
# Starting 12 tests from 5 test cases.
# RUN so_incoming_cpu.before_reuseport.test1 ...
# so_incoming_cpu.c:199:test1:SO_INCOMING_CPU is very likely to be working correctly with 3072 sockets.
# OK so_incoming_cpu.before_reuseport.test1
ok 1 so_incoming_cpu.before_reuseport.test1
...
# PASSED: 12 / 12 tests passed.
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Current release - regressions:
- eth: fman: re-expose location of the MAC address to userspace,
apparently some udev scripts depended on the exact value
Current release - new code bugs:
- bpf:
- wait for busy refill_work when destroying bpf memory allocator
- allow bpf_user_ringbuf_drain() callbacks to return 1
- fix dispatcher patchable function entry to 5 bytes nop
Previous releases - regressions:
- net-memcg: avoid stalls when under memory pressure
- tcp: fix indefinite deferral of RTO with SACK reneging
- tipc: fix a null-ptr-deref in tipc_topsrv_accept
- eth: macb: specify PHY PM management done by MAC
- tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
Previous releases - always broken:
- eth: amd-xgbe: SFP fixes and compatibility improvements
Misc:
- docs: netdev: offer performance feedback to contributors
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNW024ACgkQMUZtbf5S
IrvX7w//SP/zKZwgzC13zd2rrCP16TX2QvkHPmSLvcldQDXdCypmsoc5Vb8UNpkG
jwAuy2pxqPy2oxTwTBQv9TNRT2oqEFOsFTK+w410whlL7g1wZ02aXU8qFhV2XumW
o4gRtM+UISPUKFbOnawdK1XlrNdeLF3bjETvW2GP9zxCb0iqoQXtDDNKxv2B2iQA
MSyTtzHA4n9GS7LKGtPgsP2Ose7h1Z+AjTIpQH1nvfEHJUf/wmxUdCK+fuwfeLjY
PhmYaPG/333j1bfBk1Ms/nUYA5KRXlEj9A/7jDtxhxNEwaTNKyLB19a6oVxXxpSQ
x/k+nZP1RColn5xeco5a1X9aHHQ46PJQ8wVAmxYDIeIA5XPMgShNmhAyjrq1ac+o
9vYeYpmnMGSTLdBMvGbWpynWHe7SddgF8LkbnYf2HLKbxe4bgkOnmxOUH4q9iinZ
MfVSknjax4DP0C7X1kGgR6WyltWnkrahOdUkINsIUNxj0KxJa/eStpJIbJrfkxNV
gHbOjB2/bF3SXENrS4A0IJCgsbO9YugN83Eyu0WDWQOw9wVgopzxOJx9R+H0wkVH
XpGGP8qi1DZiTE3iQiq1LHj6f6kirFmtt9QFH5yzaqtKBaqXakHaXwUO4VtD+BI9
NPFKvFL6jrp8EAn0PTM/RrvhJZN+V0bFXiyiMe0TLx+aR0UMxGc=
=dD6N
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf.
The net-memcg fix stands out, the rest is very run-off-the-mill. Maybe
I'm biased.
Current release - regressions:
- eth: fman: re-expose location of the MAC address to userspace,
apparently some udev scripts depended on the exact value
Current release - new code bugs:
- bpf:
- wait for busy refill_work when destroying bpf memory allocator
- allow bpf_user_ringbuf_drain() callbacks to return 1
- fix dispatcher patchable function entry to 5 bytes nop
Previous releases - regressions:
- net-memcg: avoid stalls when under memory pressure
- tcp: fix indefinite deferral of RTO with SACK reneging
- tipc: fix a null-ptr-deref in tipc_topsrv_accept
- eth: macb: specify PHY PM management done by MAC
- tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
Previous releases - always broken:
- eth: amd-xgbe: SFP fixes and compatibility improvements
Misc:
- docs: netdev: offer performance feedback to contributors"
* tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
net-memcg: avoid stalls when under memory pressure
tcp: fix indefinite deferral of RTO with SACK reneging
tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY
net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed
docs: netdev: offer performance feedback to contributors
kcm: annotate data-races around kcm->rx_wait
kcm: annotate data-races around kcm->rx_psock
net: fman: Use physical address for userspace interfaces
net/mlx5e: Cleanup MACsec uninitialization routine
atlantic: fix deadlock at aq_nic_stop
nfp: only clean `sp_indiff` when application firmware is unloaded
amd-xgbe: add the bit rate quirk for Molex cables
amd-xgbe: fix the SFP compliance codes check for DAC cables
amd-xgbe: enable PLL_CTL for fixed PHY modes only
amd-xgbe: use enums for mailbox cmd and sub_cmds
amd-xgbe: Yellow carp devices do not need rrc
bpf: Use __llist_del_all() whenever possbile during memory draining
bpf: Wait for busy refill_work when destroying bpf memory allocator
MAINTAINERS: add keyword match on PTP
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmNVkYkACgkQ6rmadz2v
bTqzHw/+NYMwfLm5Ck+BK0+HiYU5VVLoG4jp8G7B3sJL/6nUDduajzoqa+nM19Xl
+HEjbMza7CizmhkCRkzIs1VVtx8mtvKdTxbhvm77SU2+GBn+X1es+XhtFd4EOpok
MINNHs+cOC/HlnPD/QbFgvxKiKkjyjWxInjUp6Y/mLMcKCn7l9KOkc07/la9Dj3j
RI0gXCywq1pJaPuTCnt0/wcYLJvzn6QsZnKmmksQwt59GQqOd11HWid3rBWZhDp6
beEoHDIMGHROtu60vm4DB0p4l6tauZfeXyPCeu3Tx5ZSsypJIyU1iTdKiIUjG963
ilpy55nrX9bWxadB7LIKHyYfW3in4o+D1ZZaUvLIato/69CZJZ7Uc4kU1RF4Ay1F
Df1Fmal2WeNAxxETPmQPvVeCePvQvwLHl4KNogdZZvd/67cyc1cDhnuTJp37iPak
FALHaaw0VOrTdTvxsWym7yEbkhPbCHpPrKYFZFHgGrRTFk/GM2k38mM07lcLxFGw
aKyooS+eoIZMEgtK5Hma2wpeIVSlkJiJk1d0K20OxdnIUyYEsMXmI+uV1gMxq/8z
EHNi0+296xOoxy22I1Bd5Tu7fIeecHFN44q7YFmpGsB54UNLpFsP0vYUmYT/6hLI
Y0KVZu4c3oQDX7ttifMvkeOCURDJBPrZx37bpNpNXF55fB5ehNk=
=eV7W
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2022-10-23
We've added 7 non-merge commits during the last 18 day(s) which contain
a total of 8 files changed, 69 insertions(+), 5 deletions(-).
The main changes are:
1) Wait for busy refill_work when destroying bpf memory allocator, from Hou.
2) Allow bpf_user_ringbuf_drain() callbacks to return 1, from David.
3) Fix dispatcher patchable function entry to 5 bytes nop, from Jiri.
4) Prevent decl_tag from being referenced in func_proto, from Stanislav.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Use __llist_del_all() whenever possbile during memory draining
bpf: Wait for busy refill_work when destroying bpf memory allocator
bpf: Fix dispatcher patchable function entry to 5 bytes nop
bpf: prevent decl_tag from being referenced in func_proto
selftests/bpf: Add reproducer for decl_tag in func_proto return type
selftests/bpf: Make bpf_user_ringbuf_drain() selftest callback return 1
bpf: Allow bpf_user_ringbuf_drain() callbacks to return 1
====================
Link: https://lore.kernel.org/r/20221023192244.81137-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings
for very large and very sparse device topology
- Work around a relocation handling error when compiling
the nVHE object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNT2hQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNeVwgAkGGk2F2SF5s+MQUQ9tDPxyuRbddN
NPo/YRTKszKc8rK6d1TCbQi56I3e8Oa7kNkMF7CiBlAekB7B1r1ySg5qc+3lQebx
moME30Ru4nmfqPcZ7971MA8Me7zZxGzvIviL5KIwm1ownGifdTsPZ9jCvu4EPdzv
3dd10guH3GeBIq8QeQGEqNP4fticziwhE+IA3HZstcWsq96800Le7WNAgklfzdC+
YTB81QU6whHv6N/7YvRcTbp+tER3VIKdFMmRD1FwC90flhXMbxTymESFXULfHCM2
x/arGz2E31/QGgJo0/Yy2VPenr5ZMU57dL4SYWR02mwSfJQnJWb1cRdWnw==
=rxQ7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings for very
large and very sparse device topology
- Work around a relocation handling error when compiling the nVHE
object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock for too long
by limiting the walk to the largest block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools: include: sync include/api/linux/kvm.h
KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
kvm: Add support for arch compat vm ioctls
RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc
RISC-V: Fix compilation without RISCV_ISA_ZICBOM
KVM: arm64: vgic: Fix exit condition in scan_its_table()
KVM: arm64: nvhe: Fix build with profile optimization
KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test
KVM: arm64: selftests: Fix multiple versions of GIC creation
KVM: arm64: Enable stack protection and branch profiling for VHE
KVM: arm64: Limit stage2_apply_range() batch size to largest block
KVM: arm64: Work out supported block level at compile time
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNVE9UACgkQEsHwGGHe
VUpodRAAsn3s+xX02qfxRG0mYX/1nnOmnFnDhmUEPAZpDXjB6g3PFXAh26F9tw92
dLm8fbfp8W3tK7TQrGyOGWMnb7oj4gEoXAcQBXvsq2KOJMVwRHHKwCeSSJ89DLAW
zZEl2nzgea2cGyZn2RZJvhmbm8YdFON3v++Vv34yovs1MMoCcD7FOPLLGWkD2gk5
6PK6lIlWEI/+vYKWhZdxgk6/PanBInXRUnbaBVj42US1XwfDLzPBEi9yyUUJQrht
CQhfTpHn4Z5MC/hJTlFat4Jlaajql4JBcQ1SS5LW59M+6gdlBK4tL6zFt10zvU2m
+kywOOIWiYLRRgFf6idGO45P5BuWOdmsXEaEg5KW7b6nJfGvgkd7WUYgCVOgtEY1
r4Mf4hAQUunDHGQ4e8eHk7XemFJsoBSweYCTQ2O0yr/QzO2M6QBi/BR9PzUajyH+
yShKEfrxXt4595BMH0nonSMpTKcE4Zxdj06LZSnGecEN8UUlx/n49uYwhFNdUkqM
s6Wz6kSR76YRlKUmYnNzP1gkY6nJZ1nR6z7SjmMkioxf3VxhT9SY8K587r6hRUlr
/NVA69iUhJy75VdttxZEmZzB03A7AjdudmZEisF0ImEmB1hxzYLHcDKJMTIj/r4/
f8OXCg5ACKhFlnx1SdBVRtA+6+5ab368Fs2rItJQ4dzdxRi6VVY=
=DXG7
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Fix raw data handling when perf events are used in bpf
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
* tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bpf: Fix sample_flags for bpf_perf_event_output
selftests/perf_events: Add a SIGTRAP stress test with disables
perf: Fix missing SIGTRAPs
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmNIEAUPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDIXcP/AyWlCEJlmc1Jcd9rlaW1Wenr82U+StLVeMy
qP5P02gMWdbGExWIWEi4zkt+pAm7K2WRgXid9z5Vjw7kZY/+WwswTzKHWcQhVuZv
cBHfeOqgtoHVGR8NcwX6xcp406y3WRqYIsyAmbc5qmo75L8Ew1o3m+3eDfFtAq7l
3XuTCv+lQGNSGMhXHN2SVewZ+pCAo3XJmuHfCBXTqRjwqH4Tzh+54IKzo+9mqBWW
7yeIm5qcbIKGuXLuLL7XCf99gWy/3kQ0xQ1yJeXLAyiHswHqEISZXGHnKeATvD+6
RdbmQ9oRmIYfZfoDKZRUJg8TyTvW1rIKokFbe0q2iyuDnI5D/fAJ48epZaLw+kEf
PUzdB3UgPk19SLwgZKQddqY4wOD420ZD5x1TUFUQuLL7sjVv1vUILDvuCLWpq7F7
GyfSB+LEMgexHGsZ1wjslN/ivTbG+dQgaSS9mlV8/WDOLPtD2uOf65vYR3P28hAX
zOHrwm3e2+UV83BsEFEY2FQiiIBD24JmSecMbmAIHY09MCSZ+vJ/WbF4J1PcPP8C
3vjueIYTcjhzLtQrfIkGZcS7+wC9ji/RRmpJjbg79EpwrjhEs9G8h1+HyL9+zBZ4
Xn6X+ZG/cv0/ZYdin0ZRzJMvM0RutbsR77blVCLY97PBuLtBlqJDcxr+lmmjyIZ2
Db8Qd6uW
=IOxM
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.1, take #1
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
Modify iter prog in existing bpf_iter_bpf_array_map.c, which currently
dumps arraymap key/val, to also do a write of (val, key) into a
newly-added hashmap. Confirm that the write succeeds as expected by
modifying the userspace runner program.
Before a change added in an earlier commit - considering PTR_TO_BUF reg
a valid input to helpers which expect MAP_{KEY,VAL} - the verifier
would've rejected this prog change due to type mismatch. Since using
current iter's key/val to access a separate map is a reasonable usecase,
let's add support for it.
Note that the test prog cannot directly write (val, key) into hashmap
via bpf_map_update_elem when both come from iter context because key is
marked MEM_RDONLY. This is due to bpf_map_update_elem - and other basic
map helpers - taking ARG_PTR_TO_MAP_{KEY,VALUE} w/o MEM_RDONLY type
flag. bpf_map_{lookup,update,delete}_elem don't modify their
input key/val so it should be possible to tag their args READONLY, but
due to the ubiquitous use of these helpers and verifier checks for
type == MAP_VALUE, such a change is nontrivial and seems better to
address in a followup series.
Also fixup some 'goto's in test runner's map checking loop.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test_ringbuf_map_key test prog, borrowing heavily from extant
test_ringbuf.c. The program tries to use the result of
bpf_ringbuf_reserve as map_key, which was not possible before previouis
commits in this series. The test runner added to prog_tests/ringbuf.c
verifies that the program loads and does basic sanity checks to confirm
that it runs as expected.
Also, refactor test_ringbuf such that runners for existing test_ringbuf
and newly-added test_ringbuf_map_key are subtests of 'ringbuf' top-level
test.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Those tests are currently failing on aarch64, ignore them until they are
individually addressed.
Using this deny list, vmtest.sh ran successfully using
LLVM_STRIP=llvm-strip-16 CLANG=clang-16 \
tools/testing/selftests/bpf/vmtest.sh -- \
./test_progs -d \
\"$(cat tools/testing/selftests/bpf/DENYLIST{,.aarch64} \
| cut -d'#' -f1 \
| sed -e 's/^[[:space:]]*//' \
-e 's/[[:space:]]*$//' \
| tr -s '\n' ','\
)\"
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221021210701.728135-5-chantr4@gmail.com
Add handling of aarch64 when setting QEMU options and provide the right
path to aarch64 kernel image.
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221021210701.728135-4-chantr4@gmail.com
config.aarch64, similarly to config.{s390x,x86_64} is a config enabling
building a kernel on aarch64 to be used in bpf's
selftests/kernel-patches CI.
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221021210701.728135-3-chantr4@gmail.com
`config.s390x` had entries already present in `config`.
When generating the config used by vmtest, we concatenate the `config`
file with the `config.{arch}` one, making those entries duplicated.
This patch removes that duplication.
Before:
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.s390x) <(sort
tools/testing/selftests/bpf/config)
CONFIG_MODULE_SIG=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
$
Ater:
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.s390x) <(sort
tools/testing/selftests/bpf/config)
$
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221021210701.728135-2-chantr4@gmail.com
BPF CI has revealed flakiness in the task_local_storage/exit_creds test.
The failure point in CI [1] is that null_ptr_count is equal to 0,
which indicates that the program hasn't run yet. This points to the
kern_sync_rcu (sys_membarrier -> synchronize_rcu underneath) not
waiting sufficiently.
Indeed, synchronize_rcu only waits for read-side sections that started
before the call. If the program execution starts *during* the
synchronize_rcu invocation (due to, say, preemption), the test won't
wait long enough.
As a speculative fix, make the synchornize_rcu calls in a loop until
an explicit run counter has gone up.
[1]: https://github.com/kernel-patches/bpf/actions/runs/3268263235/jobs/5374940791
Signed-off-by: Delyan Kratunov <delyank@meta.com>
Link: https://lore.kernel.org/r/156d4ef82275a074e8da8f4cffbd01b0c1466493.camel@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
lag_lib.sh creates the interfaces dummy1 and dummy2 whereas
dev_addr_lists.sh:destroy() deletes the interfaces dummy0 and dummy1. Fix
the mismatch in names.
Fixes: bbb774d921 ("net: Add tests for bonding and team address list management")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When exporting and running a subset of selftests via kselftest, files from
parts of the source tree which were not exported are not available. A few
tests are trying to source such files. Address the problem by using
symlinks.
The problem can be reproduced by running:
make -C tools/testing/selftests gen_tar TARGETS="drivers/net/bonding"
[... extract archive ...]
./run_kselftest.sh
or:
make kselftest KBUILD_OUTPUT=/tmp/kselftests TARGETS="drivers/net/bonding"
Fixes: bbb774d921 ("net: Add tests for bonding and team address list management")
Fixes: eccd0a80dc ("selftests: net: dsa: add a stress test for unlocked FDB operations")
Link: https://lore.kernel.org/netdev/40f04ded-0c86-8669-24b1-9a313ca21076@redhat.com/
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit afef88e655 ("selftests/bpf: Store BPF object files with
.bpf.o extension"), we should use *.bpf.o instead of *.o.
In addition, use the BPF_FILE variable to save the BPF object file name,
which can be better identified and modified.
Fixes: afef88e655 ("selftests/bpf: Store BPF object files with .bpf.o extension")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1666235134-562-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, if the torture.sh allmodconfig step fails, this is counted as
an error (as it should be), but there is also an extraneous complaint
about a missing log file. This commit therefore adds that log file,
which is hoped to reduce confused reactions to the error report.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, torture.sh will compress the vmlinux files for KASAN and
KCSAN runs. But it will compress all of the files, including those
copied verbatim by the kvm-again.sh script. Compression takes around ten
minutes, so this is not a good thing. This commit therefore compresses
only one of a given set of identical vmlinux files, and then hard-links
it to the directories produced by kvm-again.sh.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This change enables to extend CFLAGS and LDFLAGS from command line, e.g.
to extend compiler checks: make USERCFLAGS=-Werror USERLDFLAGS=-static
USERCFLAGS and USERLDFLAGS are documented in
Documentation/kbuild/makefiles.rst and Documentation/kbuild/kbuild.rst
This should be backported (down to 5.10) to improve previous kernel
versions testing as well.
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220909103901.1503436-1-mic@digikod.net
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Current tests cover only shifts with an immediate as the source
operand/shift counts; add a new test case to cover register operand.
Signed-off-by: Jie Meng <jmeng@fb.com>
Link: https://lore.kernel.org/r/20221007202348.1118830-4-jmeng@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add non-mmapable data section to test_skeleton selftest and make sure it
really isn't mmapable by trying to mmap() it anyways.
Also make sure that libbpf doesn't report BPF_F_MMAPABLE flag to users.
Additional, some more manual testing was performed that this feature
works as intended.
Looking at created map through bpftool shows that flags passed to kernel are
indeed zero:
$ bpftool map show
...
1782: array name .data.non_mmapa flags 0x0
key 4B value 16B max_entries 1 memlock 4096B
btf_id 1169
pids test_progs(8311)
...
Checking BTF uploaded to kernel for this map shows that zero_key and
zero_value are indeed marked as static, even though zero_key is actually
original global (but STV_HIDDEN) variable:
$ bpftool btf dump id 1169
...
[51] VAR 'zero_key' type_id=2, linkage=static
[52] VAR 'zero_value' type_id=7, linkage=static
...
[62] DATASEC '.data.non_mmapable' size=16 vlen=2
type_id=51 offset=0 size=4 (VAR 'zero_key')
type_id=52 offset=4 size=12 (VAR 'zero_value')
...
And original BTF does have zero_key marked as linkage=global:
$ bpftool btf dump file test_skeleton.bpf.linked3.o
...
[51] VAR 'zero_key' type_id=2, linkage=global
[52] VAR 'zero_value' type_id=7, linkage=static
...
[62] DATASEC '.data.non_mmapable' size=16 vlen=2
type_id=51 offset=0 size=4 (VAR 'zero_key')
type_id=52 offset=4 size=12 (VAR 'zero_value')
Bpftool didn't require any changes at all because it checks whether internal
map is mmapable already, but just to double-check generated skeleton, we
see that .data.non_mmapable neither sets mmaped pointer nor has
a corresponding field in the skeleton:
$ grep non_mmapable test_skeleton.skel.h
struct bpf_map *data_non_mmapable;
s->maps[7].name = ".data.non_mmapable";
s->maps[7].map = &obj->maps.data_non_mmapable;
But .data.read_mostly has all of those things:
$ grep read_mostly test_skeleton.skel.h
struct bpf_map *data_read_mostly;
struct test_skeleton__data_read_mostly {
int read_mostly_var;
} *data_read_mostly;
s->maps[6].name = ".data.read_mostly";
s->maps[6].map = &obj->maps.data_read_mostly;
s->maps[6].mmaped = (void **)&obj->data_read_mostly;
_Static_assert(sizeof(s->data_read_mostly->read_mostly_var) == 4, "unexpected size of 'read_mostly_var'");
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221019002816.359650-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The only (forced) static test binary doesn't depend on libcap. Because
using -lcap on systems that don't have such static library would fail
(e.g. on Arch Linux), let's be more specific and require only dynamic
libcap linking.
Fixes: a52540522c ("selftests/landlock: Fix out-of-tree builds")
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20221019200536.2771316-1-mic@digikod.net
This change adds a brief summary of the BPF continuous integration (CI)
to the BPF selftest documentation. The summary focuses not so much on
actual workings of the CI, as it is maintained outside of the
repository, but aims to document the few bits of it that are sourced
from this repository and that developers may want to adjust as part of
patch submissions: the BPF kernel configuration and the deny list
file(s).
Changelog:
- v1->v2:
- use s390x instead of s390 for consistency
Signed-off-by: Daniel Müller <deso@posteo.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221018164015.1970862-1-deso@posteo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test group address is added and removed in v2reportleave_test().
There is no need to delete it again during cleanup as it results in the
following error message:
# bash -x ./bridge_igmp.sh
[...]
+ cleanup
+ pre_cleanup
[...]
+ ip address del dev swp4 239.10.10.10/32
RTNETLINK answers: Cannot assign requested address
+ h2_destroy
Solve by removing the unnecessary address deletion.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qdiscs are added during setup, but not deleted during cleanup,
resulting in the following error messages:
# ./bridge_vlan_mcast.sh
[...]
# ./bridge_vlan_mcast.sh
Error: Exclusivity flag on, cannot modify.
Error: Exclusivity flag on, cannot modify.
Solve by deleting the qdiscs during cleanup.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All file descriptors that are truncatable need to have the Landlock
access rights set correctly on the file's Landlock security blob. This
is also the case for files that are opened by other means than
open(2).
Test coverage for security/landlock is 94.7% of 838 lines according to
gcc/gcov-11.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20221018182216.301684-10-gnoack3000@gmail.com
[mic: Add test coverage in commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
A file descriptor created in a restricted process carries Landlock
restrictions with it which will apply even if the same opened file is
used from an unrestricted process.
This change extracts suitable FD-passing helpers from base_test.c and
moves them to common.h. We use the fixture variants from the ftruncate
fixture to exercise the same scenarios as in the open_and_ftruncate
test, but doing the Landlock restriction and open() in a different
process than the ftruncate() call.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20221018182216.301684-9-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The checkpatch tool started to flag __attribute__(__unused__), which
we previously used. The header where this is normally defined is not
currently compatible with selftests.
This is the same approach as used in selftests/net/psock_lib.h.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20221018182216.301684-8-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
These tests exercise the following truncation operations:
* truncate() (truncate by path)
* ftruncate() (truncate by file descriptor)
* open with the O_TRUNC flag
* special case: creat(), which is open with O_CREAT|O_WRONLY|O_TRUNC.
in the following scenarios:
* Files with read, write and truncate rights.
* Files with read and truncate rights.
* Files with the truncate right.
* Files without the truncate right.
In particular, the following scenarios are enforced with the test:
* open() with O_TRUNC requires the truncate right, if it truncates a file.
open() already checks security_path_truncate() in this case,
and it required no additional check in the Landlock LSM's file_open hook.
* creat() requires the truncate right
when called with an existing filename.
* creat() does *not* require the truncate right
when it's creating a new file.
* ftruncate() requires that the file was opened by a thread that had
the truncate right for the file at the time of open(). (The rights
are carried along with the opened file.)
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20221018182216.301684-6-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduce the LANDLOCK_ACCESS_FS_TRUNCATE flag for file truncation.
This flag hooks into the path_truncate, file_truncate and
file_alloc_security LSM hooks and covers file truncation using
truncate(2), ftruncate(2), open(2) with O_TRUNC, as well as creat().
This change also increments the Landlock ABI version, updates
corresponding selftests, and updates code documentation to document
the flag.
In security/security.c, allocate security blobs at pointer-aligned
offsets. This fixes the problem where one LSM's security blob can
shift another LSM's security blob to an unaligned address (reported
by Nathan Chancellor).
The following operations are restricted:
open(2): requires the LANDLOCK_ACCESS_FS_TRUNCATE right if a file gets
implicitly truncated as part of the open() (e.g. using O_TRUNC).
Notable special cases:
* open(..., O_RDONLY|O_TRUNC) can truncate files as well in Linux
* open() with O_TRUNC does *not* need the TRUNCATE right when it
creates a new file.
truncate(2) (on a path): requires the LANDLOCK_ACCESS_FS_TRUNCATE
right.
ftruncate(2) (on a file): requires that the file had the TRUNCATE
right when it was previously opened. File descriptors acquired by
other means than open(2) (e.g. memfd_create(2)) continue to support
truncation with ftruncate(2).
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Link: https://lore.kernel.org/r/20221018182216.301684-5-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY08JQQAKCRDbK58LschI
g0M0AQCWGrJcnQFut1qwR9efZUadwxtKGAgpaA/8Smd8+v7c8AD/SeHQuGfkFiD6
rx18hv1mTfG0HuPnFQy6YZQ98vmznwE=
=DaeS
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-10-18
We've added 33 non-merge commits during the last 14 day(s) which contain
a total of 31 files changed, 874 insertions(+), 538 deletions(-).
The main changes are:
1) Add RCU grace period chaining to BPF to wait for the completion
of access from both sleepable and non-sleepable BPF programs,
from Hou Tao & Paul E. McKenney.
2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values. In the wild we have seen OS vendors doing buggy backports
where helper call numbers mismatched. This is an attempt to make
backports more foolproof, from Andrii Nakryiko.
3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions,
from Roberto Sassu.
4) Fix libbpf's BTF dumper for structs with padding-only fields,
from Eduard Zingerman.
5) Fix various libbpf bugs which have been found from fuzzing with
malformed BPF object files, from Shung-Hsi Yu.
6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT,
from Jie Meng.
7) Fix various ASAN bugs in both libbpf and selftests when running
the BPF selftest suite on arm64, from Xu Kuohai.
8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest
and use in-skeleton link pointer to remove an explicit bpf_link__destroy(),
from Jiri Olsa.
9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying
on symlinked iptables which got upgraded to iptables-nft,
from Martin KaFai Lau.
10) Minor BPF selftest improvements all over the place, from various others.
* tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits)
bpf/docs: Update README for most recent vmtest.sh
bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
bpf: Use rcu_trace_implies_rcu_gp() in local storage map
bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
selftests/bpf: Use sys_pidfd_open() helper when possible
libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
libbpf: Deal with section with no data gracefully
libbpf: Use elf_getshdrnum() instead of e_shnum
selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c
selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow
selftest/bpf: Fix memory leak in kprobe_multi_test
selftests/bpf: Fix memory leak caused by not destroying skeleton
libbpf: Fix memory leak in parse_usdt_arg()
libbpf: Fix use-after-free in btf_dump_name_dups
selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test
selftests/bpf: Alphabetize DENYLISTs
selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()
libbpf: Introduce bpf_link_get_fd_by_id_opts()
libbpf: Introduce bpf_btf_get_fd_by_id_opts()
...
====================
Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit causes torture.sh to use the new --bootargs and --datestamp
parameters to kvm-again.sh in order to avoid redundant kernel builds
during rcuscale and refscale testing. This trims the better part of an
hour off of torture.sh runs that use --do-kasan.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a --datestamp parameter to kvm-again.sh, which, in
contrast to the existing --rundir argument, specifies only the last
segments of the pathname. This addition enables torture.sh to use
kvm-again.sh in order to avoid redundant kernel builds.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
As it should, the kvm-recheck.sh script sets the TORTURE_SUITE bash
variable based on the type of rcutorture test being run. However,
it does not export it. Which is OK, at least until you try running
kvm-again.sh on either a rcuscale or a refscale test, at which point you
get false-positive "no success message, N successful version messages"
errors. This commit therefore causes the kvm-recheck.sh script to export
TORTURE_SUITE, suppressing these false positives.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The kvm-again.sh script, when running locally, can place the QEMU output
into kvm-test-1-run-qemu.sh.out instead of kvm-test-1-run.sh.out. This
commit therefore makes kvm-test-1-run-qemu.sh check both locations.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit drags the rcutorture scripting kicking and screaming into the
twenty-first century by making use of the BSD-derived mktemp command to
create temporary files and directories. In happy contrast to many of its
ill-behaved predecessors, mktemp seems to actually work reasonably reliably!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The kvm-again.sh script can be used to repeat short boot-time tests,
but the kernel boot arguments cannot be changed. This means that every
change in kernel boot arguments currently necessitates a kernel build,
which greatly increases the duration of kernel-boot testing.
This commit therefore adds a --bootargs parameter to kvm-again.sh,
which allows a given kernel to be repeatedly booted, but overriding
old and adding new kernel boot parameters. This allows an old kernel
to be booted with new kernel boot parameters, avoiding the overhead of
rebuilding the kernel under test.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
commit 95c104c378 ("tracing: Auto generate event name when creating a
group of events") changed the syntax in the ftrace README file which is
used by the selftests to check what features are support. Adjust the
string to make test_duplicates.tc and trigger-synthetic-eprobe.tc work
again.
Fixes: 95c104c378 ("tracing: Auto generate event name when creating a group of events")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Remove the redundant warning information of online_all_offline_memory()
since there is a warning in online_memory_expect_success().
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Handle the scenario where the build is launched with the ARCH envvar
defined as x86_64.
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Handle the scenario where the build is launched with the ARCH envvar
defined as x86_64.
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Don't use the test-specific header files as source files to force a
target dependency, as clang will complain if more than one source file
is used for a compile command with a single '-o' flag.
Use the proper Makefile variables instead as defined in
tools/testing/selftests/lib.mk.
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since commit 40b09653b1 ("selftests/bpf: Adjust vmtest.sh to use local
kernel configuration") the vmtest.sh script no longer downloads a kernel
configuration but uses the local, in-repository one.
This change updates the README, which still mentions the old behavior.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221017232458.1272762-1-deso@posteo.net
Add a SIGTRAP stress test that exercises repeatedly enabling/disabling
an event while it concurrently keeps firing.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/Y0E3uG7jOywn7vy3@elver.google.com/
refcounting errors in ZONE_DEVICE pages.
- Peter Xu fixes some userfaultfd test harness instability.
- Various other patches in MM, mainly fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0j6igAKCRDdBJ7gKXxA
jnGxAP99bV39ZtOsoY4OHdZlWU16BUjKuf/cb3bZlC2G849vEwD+OKlij86SG20j
MGJQ6TfULJ8f1dnQDd6wvDfl3FMl7Qc=
=tbdp
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- fix a race which causes page refcounting errors in ZONE_DEVICE pages
(Alistair Popple)
- fix userfaultfd test harness instability (Peter Xu)
- various other patches in MM, mainly fixes
* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
highmem: fix kmap_to_page() for kmap_local_page() addresses
mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
mm/selftest: uffd: explain the write missing fault check
mm/hugetlb: use hugetlb_pte_stable in migration race check
mm/hugetlb: fix race condition of uffd missing/minor handling
zram: always expose rw_page
LoongArch: update local TLB if PTE entry exists
mm: use update_mmu_tlb() on the second thread
kasan: fix array-bounds warnings in tests
hmm-tests: add test for migrate_device_range()
nouveau/dmem: evict device private memory during release
nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
mm/migrate_device.c: add migrate_device_range()
mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
mm/memremap.c: take a pgmap reference on page allocation
mm: free device private pages have zero refcount
mm/memory.c: fix race when faulting a device private page
mm/damon: use damon_sz_region() in appropriate place
mm/damon: move sz_damon_region to damon_sz_region
lib/test_meminit: add checks for the allocation functions
...
SYS_pidfd_open may be undefined for old glibc, so using sys_pidfd_open()
helper defined in task_local_storage_helpers.h instead to fix potential
build failure.
And according to commit 7615d9e178 ("arch: wire-up pidfd_open()"), the
syscall number of pidfd_open is always 434 except for alpha architure,
so update the definition of __NR_pidfd_open accordingly.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221011071249.3471760-1-houtao@huaweicloud.com
xdp_adjust_tail.c calls ASSERT_OK() to check the return value of
bpf_prog_test_load(), but the condition is not correct. Fix it.
Fixes: 791cad0250 ("bpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-7-xukuohai@huaweicloud.com
test_xdp_adjust_tail_grow failed with ipv6:
test_xdp_adjust_tail_grow:FAIL:ipv6 unexpected error: -28 (errno 28)
The reason is that this test case tests ipv4 before ipv6, and when ipv4
test finished, topts.data_size_out was set to 54, which is smaller than the
ipv6 output data size 114, so ipv6 test fails with NOSPC error.
Fix it by reset topts.data_size_out to sizeof(buf) before testing ipv6.
Fixes: 04fcb5f9a1 ("selftests/bpf: Migrate from bpf_prog_test_run")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-6-xukuohai@huaweicloud.com
The get_syms() function in kprobe_multi_test.c does not free the string
memory allocated by sscanf correctly. Fix it.
Fixes: 5b6c7e5c44 ("selftests/bpf: Add attach bench test")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-5-xukuohai@huaweicloud.com
Some test cases does not destroy skeleton object correctly, causing ASAN
to report memory leak warning. Fix it.
Fixes: 0ef6740e97 ("selftests/bpf: Add tests for kptr_ref refcounting")
Fixes: 1642a3945e ("selftests/bpf: Add struct argument tests with fentry/fexit programs.")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20221011120108.782373-4-xukuohai@huaweicloud.com
Current release - regressions:
- Revert "net/sched: taprio: make qdisc_leaf() see
the per-netdev-queue pfifo child qdiscs", it may cause crashes
when the qdisc is reconfigured
- inet: ping: fix splat due to packet allocation refactoring in inet
- tcp: clean up kernel listener's reqsk in inet_twsk_purge(),
fix UAF due to races when per-netns hash table is used
Current release - new code bugs:
- eth: adin1110: check in netdev_event that netdev belongs to driver
- fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.
Previous releases - regressions:
- ipv4: handle attempt to delete multipath route when fib_info
contains an nh reference, avoid oob access
- wifi: fix handful of bugs in the new Multi-BSSID code
- wifi: mt76: fix rate reporting / throughput regression on mt7915
and newer, fix checksum offload
- wifi: iwlwifi: mvm: fix double list_add at
iwl_mvm_mac_wake_tx_queue (other cases)
- wifi: mac80211: do not drop packets smaller than the LLC-SNAP
header on fast-rx
Previous releases - always broken:
- ieee802154: don't warn zero-sized raw_sendmsg()
- ipv6: ping: fix wrong checksum for large frames
- mctp: prevent double key removal and unref
- tcp/udp: fix memory leaks and races around IPV6_ADDRFORM
- hv_netvsc: fix race between VF offering and VF association message
Misc:
- remove -Warray-bounds silencing in the drivers, compilers fixed
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNISaMACgkQMUZtbf5S
IruEARAArjYZbOEGkUVqtcEbnV0vmxQ5GsVyvurDkmzUULJ1rVAITtG7BbxcPyZ7
tJf5BPmmpXxEXh/lZBIlgHLOGgf/cx4gkCH9Jz6LYlSoTpTZiTqxlOfAZNeei0FI
PD95Slvd3TnIOEysv5RH/pQzIoKdd6+YqOhVITbwCW36cCLaUm+r7JUhzDrnHMNE
KCcsOX9DDtW7MDJrJj/E0wlWeWcudpHY4DLG2A723X6Esu+8k6krK32XtkrFIKqa
PFxeU1NPgMkn4S2xRPKqy+W3dTMfMKB4WWBMMUzEU220MIxV4l/RZSrnI5nrnLh2
uXyUefpx+lD92D5BOiqUw8rK7B4Jq0uUrawuCf+70tbO1f13ThkkAlV6cEzrlnZY
tGQxs0ayFIDVypU1tpY9cemUiYXrnPpCkpz+V1G0us8L323eCHxjz/f5TUlb51Na
BVFvRqvxkjztprBv2LrH2SmnVtcH2kvQG8qMYmXRchBM+11rivz6BrPdE0V+muMg
Hjr6HefYMBpSgcD+ADVFr8a/OB/W7AuWpTBd3z/WyNQ5MxkFX9Kf2Lt2+j8SRfpE
ELO0AANFQZ1Gyp6LTbEkA3mFs1LhNNQyfjHcMHC16ZExHmV3i37BE9LJdnFM27N8
R8lIm4YDs6Jj6YIUDy2wExgUAgkUk7mfZNCMPNi2nSsdJksyAsc=
=AyqG
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and wifi.
Current release - regressions:
- Revert "net/sched: taprio: make qdisc_leaf() see the
per-netdev-queue pfifo child qdiscs", it may cause crashes when the
qdisc is reconfigured
- inet: ping: fix splat due to packet allocation refactoring in inet
- tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF
due to races when per-netns hash table is used
Current release - new code bugs:
- eth: adin1110: check in netdev_event that netdev belongs to driver
- fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.
Previous releases - regressions:
- ipv4: handle attempt to delete multipath route when fib_info
contains an nh reference, avoid oob access
- wifi: fix handful of bugs in the new Multi-BSSID code
- wifi: mt76: fix rate reporting / throughput regression on mt7915
and newer, fix checksum offload
- wifi: iwlwifi: mvm: fix double list_add at
iwl_mvm_mac_wake_tx_queue (other cases)
- wifi: mac80211: do not drop packets smaller than the LLC-SNAP
header on fast-rx
Previous releases - always broken:
- ieee802154: don't warn zero-sized raw_sendmsg()
- ipv6: ping: fix wrong checksum for large frames
- mctp: prevent double key removal and unref
- tcp/udp: fix memory leaks and races around IPV6_ADDRFORM
- hv_netvsc: fix race between VF offering and VF association message
Misc:
- remove -Warray-bounds silencing in the drivers, compilers fixed"
* tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
sunhme: fix an IS_ERR() vs NULL check in probe
net: marvell: prestera: fix a couple NULL vs IS_ERR() checks
kcm: avoid potential race in kcm_tx_work
tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
net: phy: micrel: Fixes FIELD_GET assertion
openvswitch: add nf_ct_is_confirmed check before assigning the helper
tcp: Fix data races around icsk->icsk_af_ops.
ipv6: Fix data races around sk->sk_prot.
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
tcp/udp: Fix memory leak in ipv6_renew_options().
mctp: prevent double key removal and unref
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
netfilter: rpfilter/fib: Populate flowic_l3mdev field
selftests: netfilter: Test reverse path filtering
net/mlx5: Make ASO poll CQ usable in atomic context
tcp: cdg: allow tcp_cdg_release() to be called multiple times
inet: ping: fix recent breakage
ipv6: ping: fix wrong checksum for large frames
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
...
In commit 1bfe26fb08 ("bpf: Add verifier support for custom callback
return range"), the verifier was updated to require callbacks to BPF
helpers to explicitly specify the range of values that can be returned.
bpf_user_ringbuf_drain() was merged after this in commit 2057156738
("bpf: Add bpf_user_ringbuf_drain() helper"), and this change in default
behavior was missed. This patch updates the BPF_MAP_TYPE_USER_RINGBUF
selftests to also return 1 from a bpf_user_ringbuf_drain() callback so
as to properly test this going forward.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221012232015.1510043-3-void@manifault.com
The recent vm image in CI has reported error in selftests that use
the iptables command. Manu Bretelle has pointed out the difference
in the recent vm image that the iptables is sym-linked to the iptables-nft.
With this knowledge, I can also reproduce the CI error by manually running
with the 'iptables-nft'.
This patch is to replace the iptables command with iptables-legacy
to unblock the CI tests.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221012221235.3529719-1-martin.lau@linux.dev
It's required by vm_userspace_mem_region_add() that memory size
should be aligned to host page size. However, one guest page is
provided by memslot_modification_stress_test. It triggers failure
in the scenario of 64KB-page-size-host and 4KB-page-size-guest,
as the following messages indicate.
# ./memslot_modification_stress_test
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory: [0xffbfff0000, 0xffffff0000)
Finished creating vCPUs
Started all vCPUs
==== Test Assertion Failure ====
lib/kvm_util.c:824: vm_adjust_num_guest_pages(vm->mode, npages) == npages
pid=5712 tid=5712 errno=0 - Success
1 0x0000000000404eeb: vm_userspace_mem_region_add at kvm_util.c:822
2 0x0000000000401a5b: add_remove_memslot at memslot_modification_stress_test.c:82
3 (inlined by) run_test at memslot_modification_stress_test.c:110
4 0x0000000000402417: for_each_guest_mode at guest_modes.c:100
5 0x00000000004016a7: main at memslot_modification_stress_test.c:187
6 0x0000ffffb8cd4383: ?? ??:0
7 0x0000000000401827: _start at :?
Number of guest pages is not compatible with the host. Try npages=16
Fix the issue by providing 16 guest pages to the memory slot for this
particular combination of 64KB-page-size-host and 4KB-page-size-guest
on aarch64.
Fixes: ef4c9f4f65 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221013063020.201856-1-gshan@redhat.com
It's not obvious why we had a write check for each of the missing
messages, especially when it should be a locking op. Add a rich comment
for that, and also try to explain its good side and limitations, so that
if someone hit it again for either a bug or a different glibc impl
there'll be some clue to start with.
Link: https://lkml.kernel.org/r/20221004193400.110155-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/a73cf109de0224cfd118d22be58ddebac3ae2897.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This second KUnit update for Linux 6.1-rc1 consists of features and
fixes:
- simplifying resource use.
- make kunit_malloc() and kunit_free() allocations and frees consistent.
kunit_free() frees only the memory allocated by kunit_malloc().
- stop downloading risc-v opensbi binaries using wget.
- other fixes and improvements to tool and KUnit framework.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNG/4EACgkQCwJExA0N
QxxznhAAqtXbCYxIxerdiAHwYifnsrLcCMm/Ol2yuFJhmTn6sZh7w4S8bRBt0RlX
+1IfqtzOi1K1fTpmWQqnq0/fH8gNZrhZHHqXxx3c353pG0BfrC3vODx1VzxuPCMi
nr/OHqAQ0VSTuxgWxsIr0SuhOM4LFDjhBcLDoCDoBF5aQSJricpa++ixiYsVgaUt
nG+E1i7I/hvEYwqqUqtJLp9fOD6LK2IeiOP4oH2PwYBIpFO+BXwk0Gbs/ISL+fRP
F8pph2Qm2jxCJ4kRDvs/N41mkIvG9PwC1h7fW4vDXix0zryJdh0TbilFQFFwiuW3
S8kFE1tarMBWyqEZU/2cln9MFdZpxXAWtJu1/B8dqOvLA06mBOaNbB4tOXzfyriE
QBOnEJNqgT0wqnwWONvrljz7L+YaFAkJAGxbub1cGIUa/t5HHs0WX5XncctGfsaE
Ec6bLOXMgemb3dm35fDpBHyN6np9K5BMmz8Ggv02+V8FH8nrXAzblOW/CN8KgXiG
R5+1vd3SxaLq7npal4S88LmNRoJCVCSWnNPItBTgWFXy6Ni2T5WEoi6rSdqJNX+/
bpPM4G47IO5BH0YEbl9IPvKLfDGczVB4TVLpIt61QST4rf+puUhysr76ZweqoU6f
sOyEenr3YZ7C3EpSbcAztzgyPomPAacR/lNbG5lezcEPRSo184I=
=FgDN
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
This second Kselftest update for Linux 6.1-rc1 consists of fixes
and improvements to memory-hotplug test and a minor spelling fix
to ftrace test.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmNG4hgACgkQCwJExA0N
QxwkPBAAyPd0ZUHlF7JjzdV2obHDGxbjMzi0x8Di8md4B24gE0PvGY79E7eM/uKd
pBsop5cnvwGBZGuoBM0E/1J7UB/Lgedl2iYDUFXQe8JoPlOgvmBMbJCdZ3Zv8gxp
sk5yIrLakgyp2WZng0QyQwZQY4nvq8Lf/f50T8/3+g8OBqF+xTo60DyEpsaDNHS4
3SddH8/jJ6TkG/5lRoEOlfYFrhCDuxq1e8R0jts1vgnpdhpSD9JZPr26VNGVcygB
dkp4icsQFWAaZjNO6+7scgp1yfxBFJ2Fh/gDdfWqEAYvZtvnnr2XhwlYK+O7JZRp
DuglF4Lo/AN3betWuAz4rWyqAYoBZxrUTxrsIVyzb3FqpRAlR32YPFfMo6iWYYn4
638E6cYvkNbbbhCEEgHJJiFZzUB/xbLR/Y8gD4Que/Y+Ck7+zuvQMzZWHQNJfsGx
OhhfUcJlw/VzRpdZx1UToT++DqOqJLBL7DVMATbiXd2rDGKbnEw2pKkeuURXVged
1nis9odge5yY42Q5I3doyPHO7rENOAP2wmlKvJqFDKZFoD23MsGv/m6gpg/HaS1Q
T27L1hHFXPrAZ14MxGva1DTVTPU8D/ciHcqjCWWmnHp8M359JvjOsDL7PyMUcGlm
bVSqcciy71utp1XaFfaF7kT1bwnNeGgGqs0EXcmj4xE8CyOtat8=
=GXpd
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kselftest updates from Shuah Khan:
"This consists of fixes and improvements to memory-hotplug test and a
minor spelling fix to ftrace test"
* tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: notifier-error-inject: Correct test's name
selftests/memory-hotplug: Adjust log info for maintainability
selftests/memory-hotplug: Restore memory before exit
selftests/memory-hotplug: Add checking after online or offline
selftests/ftrace: func_event_triggers: fix typo in user message
- Valentin Schneider makes crash-kexec work properly when invoked from
an NMI-time panic.
- ntfs bugfixes from Hawkins Jiawei
- Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
percpu counters.
- nilfs2 cleanups from Minghao Chi
- lots of other single patches all over the tree!
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
=lUQY
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: bbe4c0896d ("selftests: netfilter: disable rp_filter on router")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Test reverse path (filter) matches in iptables, ip6tables and nftables.
Both with a regular interface and a VRF.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
The DENYLIST and DENYLIST.s390x files are used to specify testcases
which should not be run on CI. Currently, testcases are appended to the
end of these files as needed. This can make it a pain to resolve merge
conflicts. This patch alphabetizes the DENYLIST files to ease this
burden.
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20221011165255.774014-1-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
* Added verification that memblock allocations zero the allocated memory
* Added more test cases for memblock_add(), memblock_remove(),
memblock_reserve() and memblock_free()
* Added tests for memblock_*_raw() family
* Added tests for NUMA-aware allocations in memblock_alloc_try_nid() and
memblock_alloc_try_nid_raw()
-----BEGIN PGP SIGNATURE-----
iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmNFRd8YHG1pa2UucmFw
b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RD7AH/2aaJqtGakZGPtx1IBULg/AE
P3/b0OERUjSCzyU3gH8rZ2zG9AsV6Lq/NKeq7B6zdrTJrLs+IlLaGGnbBoO3sEZn
KWJWVQlVTyFdT8bdM0BGAAtQvehE6Km1IEoH5c7ColdcBeZKi98MuXY1BEY0KjgJ
76Z+sWOfC+SKV2Yp8A+7anSmD95jZwU4ogr2rHjAgdXAqPQgyUYNvAeawKLapnpQ
XjvfWwsEC54PVCw7S1k83Q1VDQ7Vo/b8SMP9/XphcVpwt/6rq3B2uJosRE91ViSe
G8M6moNbYtj4n06nNvRNN5zq9igkbjgONCQnBj2D/r6s3SF8Tb3hcznxzORvdK4=
=U6/l
-----END PGP SIGNATURE-----
Merge tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Test suite improvements:
- Added verification that memblock allocations zero the allocated
memory
- Added more test cases for memblock_add(), memblock_remove(),
memblock_reserve() and memblock_free()
- Added tests for memblock_*_raw() family
- Added tests for NUMA-aware allocations in memblock_alloc_try_nid()
and memblock_alloc_try_nid_raw()"
* tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add generic NUMA tests for memblock_alloc_try_nid*
memblock tests: add bottom-up NUMA tests for memblock_alloc_try_nid*
memblock tests: add top-down NUMA tests for memblock_alloc_try_nid*
memblock tests: add simulation of physical memory with multiple NUMA nodes
memblock_tests: move variable declarations to single block
memblock tests: remove 'cleared' from comment blocks
memblock tests: add tests for memblock_trim_memory
memblock tests: add tests for memblock_*bottom_up functions
memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_raw
memblock tests: update alloc_api to test memblock_alloc_raw
memblock tests: add additional tests for basic api and memblock_alloc
memblock tests: add labels to verbose output for generic alloc tests
memblock tests: update zeroed memory check for memblock_alloc_* tests
memblock tests: update tests to check if memblock_alloc zeroed memory
memblock tests: update reference to obsolete build option in comments
memblock tests: add command line help option
* Fixes for single-stepping in the presence of an async
exception as well as the preservation of PSTATE.SS
* Better handling of AArch32 ID registers on AArch64-only
systems
* Fixes for the dirty-ring API, allowing it to work on
architectures with relaxed memory ordering
* Advertise the new kvmarm mailing list
* Various minor cleanups and spelling fixes
RISC-V:
* Improved instruction encoding infrastructure for
instructions not yet supported by binutils
* Svinval support for both KVM Host and KVM Guest
* Zihintpause support for KVM Guest
* Zicbom support for KVM Guest
* Record number of signal exits as a VCPU stat
* Use generic guest entry infrastructure
x86:
* Misc PMU fixes and cleanups.
* selftests: fixes for Hyper-V hypercall
* selftests: fix nx_huge_pages_test on TDP-disabled hosts
* selftests: cleanups for fix_hypercall_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM7OcMUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAFgf/Rqc9hrXZVdbh2OZ+gScSsFsPK1zO
DISUksLcXaYVYYsvQAEg/N2BPz3XbmO4jA+z8bIUrYTA7fC98we2C4jfR+EaX/fO
+/Kzf0lAgu/nQZyFzUya+1jRsZqvVbC/HmDCI2kzN4u78e/LZ7NVcMijdV/ly6ib
cq0b0LLqJHe/fcpJ806JZP3p5sndQhDmlUkZ2AWZf6CUKSEFcufbbYkt+84ZK4PL
N9mEqXYQ3DXClLQmIBv+NZhtGlmADkWDE4BNouw8dVxhaXH7Hw/jfBHdb6SSHMRe
tQ6Src1j8AYOhf5J35SMudgkbGcMelm0yeZ7Sizk+5Ft0EmdbZsnkvsGdQ==
=4RA+
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
for x86 (PMU virtualization and selftests).
ARM:
- Fixes for single-stepping in the presence of an async exception as
well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures
with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
RISC-V:
- Improved instruction encoding infrastructure for instructions not
yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
x86:
- Misc PMU fixes and cleanups.
- selftests: fixes for Hyper-V hypercall
- selftests: fix nx_huge_pages_test on TDP-disabled hosts
- selftests: cleanups for fix_hypercall_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
RISC-V: KVM: Use generic guest entry infrastructure
RISC-V: KVM: Record number of signal exits as a vCPU stat
RISC-V: KVM: add __init annotation to riscv_kvm_init()
RISC-V: KVM: Expose Zicbom to the guest
RISC-V: KVM: Provide UAPI for Zicbom block size
RISC-V: KVM: Make ISA ext mappings explicit
RISC-V: KVM: Allow Guest use Zihintpause extension
RISC-V: KVM: Allow Guest use Svinval extension
RISC-V: KVM: Use Svinval for local TLB maintenance when available
RISC-V: Probe Svinval extension form ISA string
RISC-V: KVM: Change the SBI specification version to v1.0
riscv: KVM: Apply insn-def to hlv encodings
riscv: KVM: Apply insn-def to hfence encodings
riscv: Introduce support for defining instructions
riscv: Add X register names to gpr-nums
KVM: arm64: Advertise new kvmarm mailing list
kvm: vmx: keep constant definition format consistent
kvm: mmu: fix typos in struct kvm_arch
KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
...
Create process without mappings and check
/proc/*/maps
/proc/*/numa_maps
/proc/*/smaps
/proc/*/smaps_rollup
They must be empty (excluding vsyscall page) or full of zeroes.
Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:
[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G W 5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0
Note 1:
ProtectionKey field in /proc/*/smaps is optional,
so check most of its contents, not everything.
Note 2:
due to the nature of this test, child process hardly can signal
its readiness (after unmapping everything!) to parent.
I feel like "sleep(1)" is justified.
If you know how to do it without sleep please tell me.
Note 3:
/proc/*/statm is not tested but can be.
Link: https://lkml.kernel.org/r/Yz3liL6Dn+n2SD8Q@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
Introduce the data_input map, write-protected with a small eBPF program
implementing the lsm/bpf_map hook.
Then, ensure that bpf_map_get_fd_by_id() and bpf_map_get_fd_by_id_opts()
with NULL opts don't succeed due to requesting read-write access to the
write-protected map. Also, ensure that bpf_map_get_fd_by_id_opts() with
open_flags in opts set to BPF_F_RDONLY instead succeeds.
After obtaining a read-only fd, ensure that only map lookup succeeds and
not update. Ensure that update works only with the read-write fd obtained
at program loading time, when the write protection was not yet enabled.
Finally, ensure that the other _opts variants of bpf_*_get_fd_by_id() don't
work if the BPF_F_RDONLY flag is set in opts (due to the kernel not
handling the open_flags member of bpf_attr).
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221006110736.84253-7-roberto.sassu@huaweicloud.com
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQRE6pSOnaBC00OEHEIaerohdGur0gUCYzylIxIcamFya2tvQGtl
cm5lbC5vcmcACgkQGnq6IXRrq9IDBwEAmoMCHzq2JseDBj21H5iLXrB2G5Vl80a9
UW363r09ht4A/RnvCIdFcaYYdawhQbcBWkRSYezDOPu6hopwrElb9+ID
=l5my
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Just a few bug fixes this time"
* tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
security/keys: Remove inconsistent __user annotation
char: move from strlcpy with unused retval to strscpy
Major changes:
- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
- Added Masami Hiramatsu as co-maintainer
- Updated MAINTAINERS file to separate out FTRACE as it is
more than just TRACING.
Minor changes:
- Added Mark Rutland as FTRACE reviewer
- Updated user_events to make it on its way to remove the BROKEN tag.
The changes should now be acceptable but will run it through
a cycle and hopefully we can remove the BROKEN tag next release.
- Added filtering to eprobes
- Added a delta time to the benchmark trace event
- Have the histogram and filter callbacks called via a switch
statement instead of indirect functions. This speeds it up to
avoid retpolines.
- Add a way to wake up ring buffer waiters waiting for the
ring buffer to fill up to its watermark.
- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.
- Wake up waiters when the ring buffer is disabled.
A reader may block when the ring buffer is disabled,
but if it was blocked when the ring buffer is disabled
it should then wake up.
Fixes:
- Allow splice to read partially read ring buffer pages
Fixes splice never moving forward.
- Fix inverted compare that made the "shortest" ring buffer
wait queue actually the longest.
- Fix a race in the ring buffer between resetting a page when
a writer goes to another page, and the reader.
- Fix ftrace accounting bug when function hooks are added at
boot up before the weak functions are set to "disabled".
- Fix bug that freed a user allocated snapshot buffer when
enabling a tracer.
- Fix possible recursive locks in osnoise tracer
- Fix recursive locking direct functions
- And other minor clean ups and fixes
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYz70cxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qpLKAP4+yOje7ZY/b3R4tTx0EIWiKdhqPx6t
Nvam2+WR2PN3QQEAqiK2A+oIbh3Zjp1MyhQWuulssWKtSTXhIQkbs7ioYAc=
=MsQw
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"Major changes:
- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
- Added Masami Hiramatsu as co-maintainer
- Updated MAINTAINERS file to separate out FTRACE as it is more than
just TRACING.
Minor changes:
- Added Mark Rutland as FTRACE reviewer
- Updated user_events to make it on its way to remove the BROKEN tag.
The changes should now be acceptable but will run it through a
cycle and hopefully we can remove the BROKEN tag next release.
- Added filtering to eprobes
- Added a delta time to the benchmark trace event
- Have the histogram and filter callbacks called via a switch
statement instead of indirect functions. This speeds it up to avoid
retpolines.
- Add a way to wake up ring buffer waiters waiting for the ring
buffer to fill up to its watermark.
- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.
- Wake up waiters when the ring buffer is disabled. A reader may
block when the ring buffer is disabled, but if it was blocked when
the ring buffer is disabled it should then wake up.
Fixes:
- Allow splice to read partially read ring buffer pages. This fixes
splice never moving forward.
- Fix inverted compare that made the "shortest" ring buffer wait
queue actually the longest.
- Fix a race in the ring buffer between resetting a page when a
writer goes to another page, and the reader.
- Fix ftrace accounting bug when function hooks are added at boot up
before the weak functions are set to "disabled".
- Fix bug that freed a user allocated snapshot buffer when enabling a
tracer.
- Fix possible recursive locks in osnoise tracer
- Fix recursive locking direct functions
- Other minor clean ups and fixes"
* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
ftrace: Create separate entry in MAINTAINERS for function hooks
tracing: Update MAINTAINERS to reflect new tracing git repo
tracing: Do not free snapshot if tracer is on cmdline
ftrace: Still disable enabled records marked as disabled
tracing/user_events: Move pages/locks into groups to prepare for namespaces
tracing: Add Masami Hiramatsu as co-maintainer
tracing: Remove unused variable 'dups'
MAINTAINERS: add myself as a tracing reviewer
ring-buffer: Fix race between reset page and reading page
tracing/user_events: Update ABI documentation to align to bits vs bytes
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Use NULL for strstr checks
tracing: Fix spelling mistake "preapre" -> "prepare"
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Wake up ring buffer waiters on closing of the file
ring-buffer: Add ring_buffer_wake_waiters()
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmM9Y/gACgkQUqAMR0iA
lPJxwg//SJsMwv3zDDP3YAzhROongc9mlNL9yTNo5zMFYaLIF6j0UkN7zP7+ChTY
u3rfiOYOiSNVyylC4+Auznqi219wjEPedDtJkP6Gx6mHJIArXoFltOGct1fz3CH6
xJjBYPpRVQTP54HQ/OEM4kfxn7eG1HXpErxIYUfeRzM5REJR0iZFfI6FwyXVhjsL
1lDTHc/qlLlaDMqik2sQ/3yJ391SQKodAwkJY9d2wS06OjMmvtX+dADZJkXwv14f
5wnBXG/4Zr0HSR2JW3VgU7AwZptaRLYF2PtYdDp+yI0DSsmjPq4d+0YqTVbgLNSq
r0XmewMpnNUNojEuFpW4/+RGZo0pnlJpjUOYfKTbDecnYfMjUFtSGPLVTMnN1OT+
xcW/Q8jO4nQ24993xrQbOr/vBT6nIsePkJAOJkTPzzAxkPF3X7ik3lZyjb7ntDlV
mTfUpmTiFtmdMsDaKFxbgRknPre7z4XASMAiErHU4TGNyKBHsRDXmAdtC8hNn7ZU
z2NGlDrBKuHL8IcKQ6uH/SYtyG3HIVCeiWqkRd9Sm9RKoz4p+sGl3I4D367O+fGX
GGwXj8b5wvjl/BGXHoM+kQKmclYB8kD/8iQSwd64qooTkiDtt1HDPGr2Ewa5thiz
fDoVyy98vyRSNmXtst9S7+IMmdjQ7mjmW9AlAITOwIo89UW+7GQ=
=+qS1
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Fix race between fork and livepatch transition revert
- Add sysfs entry that shows "patched" state for each object (module)
that can be livepatched by the given livepatch
- Some clean up
* tag 'livepatching-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: add sysfs test
livepatch: add sysfs entry "patched" for each klp_object
selftests/livepatch: normalize sysctl error message
livepatch: Add a missing newline character in klp_module_coming()
livepatch: fix race between fork and KLP transition
* cpuset now support isolated cpus.partition type, which will enable dynamic
CPU isolation.
* pids.peak added to remember the max number of pids used.
* Holes in cgroup namespace plugged.
* Internal cleanups.
Note that for-6.1-fixes was pulled into for-6.1 twice. Both were for
follow-up cleanups and each merge commit has details.
Also, 8a693f7766 ("cgroup: Remove CFTYPE_PRESSURE") removes the flag used
by PSI changes in the tip tree and the merged result won't compile due to
the missing flag. Simply removing the struct init lines specifying the flag
is the correct resolution. linux-next already contains the correct fix:
https://lkml.kernel.org/r/20220912161812.072aaa3b@canb.auug.org.au
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYzsl7w4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGYsxAP4kad4YPw+CueLyyEMiYgBHouqDt8cG0+FJWK3X
svTC7wD/eCLfxZM8TjjSrMmvaMrml586mr3NoQaFeW0x3twptQQ=
=LERu
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset now support isolated cpus.partition type, which will enable
dynamic CPU isolation
- pids.peak added to remember the max number of pids used
- holes in cgroup namespace plugged
- internal cleanups
* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
cgroup: use strscpy() is more robust and safer
iocost_monitor: reorder BlkgIterator
cgroup: simplify code in cgroup_apply_control
cgroup: Make cgroup_get_from_id() prettier
cgroup/cpuset: remove unreachable code
cgroup: Remove CFTYPE_PRESSURE
cgroup: Improve cftype add/rm error handling
kselftest/cgroup: Add cpuset v2 partition root state test
cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
cgroup/cpuset: Relocate a code block in validate_change()
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
cgroup: add pids.peak interface for pids controller
cgroup: Remove data-race around cgrp_dfl_visible
cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
...
- Disable preemption in rwsem_write_trylock()'s attempt to
take the rwsem, to avoid RT tasks hogging the CPU, which
managed to preempt this function after the owner has
been cleared but before a new owner is set. Also add
debug checks to enforce this.
- Add __lockfunc to more slow path functions and add
__sched to semaphore functions.
- Mark spinlock APIs noinline when the respective CONFIG_INLINE_SPIN_*
toggles are disabled, to reduce LTO text size.
- Print more debug information when lockdep gets confused
in look_up_lock_class().
- Improve header file abuse checks.
- Misc cleanups
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/3r8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h3fxAAvUfAq4M41aKVDnF1n3e4fZ8MhAQcV7U6
qC+jwS6VII6bd0D2SQseij3BQZqGcg4CqjY7uX/jgcrQHib4haDZn+VlWPacsuN5
yUVNkQNdns6+/fFyLkVJg9HfK7Cw4dgXDUquu/Ivd9YTjtGkGQkJQMa5H5x6NpIF
PcN5B2ynGLt9CBOxqON/SqUIulh58ydUhiPOv0wjgCiCvLXltyCrR57QfX8eY22/
SEzOlbluzp3WBS2beCztKkw1X6woIhhMoCzg2w2WXbvoBr2upKHmIiDoR6U1MUv3
d3iLP4oqmXuN6KQViZsXf7/nulx3NlMkK+9/xLdVbeUiG/F+99AWdiYH5SipFZi0
IxvXPMnl7WE2MxbnL83nbslVoOwxb5M0Ia5VIoJvZnL5HF8P2MQLvSA1XucXJE+f
0JgNb65SFE9SmYLWD8JHOe5whVFf0ccqpuSDVsEzYj18vh/BPbTpVvbrLTp2muSY
uELtGELjgw9zDXFxgC8s3iA6ZzRzcUdCXvrE4ZF+fAjMs3RvtJ66SyRL0R1tevDB
zgsV1oGvgJtKeqaOKqa8IA2OxqQRSIAKzSUFYVmDXG+XXuGrmcWu+LeSetleU3lZ
cS4NAlNSxtWaN6ff9+ULMooSkJQE9pK2FUwc2KNE8vrqn6mP5BeWk4cnA7KtwbYY
fIsO1/F9pIs=
=we4n
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Disable preemption in rwsem_write_trylock()'s attempt to take the
rwsem, to avoid RT tasks hogging the CPU, which managed to preempt
this function after the owner has been cleared but before a new owner
is set. Also add debug checks to enforce this.
- Add __lockfunc to more slow path functions and add __sched to
semaphore functions.
- Mark spinlock APIs noinline when the respective CONFIG_INLINE_SPIN_*
toggles are disabled, to reduce LTO text size.
- Print more debug information when lockdep gets confused in
look_up_lock_class().
- Improve header file abuse checks.
- Misc cleanups
* tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused
locking: Add __sched to semaphore functions
locking/rwsem: Disable preemption while trying for rwsem lock
locking: Detect includes rwlock.h outside of spinlock.h
locking: Add __lockfunc to slow path functions
locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled
selftests: futex: Fix 'the the' typo in comment
Commit 98f94ce42a ("KVM: selftests: Move KVM_CREATE_DEVICE_TEST code to
separate helper") wrongly converted a "real" GIC device creation to
__kvm_test_create_device() and caused the test failure on my D05 (which
supports v2 emulation). Fix it.
Fixes: 98f94ce42a ("KVM: selftests: Move KVM_CREATE_DEVICE_TEST code to separate helper")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221009033131.365-1-yuzenghui@huawei.com
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
Sang, ye xingchen, Zheng Yongjun.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
/ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
ueKKvCcHhHAYqQ==
=TFBA
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only
sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...
am sending out early due to me travelling next week. There is a
lone mm patch for which Andrew gave an informal ack at
https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
I will send the bulk of ARM work, as well as other
architectures, at the end of next week.
ARM:
* Account stage2 page table allocations in memory stats.
x86:
* Account EPT/NPT arm64 page table allocations in memory stats.
* Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
* Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
Hyper-V now support eVMCS fields associated with features that are
enumerated to the guest.
* Use KVM's sanitized VMCS config as the basis for the values of nested VMX
capabilities MSRs.
* A myriad event/exception fixes and cleanups. Most notably, pending
exceptions morph into VM-Exits earlier, as soon as the exception is
queued, instead of waiting until the next vmentry. This fixed
a longstanding issue where the exceptions would incorrecly become
double-faults instead of triggering a vmexit; the common case of
page-fault vmexits had a special workaround, but now it's fixed
for good.
* A handful of fixes for memory leaks in error paths.
* Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
* Never write to memory from non-sleepable kvm_vcpu_check_block()
* Selftests refinements and cleanups.
* Misc typo cleanups.
Generic:
* remove KVM_REQ_UNHALT
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2zwcUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNpbwf+MlVeOlzE5SBdrJ0TEnLmKUel1lSz
QnZzP5+D65oD0zhCilUZHcg6G4mzZ5SdVVOvrGJvA0eXh25ruLNMF6jbaABkMLk/
FfI1ybN7A82hwJn/aXMI/sUurWv4Jteaad20JC2DytBCnsW8jUqc49gtXHS2QWy4
3uMsFdpdTAg4zdJKgEUfXBmQviweVpjjl3ziRyZZ7yaeo1oP7XZ8LaE1nR2l5m0J
mfjzneNm5QAnueypOh5KhSwIvqf6WHIVm/rIHDJ1HIFbgfOU0dT27nhb1tmPwAcE
+cJnnMUHjZqtCXteHkAxMClyRq0zsEoKk0OGvSOOMoq3Q0DavSXUNANOig==
=/hqX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"The first batch of KVM patches, mostly covering x86.
ARM:
- Account stage2 page table allocations in memory stats
x86:
- Account EPT/NPT arm64 page table allocations in memory stats
- Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
accesses
- Drop eVMCS controls filtering for KVM on Hyper-V, all known
versions of Hyper-V now support eVMCS fields associated with
features that are enumerated to the guest
- Use KVM's sanitized VMCS config as the basis for the values of
nested VMX capabilities MSRs
- A myriad event/exception fixes and cleanups. Most notably, pending
exceptions morph into VM-Exits earlier, as soon as the exception is
queued, instead of waiting until the next vmentry. This fixed a
longstanding issue where the exceptions would incorrecly become
double-faults instead of triggering a vmexit; the common case of
page-fault vmexits had a special workaround, but now it's fixed for
good
- A handful of fixes for memory leaks in error paths
- Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow
- Never write to memory from non-sleepable kvm_vcpu_check_block()
- Selftests refinements and cleanups
- Misc typo cleanups
Generic:
- remove KVM_REQ_UNHALT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
KVM: remove KVM_REQ_UNHALT
KVM: mips, x86: do not rely on KVM_REQ_UNHALT
KVM: x86: never write to memory from kvm_vcpu_check_block()
KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
KVM: x86: lapic does not have to process INIT if it is blocked
KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
KVM: x86: make vendor code check for all nested events
mailmap: Update Oliver's email address
KVM: x86: Allow force_emulation_prefix to be written without a reload
KVM: selftests: Add an x86-only test to verify nested exception queueing
KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
...
Some momory will be left in offline state when calling
offline_memory_expect_fail() failed. Restore it before exit.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add checking for online_memory_expect_success()/
offline_memory_expect_success()/offline_memory_expect_fail(), or
the test would exit 0 although the functions return 1.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When running a RISC-V test kernel under QEMU, we need an OpenSBI BIOS
file. In the original QEMU support patchset, kunit_tool would optionally
download this file from GitHub if it didn't exist, using wget.
These days, it can usually be found in the distro's qemu-system-riscv
package, and is located in /usr/share/qemu on all the distros I tried
(Debian, Arch, OpenSUSE). Use this file, and thereby don't do any
downloading in kunit_tool.
In addition, we used to shell out to whatever 'wget' was in the path,
which could have potentially been used to trick the developer into
running another binary. By not using wget at all, we nicely sidestep
this issue.
Cc: Xu Panda <xu.panda@zte.com.cn>
Fixes: 87c9c16317 ("kunit: tool: add support for QEMU")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmM67S0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppnPEACkBzilBLKwT9MWdUAITwyrMXsAa1R9gsR9
Tb3Xs+mNO2meuycLAUh4LIbb28NNr7/S5rwWet5NRZ71hgv4Q/WA/0EemAGGXYqd
+3MEBAWU3FBFkC/cJXCnT8F5yCXYRkT5n/hzCSNEpNKjQ5JnAhHDlWAjgzZRuD/A
A+YJjoBVJJuI1wY4I5XCpeQXEmg/Wc1MgXfyHgWVtGKnYrrxibiCnBZnqbAMZNvD
hGn1Vl02ooamGTFm/nW/OAt71DtqsjWUCVOHKmlZ+zBUjbUj6FMXmPVV7vCV9o2w
PT4Dx3CTc2iXwa8KfEFNPvXBzy0Qfu8edweP/MvZHWHVZREpEAh4cG6GhwW8whD+
5mPisqmRjZKe0BBS4k/wKN1RXEypSQoTU4EdljfbQPU/usn35lmjMmEXXgs3IhqM
fcTdO5ZUOp+CGyzI0Bc7UtS8vilJbX9ynN8G80MUUAZzuQg39MH7lNQYSJSSvJfU
OlvzmL3lhRLYM1s/KKiZzdDBoMvC7R4oHmzCveOjQTMIHf6WNyqKFlrWScq2wzpN
oRxqt0xiVQ3PFMmFj6N08f145qtbASuF3sKv7dbU3QXTsCAos3wdTdX+PejYApEZ
W3dr0TDjNBicLNVPiSj132p0ZRtdZvLGuGVkBD4GPQeH2NwswxMHQAfz8e2lqmA4
9bWG6BM7Yw==
=m9kX
-----END PGP SIGNATURE-----
Merge tag 'for-6.1/io_uring-2022-10-03' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
- Add supported for more directly managed task_work running.
This is beneficial for real world applications that end up issuing
lots of system calls as part of handling work. Normal task_work will
always execute as we transition in and out of the kernel, even for
"unrelated" system calls. It's more efficient to defer the handling
of io_uring's deferred work until the application wants it to be run,
generally in batches.
As part of ongoing work to write an io_uring network backend for
Thrift, this has been shown to greatly improve performance. (Dylan)
- Add IOPOLL support for passthrough (Kanchan)
- Improvements and fixes to the send zero-copy support (Pavel)
- Partial IO handling fixes (Pavel)
- CQE ordering fixes around CQ ring overflow (Pavel)
- Support sendto() for non-zc as well (Pavel)
- Support sendmsg for zerocopy (Pavel)
- Networking iov_iter fix (Stefan)
- Misc fixes and cleanups (Pavel, me)
* tag 'for-6.1/io_uring-2022-10-03' of git://git.kernel.dk/linux: (56 commits)
io_uring/net: fix notif cqe reordering
io_uring/net: don't update msg_name if not provided
io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
io_uring/rw: defer fsnotify calls to task context
io_uring/net: fix fast_iov assignment in io_setup_async_msg()
io_uring/net: fix non-zc send with address
io_uring/net: don't skip notifs for failed requests
io_uring/rw: don't lose short results on io_setup_async_rw()
io_uring/rw: fix unexpected link breakage
io_uring/net: fix cleanup double free free_iov init
io_uring: fix CQE reordering
io_uring/net: fix UAF in io_sendrecv_fail()
selftest/net: adjust io_uring sendzc notif handling
io_uring: ensure local task_work marks task as running
io_uring/net: zerocopy sendmsg
io_uring/net: combine fail handlers
io_uring/net: rename io_sendzc()
io_uring/net: support non-zerocopy sendto
io_uring/net: refactor io_setup_async_addr
io_uring/net: don't lose partial send_zc on fail
...
Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match:
fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961
fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753
inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874
Separate nexthop objects are mutually exclusive with the legacy
multipath spec. Fix fib_nh_match to return if the config for the
to be deleted route contains a multipath spec while the fib_info
is using a nexthop object.
Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e6 ("net: ipv4: fix route with nexthop object delete warning")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This KUnit update for Linux 6.1-rc1 consists of several documentation
fixes, UML related cleanups, and a feature to enable/disable KUnit
tests. This update includes the following change to
- rename all_test_uml.config, use it for --alltests
Note: if anyone was using all_tests_uml.config, this change breaks them.
This change simplifies the usage and eliminates the need to type:
--kunitconfig=tools/testing/kunit/configs/all_tests_uml.config.
A simple workaround to create a symlink to the new name can solve the
problem for anyone using all_tests_uml.config.
all_tests_uml.config should work across ~all architectures.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmM97okACgkQCwJExA0N
Qxwe9BAA4rUh/CdaDpGxSoknamlSlXbcsxyHQK32OoqtIQPJfPcqxZwZhqGzpZAE
z7qqnfN7dGWACcJjaZYZtkjJ1tsJ7OADf6m81drM6lmHSDRldAhdXMK9qqJAA2pk
OMciu7TcoSezaMw3eKZZ8jeQ2OIEUirQQXFIuMkb9r+E65tyIGCVo9njY19is6bk
7TcrUNlzgTednzkJ7B5BY54vZjvET/kwwBhXUgygMoi0JkKIoksJcxF2y7x1UKUx
Kfu/3bqiBAAEiOdMLxnxqwNH/Uhny1IWV6w+p0gtEsVGS47QvRew7gtIuKgCZWpf
S/w7QL+zCaXGB3jM8KVk7p4L9qU6EIttqu7kwXSs5OZz8QVrw6oc1l0RDUI5NIWi
TLRO4QJHWssKNrSyldtk2J9wFEHOF48Hc6HYqHxJLFYwfutt5hzsKotN6+H60Uth
TVAE+oUWknQks1Imv7x3Avq5k6CyfH9dFqqH+IrM2hDbqkZN1hfHvm1RJgcXFaHf
f4UpAYwCgkb3Maa7mTBq2/x98UUhh051EQuHDUO6H0tzccxJUbwkeeGK+WH51aZu
d6RD4ZnH4BlB+EMbfXgkXdsGqcsQMneoLh5/V0y1L8aSyjJQ8FYlK5lQYwu9VBXQ
6g6AH4nlhvVplXgCDeGnnTa7HfTD87Vx9Dws+uO+J/QbwqTatI4=
=Q5W1
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several documentation fixes, UML related cleanups, and a feature to
enable/disable KUnit tests
This includes the change to rename all_test_uml.config, and use it for
'--alltests'. Note: if anyone was using all_tests_uml.config, this
change breaks them.
This change simplifies the usage and eliminates the need to type:
--kunitconfig=tools/testing/kunit/configs/all_tests_uml.config
A simple workaround to create a symlink to the new name can solve the
problem for anyone using all_tests_uml.config.
all_tests_uml.config should work across ~all architectures"
* tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: Kunit: Use full path to .kunitconfig
kunit: tool: rename all_test_uml.config, use it for --alltests
kunit: tool: remove UML specific options from all_tests_uml.config
lib: stackinit: update reference to kunit-tool
lib: overflow: update reference to kunit-tool
Documentation: KUnit: update links in the index page
Documentation: KUnit: add intro to the getting-started page
Documentation: KUnit: Reword start guide for selecting tests
Documentation: KUnit: add note about mrproper in start.rst
Documentation: KUnit: avoid repeating "kunit.py run" in start.rst
Documentation: KUnit: remove duplicated docs for kunit_tool
Documentation: Kunit: Add ref for other kinds of tests
Documentation: KUnit: Fix non-uml anchor
Documentation: Kunit: Fix inconsistent titles
Documentation: kunit: fix trivial typo
kunit: no longer call module_info(test, "Y") for kunit modules
kunit: add kunit.enable to enable/disable KUnit test
kunit: tool: make --raw_output=kunit (aka --raw_output) preserve leading spaces
This Kselftest update for Linux 6.1-rc1 consists of fixes and new tests.
- Adds a amd-pstate-ut test module, this module is used by kselftest
to unit test amd-pstate functionality
- Fixes and cleanups to to cpu-hotplug to delete the fault injection
test code
- Improvements to vm test to use top_srcdir for builds
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmM912gACgkQCwJExA0N
QxzfAhAAspak4B2FVZdXrb7ulGxCV2CSd49qXbv++7rBy3Ey17XlIO8pUj2FMPpx
nBrD2UD+ET/6wHjSzkcSzU47N2BnhetuUWqN2vPmuYo4c8xDME9VSWeoKQ9Yw5rc
6r6TxNG/Jbxab4y9r8jibwkidVzcgfYO/Ecnj9HAZ6wDERkVG+LBxpX4HpxEnCsY
c3frYFl0cdtLOSGbEA18qctG7ro2otgh4+cWVfSo9NDypL9UfF8bZ8DzqeMOwRLr
z2S8L+kBPyQx+ZNdqUgNOfegVx2gKu/P43Xpg8WqRJG/42xxmRmftEOLoVKZ1cCe
Rpjv+ZV+/u3QnbEejqJ7MU/QX9JzMG35O4mo7ojxpOJ+cZNXc1ycOm+IOvTCSUJE
fiqM2idLvQNjKMVI+FHvkE+9us6K8fzquCYqgRLBKCnfnJJfgT6tWwZpw/G8OSZ6
Px37bfQhALzDj79OkxxoV96GdkR9a2L8GRvZG+7W4l6Oa00OM+rK1oQnjKVgmjT5
wtMwJAXGpwFYqnJagtXcg8dsxS/ZA+k4Eq/skd/yUlCmugmgeVltjk3UmEAYs6Nx
ame3798/haptgR1u7ncqltqjx11YR2k9i0Sjf7iqQl8zJhjtDdePAlo2gB4mFUio
mZ9BBxCD+oUnAZprA5AwN0ZxPKLWgvPS5rLqTtFt3Pf0agjHSpY=
=O0Lh
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"Fixes and new tests:
- Add an amd-pstate-ut test module, used by kselftest to unit test
amd-pstate functionality
- Fixes and cleanups to to cpu-hotplug to delete the fault injection
test code
- Improvements to vm test to use top_srcdir for builds"
* tag 'linux-kselftest-next-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs:kselftest: fix kselftest_module.h path of example module
cpufreq: amd-pstate: Add explanation for X86_AMD_PSTATE_UT
selftests/cpu-hotplug: Add log info when test success
selftests/cpu-hotplug: Reserve one cpu online at least
selftests/cpu-hotplug: Delete fault injection related code
selftests/cpu-hotplug: Use return instead of exit
selftests/cpu-hotplug: Correct log info
cpufreq: amd-pstate: modify type in argument 2 for filp_open
Documentation: amd-pstate: Add unit test introduction
selftests: amd-pstate: Add test trigger for amd-pstate driver
cpufreq: amd-pstate: Add test module for amd-pstate driver
cpufreq: amd-pstate: Expose struct amd_cpudata
selftests/vm: use top_srcdir instead of recomputing relative paths
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
vector granule register added to the user regs together with SVE perf
extensions documentation.
- SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation
to match the actual kernel behaviour (zeroing the registers on syscall
rather than "zeroed or preserved" previously).
- More conversions to automatic system registers generation.
- vDSO: use self-synchronising virtual counter access in gettimeofday()
if the architecture supports it.
- arm64 stacktrace cleanups and improvements.
- arm64 atomics improvements: always inline assembly, remove LL/SC
trampolines.
- Improve the reporting of EL1 exceptions: rework BTI and FPAC exception
handling, better EL1 undefs reporting.
- Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
result.
- arm64 defconfig updates: build CoreSight as a module, enable options
necessary for docker, memory hotplug/hotremove, enable all PMUs
provided by Arm.
- arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
extensions).
- arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
unused function.
- kselftest updates for arm64: simple HWCAP validation, FP stress test
improvements, validation of ZA regs in signal handlers, include larger
SVE and SME vector lengths in signal tests, various cleanups.
- arm64 alternatives (code patching) improvements to robustness and
consistency: replace cpucap static branches with equivalent
alternatives, associate callback alternatives with a cpucap.
- Miscellaneous updates: optimise kprobe performance of patching
single-step slots, simplify uaccess_mask_ptr(), move MTE registers
initialisation to C, support huge vmalloc() mappings, run softirqs on
the per-CPU IRQ stack, compat (arm32) misalignment fixups for
multiword accesses.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmM9W4cACgkQa9axLQDI
XvEy3w/+LJ3KCFowWiz5gTAWikjv+UVssHjLMJixn47V7hsEFQ26Xnam/438rTMI
kE95u6DHUpw2SMIxKzFRO7oI5cQtP+cWGwTtOUnjVO+U1oN+HqDOIbO9DbylWDcU
eeeqMMmawMfTPuZrYklpOhXscsorbrKIvYBg7wHYOcwBYV3EPhWr89lwMvTVRuyJ
qpX628KlkGMaBcONNhv3nS3qZcAOs0oHQCAVS4C8czLDL+vtJlumXUS3xr1Mqm72
xtFe7sje8Djr2kZ8mzh0GbFiZEBoBD3F/l7ayq8gVRaVpToUt8sk36Stjs4LojF1
6imuAfji/5TItkScq5KhGqj6MIugwp/eUVbRN74OLNTYx7msF1ZADNFQ+Q0UuY0H
SYK13KvmOji0xjS8qAfhqrwNB79sk3fb+zF9LjETbdz4ZJCgg9gcFbSUTY0DvMfS
MXZk/jVeB07olA8xYbjh0BRt4UV9xU628FPQzK5k7e4Nzl4jSvgtJZCZanfuVtjy
/ZS1vbN8o7tQLBAlVnw+Exi/VedkKxkkMgm8tPKsMgERTFDx0Pc4Gs72hRpDnPWT
MRbeCCGleAf3JQ5vF0coBDNOCEVvweQgShHOyHTz0GyhWXLCFx3RJICo5I4EIpps
LLUk4JK0fO3LVrf1AEpu5ZP4+Sact0zfsH3gB7qyLPYFDmjDXD8=
=jl3Z
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
vector granule register added to the user regs together with SVE perf
extensions documentation.
- SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
documentation to match the actual kernel behaviour (zeroing the
registers on syscall rather than "zeroed or preserved" previously).
- More conversions to automatic system registers generation.
- vDSO: use self-synchronising virtual counter access in gettimeofday()
if the architecture supports it.
- arm64 stacktrace cleanups and improvements.
- arm64 atomics improvements: always inline assembly, remove LL/SC
trampolines.
- Improve the reporting of EL1 exceptions: rework BTI and FPAC
exception handling, better EL1 undefs reporting.
- Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
result.
- arm64 defconfig updates: build CoreSight as a module, enable options
necessary for docker, memory hotplug/hotremove, enable all PMUs
provided by Arm.
- arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
extensions).
- arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
unused function.
- kselftest updates for arm64: simple HWCAP validation, FP stress test
improvements, validation of ZA regs in signal handlers, include
larger SVE and SME vector lengths in signal tests, various cleanups.
- arm64 alternatives (code patching) improvements to robustness and
consistency: replace cpucap static branches with equivalent
alternatives, associate callback alternatives with a cpucap.
- Miscellaneous updates: optimise kprobe performance of patching
single-step slots, simplify uaccess_mask_ptr(), move MTE registers
initialisation to C, support huge vmalloc() mappings, run softirqs on
the per-CPU IRQ stack, compat (arm32) misalignment fixups for
multiword accesses.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
arm64/kprobe: Optimize the performance of patching single-step slot
arm64: defconfig: Add Coresight as module
kselftest/arm64: Handle EINTR while reading data from children
kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
kselftest/arm64: Don't repeat termination handler for fp-stress
ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
arm64/mm: fold check for KFENCE into can_set_direct_map()
arm64: ftrace: fix module PLTs with mcount
arm64: module: Remove unused plt_entry_is_initialized()
arm64: module: Make plt_equals_entry() static
arm64: fix the build with binutils 2.27
kselftest/arm64: Don't enable v8.5 for MTE selftest builds
arm64: uaccess: simplify uaccess_mask_ptr()
arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
kselftest/arm64: Fix typo in hwcap check
arm64: mte: move register initialization to C
arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arm64/sve: Add Perf extensions documentation
...
Adding missing bpf_iter_vma_offset__destroy call and using in-skeletin
link pointer so we don't need extra bpf_link__destroy call.
Fixes: b3e1331eb9 ("selftests/bpf: Test parameterized task BPF iterators.")
Cc: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221006083106.117987-1-jolsa@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
BPF CI reported that selftest deny_namespace failed with s390x.
test_unpriv_userns_create_no_bpf:PASS:no-bpf unpriv new user ns 0 nsec
test_deny_namespace:PASS:skel load 0 nsec
libbpf: prog 'test_userns_create': failed to attach: ERROR: strerror_r(-524)=22
libbpf: prog 'test_userns_create': failed to auto-attach: -524
test_deny_namespace:FAIL:attach unexpected error: -524 (errno 524)
#57/1 deny_namespace/unpriv_userns_create_no_bpf:FAIL
#57 deny_namespace:FAIL
BPF program test_userns_create is a BPF LSM type program which is
based on trampoline and s390x does not support s390x. Let add the
test to x390x deny list to avoid this failure in BPF CI.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221006053429.3549165-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a step to attempt to "fix up" BPF object file to make it possible to
successfully load it. E.g., set non-zero size for BPF maps that expect
max_entries set, but BPF object file itself doesn't have declarative
max_entries values specified.
Another issue was with automatic map pinning. Pinning has no effect on
BPF verification process itself but can interfere when validating
multiple related programs and object files, so veristat disabled all the
pinning explicitly.
In the future more such fix up heuristics could be added to accommodate
common patterns encountered in practice.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221005161450.1064469-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In special case when both base and comparison values are 0, veristat
currently reports "+0 (+100%)" difference, which is quite confusing. Fix
it up to be "+0 (+0%)".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221005161450.1064469-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Log level 1 on successfully verified programs are basically equivalent
to log level 4 (stats-only), so it's useful to be able to request more
verbose logs at log level 2. Teach test_verifier to recognize -vv as
"very verbose" mode switch and use log level 2 in such mode.
Also force verifier stats regradless of -v or -vv, they are very minimal
and useful to be always emitted in verbose mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221005161450.1064469-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Structures with zero regular fields but some padding constitute a
special case in btf_dump.c:btf_dump_emit_struct_def with regards to
newline before closing '}'.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221001104425.415768-2-eddyz87@gmail.com
This kernel module is used for testing. It's safe to say M here.
It can also be built-in without X86_AMD_PSTATE enabled.
Currently, only tests for amd-pstate are supported. If X86_AMD_PSTATE
is set disabled, it can tell the users test can only run on amd-pstate
driver, please set X86_AMD_PSTATE enabled.
In the future, comparison tests will be added. It can set amd-pstate
disabled and set acpi-cpufreq enabled to run test cases, then compare
the test results.
Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add log information when run full test successfully.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Considering that we can not offline all cpus in any cases,
we need to reserve one cpu online when the test offline all
hotpluggable online cpus, otherwise the test will fail forever.
Fixes: d89dffa976 ("fault-injection: add selftests for cpu and memory hotplug")
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Delete fault injection related code since the module has been deleted.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Some cpus will be left in offline state when online
function exits in some error conditions. Use return
instead of exit to fix it.
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add amd-pstate test trigger in kselftest, it will load/unload
amd-pstate-ut module to test some cases etc.
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In various places both in t/t/s/v/Makefile as well as some of the test
sources, we were referring to headers or directories using some fairly
long relative paths.
Since we have a working top_srcdir variable though, which refers to the
root of the kernel tree, we can clean up all of these "up and over"
relative paths, just relying on the single variable instead.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The following output can bee seen when the test is executed:
test_flush_context (tpm2_tests.SpaceTest) ... \
/usr/lib64/python3.6/unittest/case.py:605: ResourceWarning: \
unclosed file <_io.FileIO name='/dev/tpmrm0' mode='rb+' closefd=True>
An instance of Client does not implicitly close /dev/tpm* handle, once it
gets destroyed. Close the file handle in the class destructor
Client.__del__().
Fixes: 6ea3dfe1e0 ("selftests: add TPM 2.0 tests")
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Core
----
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO.
This significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF
---
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions.
Expose crash_kexec() to allow BPF to trigger a kernel dump.
Use CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs.
Only structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols
---------
- WiFi: more Extremely High Throughput (EHT) and Multi-Link
Operation (MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state
and RST packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API
----------
- Allow selecting the conduit interface used by each port
in DSA switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting
per traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one
of the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much
as possible. It's one of those magic knobs which seemed like
a good idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers
----------------------
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers
-------
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay
window offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S
Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY
fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe
IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j
TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52
bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs
HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89
Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B
AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q
9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf
LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE
ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8=
=Gsio
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF:
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support"
* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
eth: pse: add missing static inlines
once: rename _SLOW to _SLEEPABLE
net: pse-pd: add regulator based PSE driver
dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
ethtool: add interface to interact with Ethernet Power Equipment
net: mdiobus: search for PSE nodes by parsing PHY nodes.
net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
net: add framework to support Ethernet PSE and PDs devices
dt-bindings: net: phy: add PoDL PSE property
net: marvell: prestera: Propagate nh state from hw to kernel
net: marvell: prestera: Add neighbour cache accounting
net: marvell: prestera: add stub handler neighbour events
net: marvell: prestera: Add heplers to interact with fib_notifier_info
net: marvell: prestera: Add length macros for prestera_ip_addr
net: marvell: prestera: add delayed wq and flush wq on deinit
net: marvell: prestera: Add strict cleanup of fib arbiter
net: marvell: prestera: Add cleanup of allocated fib_nodes
net: marvell: prestera: Add router nexthops ABI
eth: octeon: fix build after netif_napi_add() changes
net/mlx5: E-Switch, Return EBUSY if can't get mode lock
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68YIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOTbA//TR8i+Wy8iswUCmtfmYg91h1uebpl
/kjNsSmfgivAUTGamr3eN2WRlGhZfkFDPIHa25uybSA6Q+75p4lst83Rt3HDbjkv
Ga7grCXnHwSDwJoHOSeFh0pojV2u7Zvfmiib2U5hPZEmd3kBw3NCgAJVcSGN80B2
dct36fzZNXjvpWDbygmFtRRkmEseslSkft8bUVvNZBP+B0zvv3vcNY1QFuKuK+W2
8wWpvO/cCSmke5i2c2ktHSk2f8/Y6n26Ik/OTHcTVfoKZLRaFbXEzLyxzLrNWd6m
hujXgcxszTtHdmoXx+J6uBauju7TR8pi1x8mO2LSGrlpRc1cX0A5ED8WcH71+HVE
8L1fIOmZShccPZn8xRok7oYycAUm/gIfpmSLzmZA76JsZYAe+mp9Ze9FA6fZtSwp
7Q/rfw/Rlz25WcFBe4xypP078HkOmqutkCk2zy5liR+cWGrgy/WKX15vyC0TaPrX
tbsRKuCLkipgfXrTk0dX3kmhz+3bJYjqeZEt7sfPSZYpaOGkNXVmAW0wnCOTuLMU
+8pIVktvQxMmACEj2gBMz11iooR4DpWLxOcQQR/impgCpNdZ60nA0a6KPJoIXC+5
NfTa422FZkc99QRVblUZyWSgJBW78Z3ZAQcQlo1AGLlFydbfrSFTRLbmNJZo/Nkl
KwpGvWs5nB0rVw0=
=VZl5
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
"Seven patches for the LSM layer and we've got a mix of trivial and
significant patches. Highlights below, starting with the smaller bits
first so they don't get lost in the discussion of the larger items:
- Remove some redundant NULL pointer checks in the common LSM audit
code.
- Ratelimit the lockdown LSM's access denial messages.
With this change there is a chance that the last visible lockdown
message on the console is outdated/old, but it does help preserve
the initial series of lockdown denials that started the denial
message flood and my gut feeling is that these might be the more
valuable messages.
- Open userfaultfds as readonly instead of read/write.
While this code obviously lives outside the LSM, it does have a
noticeable impact on the LSMs with Ondrej explaining the situation
in the commit description. It is worth noting that this patch
languished on the VFS list for over a year without any comments
(objections or otherwise) so I took the liberty of pulling it into
the LSM tree after giving fair notice. It has been in linux-next
since the end of August without any noticeable problems.
- Add a LSM hook for user namespace creation, with implementations
for both the BPF LSM and SELinux.
Even though the changes are fairly small, this is the bulk of the
diffstat as we are also including BPF LSM selftests for the new
hook.
It's also the most contentious of the changes in this pull request
with Eric Biederman NACK'ing the LSM hook multiple times during its
development and discussion upstream. While I've never taken NACK's
lightly, I'm sending these patches to you because it is my belief
that they are of good quality, satisfy a long-standing need of
users and distros, and are in keeping with the existing nature of
the LSM layer and the Linux Kernel as a whole.
The patches in implement a LSM hook for user namespace creation
that allows for a granular approach, configurable at runtime, which
enables both monitoring and control of user namespaces. The general
consensus has been that this is far preferable to the other
solutions that have been adopted downstream including outright
removal from the kernel, disabling via system wide sysctls, or
various other out-of-tree mechanisms that users have been forced to
adopt since we haven't been able to provide them an upstream
solution for their requests. Eric has been steadfast in his
objections to this LSM hook, explaining that any restrictions on
the user namespace could have significant impact on userspace.
While there is the possibility of impacting userspace, it is
important to note that this solution only impacts userspace when it
is requested based on the runtime configuration supplied by the
distro/admin/user. Frederick (the pathset author), the LSM/security
community, and myself have tried to work with Eric during
development of this patchset to find a mutually acceptable
solution, but Eric's approach and unwillingness to engage in a
meaningful way have made this impossible. I have CC'd Eric directly
on this pull request so he has a chance to provide his side of the
story; there have been no objections outside of Eric's"
* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: ratelimit denial messages
userfaultfd: open userfaultfds with O_RDONLY
selinux: Implement userns_create hook
selftests/bpf: Add tests verifying bpf lsm userns_create hook
bpf-lsm: Make bpf_lsm_userns_create() sleepable
security, lsm: Introduce security_create_user_ns()
lsm: clean up redundant NULL pointer check
Various fixes across several hardening areas:
- loadpin: Fix verity target enforcement (Matthias Kaehlcke).
- zero-call-used-regs: Add missing clobbers in paravirt (Bill Wendling).
- CFI: clean up sparc function pointer type mismatches (Bart Van Assche).
- Clang: Adjust compiler flag detection for various Clang changes (Sami
Tolvanen, Kees Cook).
- fortify: Fix warnings in arch-specific code in sh, ARM, and xen.
Improvements to existing features:
- testing: improve overflow KUnit test, introduce fortify KUnit test,
add more coverage to LKDTM tests (Bart Van Assche, Kees Cook).
- overflow: Relax overflow type checking for wider utility.
New features:
- string: Introduce strtomem() and strtomem_pad() to fill a gap in
strncpy() replacement needs.
- um: Enable FORTIFY_SOURCE support.
- fortify: Enable run-time struct member memcpy() overflow warning.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4chcWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJvq1D/9uKU03RozAOnzhi4gcgRnHZSAK
oOQOkPwnkUgFU0yOnMkNYOZ7njLnM+CjCN3RJ9SSpD2lrQ23PwLeThAuOzy0brPO
0iAksIztSF3e5tAyFjtFkjswrY8MSv/TkF0WttTOSOj3lCUcwatF0FBkclCOXtwu
ILXfG7K8E17r/wsUejN+oMAI42ih/YeVQAZpKRymEEJsK+Lly7OT4uu3fdFWVb1P
M77eRLI2Vg1eSgMVwv6XdwGakpUdwsboK7do0GGX+JOrhayJoCfY2IpwyPz9ciel
jsp9OQs8NrlPJMa2sQ7LDl+b5EQl/MtggX3JlQEbLs2LV7gDtYgAWNo6vxCT5Lvd
zB7TZqIR3lrVjbtw4FAKQ+41bS4VOajk2NB3Mkiy5AfivB+6zKF+P56a+xSoNhOl
iktpjCEP7bp4oxmTMXpOfmywjh/ZsyoMhQ2ABP7S+JZ5rHUndpPAjjuBetIcHxX2
28Wlr4aFIF9ff9caasg4sMYXcQMGnuLUlUKngceUbd1umZZRNZ1gaIxYpm9poefm
qd/lvTIvzn9V8IB8wHVmvafbvDbV88A+2bKJdSUDA352Dt9PvqT7yI0dmbMNliGL
os+iLPW6Y6x38BxhXax0HR9FEhO3Eq7kLdNdc4J29NvISg8HHaifwNrG41lNwaWL
cuc6IAjLxiRk3NsUpg==
=HZ6+
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening updates from Kees Cook:
"Most of the collected changes here are fixes across the tree for
various hardening features (details noted below).
The most notable new feature here is the addition of the memcpy()
overflow warning (under CONFIG_FORTIFY_SOURCE), which is the next step
on the path to killing the common class of "trivially detectable"
buffer overflow conditions (i.e. on arrays with sizes known at compile
time) that have resulted in many exploitable vulnerabilities over the
years (e.g. BleedingTooth).
This feature is expected to still have some undiscovered false
positives. It's been in -next for a full development cycle and all the
reported false positives have been fixed in their respective trees.
All the known-bad code patterns we could find with Coccinelle are also
either fixed in their respective trees or in flight.
The commit message in commit 54d9469bc5 ("fortify: Add run-time WARN
for cross-field memcpy()") for the feature has extensive details, but
I'll repeat here that this is a warning _only_, and is not intended to
actually block overflows (yet). The many patches fixing array sizes
and struct members have been landing for several years now, and we're
finally able to turn this on to find any remaining stragglers.
Summary:
Various fixes across several hardening areas:
- loadpin: Fix verity target enforcement (Matthias Kaehlcke).
- zero-call-used-regs: Add missing clobbers in paravirt (Bill
Wendling).
- CFI: clean up sparc function pointer type mismatches (Bart Van
Assche).
- Clang: Adjust compiler flag detection for various Clang changes
(Sami Tolvanen, Kees Cook).
- fortify: Fix warnings in arch-specific code in sh, ARM, and xen.
Improvements to existing features:
- testing: improve overflow KUnit test, introduce fortify KUnit test,
add more coverage to LKDTM tests (Bart Van Assche, Kees Cook).
- overflow: Relax overflow type checking for wider utility.
New features:
- string: Introduce strtomem() and strtomem_pad() to fill a gap in
strncpy() replacement needs.
- um: Enable FORTIFY_SOURCE support.
- fortify: Enable run-time struct member memcpy() overflow warning"
* tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (27 commits)
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
sparc: Unbreak the build
x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled
x86/paravirt: clean up typos and grammaros
fortify: Convert to struct vs member helpers
fortify: Explicitly check bounds are compile-time constants
x86/entry: Work around Clang __bdos() bug
ARM: decompressor: Include .data.rel.ro.local
fortify: Adjust KUnit test for modular build
sh: machvec: Use char[] for section boundaries
kunit/memcpy: Avoid pathological compile-time string size
lib: Improve the is_signed_type() kunit test
LoadPin: Require file with verity root digests to have a header
dm: verity-loadpin: Only trust verity targets with enforcement
LoadPin: Fix Kconfig doc about format of file with verity digests
um: Enable FORTIFY_SOURCE
lkdtm: Update tests for memcpy() run-time warnings
fortify: Add run-time WARN for cross-field memcpy()
fortify: Use SIZE_MAX instead of (size_t)-1
...
Daniel Borkmann says:
====================
pull-request: bpf 2022-10-03
We've added 10 non-merge commits during the last 23 day(s) which contain
a total of 14 files changed, 130 insertions(+), 69 deletions(-).
The main changes are:
1) Fix dynptr helper API to gate behind CAP_BPF given it was not intended
for unprivileged BPF programs, from Kumar Kartikeya Dwivedi.
2) Fix need_wakeup flag inheritance from umem buffer pool for shared xsk
sockets, from Jalal Mostafa.
3) Fix truncated last_member_type_id in btf_struct_resolve() which had a
wrong storage type, from Lorenz Bauer.
4) Fix xsk back-pressure mechanism on tx when amount of produced
descriptors to CQ is lower than what was grabbed from xsk tx ring,
from Maciej Fijalkowski.
5) Fix wrong cgroup attach flags being displayed to effective progs,
from Pu Lehui.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Inherit need_wakeup flag for shared sockets
bpf: Gate dynptr API behind CAP_BPF
selftests/bpf: Adapt cgroup effective query uapi change
bpftool: Fix wrong cgroup attach flags being assigned to effective progs
bpf, cgroup: Reject prog_attach_flags array when effective query
bpf: Ensure correct locking around vulnerable function find_vpid()
bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
selftests/xsk: Add missing close() on netns fd
xsk: Fix backpressure mechanism on Tx
MAINTAINERS: Add include/linux/tnum.h to BPF CORE
====================
Link: https://lore.kernel.org/r/20221003201957.13149-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since 2d1c498072 ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.
Update the sites accordingly and drop the symbol.
[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
which hasn't been a user-visible symbol for over half a decade. ]
Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add :collapse mod to userfaultfd selftest. Currently this mod is only
valid for "shmem" test type, but could be used for other test types.
When provided, memory allocated by ->allocate_area() will be
hugepage-aligned enforced to be hugepage-sized. userfaultf_minor_test,
after the UFFD-registered mapping has been populated by UUFD minor fault
handler, attempt to MADV_COLLAPSE the UFFD-registered mapping to collapse
the memory into a pmd-mapped THP.
This test is meant to be a functional test of what occurs during
UFFD-driven live migration of VMs backed by huge tmpfs where, after a
hugepage-sized region has been successfully migrated (in native page-sized
chunks, to avoid latency of fetched a hugepage over the network), we want
to reclaim previous VM performance by remapping it at the PMD level.
Link: https://lkml.kernel.org/r/20220907144521.3115321-11-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-11-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This test tests that MADV_COLLAPSE acting on file/shmem memory for which
(1) the file extent mapping by the memory is already a huge page in the
page cache, and (2) the pmd mapping this memory in the target process is
none.
In practice, (1)+(2) is the state left over after khugepaged has
successfully collapsed file/shmem memory for a target VMA, but the memory
has not yet been refaulted. So, this test in-effect tests MADV_COLLAPSE
racing with khugepaged to collapse the memory first.
Link: https://lkml.kernel.org/r/20220907144521.3115321-10-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-10-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add memory operations for shmem (memfd) memory, and reuse existing tests
with the new memory operations.
Shmem tests can be called with "shmem" mem_type, and shmem tests are ran
with "all" mem_type as well.
Link: https://lkml.kernel.org/r/20220907144521.3115321-9-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-9-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add memory operations for file-backed and tmpfs memory. Call existing
tests with these new memory operations to test collapse functionality of
khugepaged and MADV_COLLAPSE on file-backed and tmpfs memory. Not all
tests are reusable; for example, collapse_swapin_single_pte() which checks
swap usage.
Refactor test arguments. Usage is now:
Usage: ./khugepaged <test type> [dir]
<test type> : <context>:<mem_type>
<context> : [all|khugepaged|madvise]
<mem_type> : [all|anon|file]
"file,all" mem_type requires [dir] argument
"file,all" mem_type requires kernel built with
CONFIG_READ_ONLY_THP_FOR_FS=y
if [dir] is a (sub)directory of a tmpfs mount, tmpfs must be
mounted with huge=madvise option for khugepaged tests to work
Refactor calling tests to make it clear what collapse context / memory
operations they support, but only invoke tests requested by user. Also
log what test is being ran, and with what context / memory, to make test
logs more human readable.
A new test file is created and deleted for every test to ensure no pages
remain in the page cache between tests (tests also may attempt to collapse
different amount of memory).
For file-backed memory where the file is stored on a block device, disable
/sys/block/<device>/queue/read_ahead_kb so that pages don't find their way
into the page cache without the tests faulting them in.
Add file and shmem wrappers to vm_utils check for file and shmem hugepages
in smaps.
[zokeefe@google.com: fix "add thp collapse file and tmpfs testing" for
tmpfs]
Link: https://lkml.kernel.org/r/20220913212517.3163701-1-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220907144521.3115321-8-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-8-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Modularize operations to setup, cleanup, fault, and check for huge pages,
for a given memory type. This allows reusing existing tests with
additional memory types by defining new memory operations. Following
patches will add file and shmem memory types.
Link: https://lkml.kernel.org/r/20220907144521.3115321-7-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-7-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
These files:
tools/testing/selftests/vm/vm_util.c
tools/testing/selftests/vm/khugepaged.c
Both contain logic to:
1) Determine hugepage size on current system
2) Read /proc/self/smaps to determine number of THPs at an address
Refactor selftests/vm/khugepaged.c to use the vm_util common helpers and
add it as a build dependency.
Since selftests/vm/khugepaged.c is the largest user of check_huge(),
change the signature of check_huge() to match selftests/vm/khugepaged.c's
useage: take an expected number of hugepages, and return a bool indicating
if the correct number of hugepages were found. Add a wrapper,
check_huge_anon(), in anticipation of checking smaps for file and shmem
hugepages.
Update existing callsites to use the new pattern / function.
Likewise, check_for_pattern() was duplicated, and it's a general enough
helper to include in vm_util helpers as well.
Link: https://lkml.kernel.org/r/20220907144521.3115321-6-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-6-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MADV_COLLAPSE is a best-effort request that will set errno to an
actionable value if the request cannot be performed.
For example, if pages are not found on the LRU, or if they are currently
locked by something else, MADV_COLLAPSE will fail and set errno to EAGAIN
to inform callers that they may try again.
Since the khugepaged selftest is the first public use of MADV_COLLAPSE,
set a best practice of checking errno and retrying on EAGAIN.
Link: https://lkml.kernel.org/r/20220922184651.1016461-2-zokeefe@google.com
Fixes: 9330694de5 ("selftests/vm: add MADV_COLLAPSE collapse context to selftests")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/damon: minor fixes and cleanups".
This patchset contains minor fixes and cleanups for DAMON including
- selftest for a bug we found before (Patch 1),
- fix of region holes in vaddr corner case and a kunit test for it
(Patches 2 and 3), and
- documents/Kconfig updates for title wordsmithing (Patch 4) and more
aggressive DAMON debugfs interface deprecation announcement
(Patches 5-7).
This patch (of 7):
Commit d26f607036 ("mm/damon/dbgfs: avoid duplicate context directory
creation") fixes a bug which could result in memory leak and DAMON
disablement. This commit adds a selftest for verifying the fix and avoid
regression.
Link: https://lkml.kernel.org/r/20220909202901.57977-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220909202901.57977-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add new pageblock_start_pfn() and pageblock_align() macro which are needed
by memblock tests.
Link: https://lkml.kernel.org/r/20220907082643.186979-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
HMM selftests use an in-kernel pseudo device to emulate device memory.
The pseudo device registers a major device range for two or four pseudo
device instances. User space has a script that reads /proc/devices in
order to find the assigned major number, and sends that to mknod(1), once
for each node.
Change this to properly use cdev and struct device APIs.
Delete the /proc/devices parsing from the user-space test script, now that
it is unnecessary.
Also, delete an unused field in struct dmirror_device: devmem.
Link: https://lkml.kernel.org/r/20220826050631.25771-1-mpenttil@redhat.com
Signed-off-by: Mika Penttilä <mpenttil@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-10-03
We've added 143 non-merge commits during the last 27 day(s) which contain
a total of 151 files changed, 8321 insertions(+), 1402 deletions(-).
The main changes are:
1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu.
2) Add support for struct-based arguments for trampoline based BPF programs,
from Yonghong Song.
3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa.
4) Batch of improvements to veristat selftest tool in particular to add CSV output,
a comparison mode for CSV outputs and filtering, from Andrii Nakryiko.
5) Add preparatory changes needed for the BPF core for upcoming BPF HID support,
from Benjamin Tissoires.
6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program
types, from Daniel Xu.
7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler.
8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer /
single-kernel-consumer semantics for BPF ring buffer, from David Vernet.
9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF
hashtab's bucket lock, from Hou Tao.
10) Allow creating an iterator that loops through only the resources of one
task/thread instead of all, from Kui-Feng Lee.
11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi.
12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT
entry which is not yet inserted, from Lorenzo Bianconi.
13) Remove invalid recursion check for struct_ops for TCP congestion control BPF
programs, from Martin KaFai Lau.
14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu.
15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa.
16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen.
17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu.
18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta.
19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits)
net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Documentation: bpf: Add implementation notes documentations to table of contents
bpf, docs: Delete misformatted table.
selftests/xsk: Fix double free
bpftool: Fix error message of strerror
libbpf: Fix overrun in netlink attribute iteration
selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged"
samples/bpf: Fix typo in xdp_router_ipv4 sample
bpftool: Remove unused struct event_ring_info
bpftool: Remove unused struct btf_attach_point
bpf, docs: Add TOC and fix formatting.
bpf, docs: Add Clang note about BPF_ALU
bpf, docs: Move Clang notes to a separate file
bpf, docs: Linux byteswap note
bpf, docs: Move legacy packet instructions to a separate file
selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
...
====================
Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Improved instruction encoding infrastructure for
instructions not yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmM5IRgACgkQrUjsVaLH
LAfNxg//TuUVC230Yh88WJNIQzX7Jf587E7DA5kHdLV/Lai/KSqoeaegbJ+XLSCp
IEC2sDabWO3M3auoyF51NfCfLIR1qkR0xq/gwV6QlsgtTuCBTpdI7Yqg/GFuaZnv
JMmkFxfprEtH9QLISYjt2xDUHqcorFEyislL2gg5iEilMR2kWDc5ZmMCQge1CwR0
ldo5w9PSQM9CFmhjY9Gg6/Gx8QzfpEDxGNtn8KIZaBFUalGcj6gUYpFJDmAQFbXG
k825s00gonEMrx3tGcp4URtQNW5Tnuxqb1vCoGm5v+vcQdRFbWrsBzxki96qPgvk
iSbc3rqSCquyWQzUoIiPZ08/rkSW1Of4MwoffD3E9XyjjjlRnwOj85G5lB1Mtwb7
zIf65/lfid5O+gUqBz1xPXNZt1MzcoiAL/1Yd9hijIzHlrESxwXL3jfzfpWV7MoT
zc1v7Y5DKaiYBVhlE1zh0Fm/CLkS80AP/ndK5scsF/LW+U3G+nvmWAD9oOr5uB/Z
CkdcWykZZ0iw5dNwyxTg9lK0tFw/4QDaQPLiDjG/rokcER4ky0dtW3kmbTNznfGn
c+OKEML9jMuY2pJ0RwmXZZ6bsBUBa83J5qVcREQYWljhJ9hSqQX0k3aXE24DUYSV
fBbmDDD/9jhIf06K75RDdFVDV9itVrWEjDMr5GPr3EM3A7+LekM=
=qbIY
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-6.1-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.1
- Improved instruction encoding infrastructure for
instructions not yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
- Fixes for single-stepping in the presence of an async
exception as well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only
systems
- Fixes for the dirty-ring API, allowing it to work on
architectures with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmM5hQcPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDoMUP/jra4HSmujLUB5G7Op8HxuurEecOc6xtw0Af
AbDLlVc2Vs4rrdVh8GMc8D80atUAVitp8IFjdp/PzI2GTBTzWz43Gav2AbhgIJbJ
xoFVHL8LkdHKyMbq10359DqGMqhIf41OFzGwhbzcx2V4pKNkSpjbCpu3bi/+Ybjg
006ZpZc7NAU0rZgw9Flb/dhn0jw7RMc3orhoDQ4tBp1P/VhvqvgFt5bWipkvvBP7
+lQK28ujG3ghST/hKRhg6ozgy5+6NEEHMuhErMYP8nIivRchX+pWF2Lb0qGH1e+U
v2MZIZnIIUjyTV1vbYlxtltzfYmPuQ2MFNUBawI9tmlIOU9vJSCzeJS64uWK4KLV
kbmk57OfC7rQoSNJH4jaKQp0YpIktrB9Vei97t4I7NwEmkjQj6cLTgg4tQrNqTiQ
cFGeC9mE+lhFC8z1lCbna2eG631FxpPrB1SJ1/CU9wboam9dUfXGIvBPh+i2pvMZ
vcxzUZJ11y+/uhp4k8i2PBwNno0iwRXd5MinwRUs2CR5vhs8qa5y7FVWKyqKpgI2
xqr4lYTixJZL3mWkYyOQuClrTbT1zkoaPldLq6M7wvO08+QV8ryMeyKT+9s/gNQU
dcYSwBCWZaOZm2nN8/zjxRb7VqZVu3cwyXi9XXUWNTCgIe/Q/SDPbXU/Hwbgzf8X
UsQF7e9A
=aNPK
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for v6.1
- Fixes for single-stepping in the presence of an async
exception as well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only
systems
- Fixes for the dirty-ring API, allowing it to work on
architectures with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
This pull request provides nolibc updates, most notably greatly improved
testing. These tests are located in tools/testing/selftests/nolibc. The
output of "make help" is as follows:
Supported targets under selftests/nolibc:
all call the "run" target below
help this help
sysroot create the nolibc sysroot here (uses $ARCH)
nolibc-test build the executable (uses $CC and $CROSS_COMPILE)
initramfs prepare the initramfs with nolibc-test
defconfig create a fresh new default config (uses $ARCH)
kernel (re)build the kernel with the initramfs (uses $ARCH)
run runs the kernel in QEMU after building it (uses $ARCH, $TEST)
rerun runs a previously prebuilt kernel in QEMU (uses $ARCH, $TEST)
clean clean the sysroot, initramfs, build and output files
The output file is "run.out". Test ranges may be passed using $TEST.
Currently using the following variables:
ARCH = x86
CROSS_COMPILE =
CC = gcc
OUTPUT = /home/git/linux-rcu/tools/testing/selftests/nolibc/
TEST =
QEMU_ARCH = x86_64 [determined from $ARCH]
IMAGE_NAME = bzImage [determined from $ARCH]
The output of a successful x86 "make run" is currently as follows,
with kernel build output omitted:
$ make run
71 test(s) passed.
$
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmM5AeATHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jJCLD/wIps8OlIvNMlvT0Bwmi1gG6kTdJrXb
Cj495z9F1P8c1x2nZBVuwPM05KM4SUvMLBzkND8le/AqrDP2Vc9WurEjnx6aYbjw
aNrWeeVOYF2ykROpEyolF92bV0tu2aZEYmZ12Rxb3ybKYW8wf5t2M6JbU9VvLATB
TNPov6cBUjJLqF5AVf4pJ3FNjGJH392NyTzGuKFYN4kK8XGPSxKFfpb8jJX/Jnr/
OQPZZurHclgIJzWYwjWpWBY5NBEMSUI8K2KqQIFzeVdNGpWmAyAyciCDG3zD+Rp1
GtcHrPEgvBQnPd5o89eZonMPHkIC5kOVU3Ebwzzx30GWQO8cseCZCb9foJCu918q
wbJYPvopDw1Kd7NtltTzWj/xWkINgBcqdki5bEtcOW8i71RX8tgzUVx/tz771Vok
/j21se9Svwi8ZnAY+dTxMNdy0Jd7eI0dJO6fMzagrQamDKZzNaO50WjiCW53/Eln
JSttwSneWJ190PJ5ty6lNQ6OLGuILSXWhbYzRplgHiEcSwjtpxccgMRkiNQlQiow
hXRNKQU/A7q3b5HPpf/kdQFGEk/aqFNDTpTuz1JCsCY2f9WQ6qz5stQuUKfFxyVh
YD82JBI1DHtucs5dD/4ZbparHrJGG/NsHoe61wZYpXVrcJWfB0bMiie3zQ1cGoWN
t6lw39lNotQbsw==
=wqBE
-----END PGP SIGNATURE-----
Merge tag 'nolibc.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
"Most notably greatly improved testing. These tests are located in
tools/testing/selftests/nolibc. The output of "make help" is as
follows:
Supported targets under selftests/nolibc:
all call the "run" target below
help this help
sysroot create the nolibc sysroot here (uses $ARCH)
nolibc-test build the executable (uses $CC and $CROSS_COMPILE)
initramfs prepare the initramfs with nolibc-test
defconfig create a fresh new default config (uses $ARCH)
kernel (re)build the kernel with the initramfs (uses $ARCH)
run runs the kernel in QEMU after building it (uses $ARCH, $TEST)
rerun runs a previously prebuilt kernel in QEMU (uses $ARCH, $TEST)
clean clean the sysroot, initramfs, build and output files
The output file is "run.out". Test ranges may be passed using $TEST.
Currently using the following variables:
ARCH = x86
CROSS_COMPILE =
CC = gcc
OUTPUT = /home/git/linux-rcu/tools/testing/selftests/nolibc/
TEST =
QEMU_ARCH = x86_64 [determined from $ARCH]
IMAGE_NAME = bzImage [determined from $ARCH]
The output of a successful x86 "make run" is currently as follows,
with kernel build output omitted:
$ make run
71 test(s) passed."
* tag 'nolibc.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
selftests/nolibc: Avoid generated files being committed
selftests/nolibc: add a "help" target
selftests/nolibc: "sysroot" target installs a local copy of the sysroot
selftests/nolibc: add a "run" target to start the kernel in QEMU
selftests/nolibc: add a "defconfig" target
selftests/nolibc: add a "kernel" target to build the kernel with the initramfs
selftests/nolibc: support glibc as well
selftests/nolibc: condition some tests on /proc existence
selftests/nolibc: recreate and populate /dev and /proc if missing
selftests/nolibc: on x86, support exiting with isa-debug-exit
selftests/nolibc: exit with poweroff on success when getpid() == 1
selftests/nolibc: add a few tests for some libc functions
selftests/nolibc: implement a few tests for various syscalls
selftests/nolibc: support a test definition format
selftests/nolibc: add basic infrastructure to ease creation of nolibc tests
tools/nolibc: make sys_mmap() automatically use the right __NR_mmap definition
tools/nolibc: fix build warning in sys_mmap() when my_syscall6 is not defined
tools/nolibc: make argc 32-bit in riscv startup code
After the previous patches, the MPTCP protocol can generate
fast-closes on both ends of the connection. Rework the relevant
test-case to carefully trigger the fast-close code-path on a
single end at the time, while ensuring than a predictable amount
of data is spooled on both ends.
Additionally add another test-cases for the passive socket
fast-close.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
From Gautam Menghani.
2) Drop an unused argument from xfrm_policy_match.
From Hongbin Wang.
3) Support collect metadata mode for xfrm interfaces.
From Eyal Birger.
4) Add netlink extack support to xfrm.
From Sabrina Dubroca.
Please note, there is a merge conflict in:
include/net/dst_metadata.h
between commit:
0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
from the net-next tree and commit:
5182a5d48c ("net: allow storing xfrm interface metadata in metadata_dst")
from the ipsec-next tree.
Can be solved as done in linux-next.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* kvm-arm64/misc-6.1:
: .
: Misc KVM/arm64 fixes and improvement for v6.1
:
: - Simplify the affinity check when moving a GICv3 collection
:
: - Tone down the shouting when kvm-arm.mode=protected is passed
: to a guest
:
: - Fix various comments
:
: - Advertise the new kvmarm@lists.linux.dev and deprecate the
: old Columbia list
: .
KVM: arm64: Advertise new kvmarm mailing list
KVM: arm64: Fix comment typo in nvhe/switch.c
KVM: selftests: Update top-of-file comment in psci_test
KVM: arm64: Ignore kvm-arm.mode if !is_hyp_mode_available()
KVM: arm64: vgic: Remove duplicate check in update_affinity_collection()
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/dirty-log-ordered:
: .
: Retrofit some ordering into the existing API dirty-ring by:
:
: - relying on acquire/release semantics which are the default on x86,
: but need to be explicit on arm64
:
: - adding a new capability that indicate which flavor is supported, either
: with explicit ordering (arm64) or both implicit and explicit (x86),
: as suggested by Paolo at KVM Forum
:
: - documenting the requirements for this new capability on weakly ordered
: architectures
:
: - updating the selftests to do the right thing
: .
KVM: selftests: dirty-log: Use KVM_CAP_DIRTY_LOG_RING_ACQ_REL if available
KVM: selftests: dirty-log: Upgrade flag accesses to acquire/release semantics
KVM: Document weakly ordered architecture requirements for dirty ring
KVM: x86: Select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL
KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option
KVM: Use acquire/release semantics when accessing dirty ring GFN state
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since three patchsets "add tc-testing test cases", "refactor duplicate
codes in the tc cls walk function", and "refactor duplicate codes in the
qdisc class walk function" are merged to net-next tree, the list of
supported features needs to be updated in config file.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220929041909.83913-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Skip tests that require EPT when it is not available
* Do not hang when a test fails with an empty stack trace
* avoid spurious failure when running access_tracking_perf_test in a KVM guest
* work around GCC's tendency to optimize loops into mem*() functions, which
breaks because the guest code in selftests cannot call into PLTs
* fix -Warray-bounds error in fix_hypercall_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2x0MUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMLBggAmQpzSTuUq8Vn4fbXg8NpxofzZVzo
hWagwlm29Ac7jfoSa4lOX4nbkyb5CjEL1O2qSd1/lJuZndMxQhmLe3Bnzu2Mzqp0
7WFXr0lAiOeH4hw8NE9Kx4x/vJ8nev4gsOlWNh7rr1RlP3vgHTtyYN/J7UoNnvTl
fcbD1Bi/A+2RnzMQ6cHms8lqsAfts/OZIMTCUf0TW2EhLmrN5CLwQkRNNh1Ql8gX
/ZvLs6C/FEkgok6Wfc6mMrIY+Wlu7en6/pHAjeORAoieGC2CYh0Od3ETzpe7Qx+L
L/W3BiOYlvQBqEQmgzd6bpVNDlqhn0GGczY5rkB54Es7jfxUjLU2myqw+g==
=bb0j
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A small fix to the reported set of supported CPUID bits, and selftests
fixes:
- Skip tests that require EPT when it is not available
- Do not hang when a test fails with an empty stack trace
- avoid spurious failure when running access_tracking_perf_test in a
KVM guest
- work around GCC's tendency to optimize loops into mem*() functions,
which breaks because the guest code in selftests cannot call into
PLTs
- fix -Warray-bounds error in fix_hypercall_test"
* tag 'for-linus-6.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Compare insn opcodes directly in fix_hypercall_test
KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use
KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest
KVM: selftests: Gracefully handle empty stack traces
KVM: selftests: replace assertion with warning in access_tracking_perf_test
KVM: selftests: Skip tests that require EPT when it is not available
Fix a double free at exit of the test suite.
Fixes: a693ff3ed5 ("selftests/xsk: Add support for executing tests on physical device")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20220929090133.7869-1-magnus.karlsson@gmail.com