The pattern of checking nocb_defer_wakeup and deleting the timer is
duplicated in __wake_nocb_gp() and nocb_gp_wait(). Extract this into a
common helper function nocb_defer_wakeup_cancel().
This removes the duplication and makes the code easier to maintain.
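A minimal sketch of the extracted helper, assuming the field and
constant names used by the NOCB code in kernel/rcu/tree_nocb.h; the
actual patch may differ in detail:

  static void nocb_defer_wakeup_cancel(struct rcu_data *rdp_gp)
  {
          /* Cancel any pending deferred rcuog wakeup and its timer. */
          if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
                  WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
                  timer_delete(&rdp_gp->nocb_timer);
          }
  }

Both __wake_nocb_gp() and nocb_gp_wait() would then call this helper
instead of open-coding the check-and-delete sequence.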
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
During callback overload (exceeding qhimark), the NOCB code attempts
opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
this code path is practically unreachable and serves no useful purpose.
Testing with 300,000 callback floods showed:
- 30 overload conditions triggered
- 0 advancements actually occurred
While a theoretical window exists where this code could execute (e.g.,
vCPU preemption between gp_seq update and rcu_nocb_gp_cleanup()), even
if it did, the advancement would be redundant. The rcuog kthread must
still run to wake the rcuoc callback thread - we would just be
duplicating work that rcuog will perform when it finally gets to run.
Since this path provides no meaningful benefit and extensive testing
confirms it is never useful, remove it entirely.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
The WakeOvfIsDeferred code path in __call_rcu_nocb_wake() attempts to
wake rcuog when the callback count exceeds qhimark and the callbacks
are not yet done with their grace period (newly queued or still
awaiting a GP). However, extensive testing shows that this wake is
always redundant.
In the flooding case, rcuog is always waiting for a GP to finish, so
waking it up is pointless: the timer wakeup adds overhead, and rcuog
simply wakes up and goes back to sleep, achieving nothing. This path
also unnecessarily adds a full memory barrier and extra timer-expiry
modifications.
The root cause is that WakeOvfIsDeferred fires when
!rcu_segcblist_ready_cbs() (GP not complete), but waking rcuog cannot
accelerate GP completion.
This commit therefore removes this path.
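For reference, a hedged reconstruction of the kind of check being
removed (illustrative; names follow kernel/rcu/tree_nocb.h, but the
exact code may differ):

  } else if (count >= qhimark && !rcu_segcblist_ready_cbs(&rdp->cblist)) {
          /* Overloaded and GP incomplete: defer a (useless) rcuog wake. */
          wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
                             TPS("WakeOvfIsDeferred"));
  }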
Tested with rcutorture scenarios TREE01, TREE05, and TREE08 (all NOCB
configurations); all pass. Also stress-tested with a kernel module
that floods call_rcu() to trigger the overload conditions, which
produced the observations confirming the findings above.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-misc.20260111a:
rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early
srcu: Use suitable gfp_flags for the init_srcu_struct_nodes()
rcu: Fix rcu_read_unlock() deadloop due to softirq
rcutorture: Correctly compute probability to invoke ->exp_current()
rcu: Make expedited RCU CPU stall warnings detect stall-end races
The RCU grace period mechanism uses a two-phase FQS (Force Quiescent
State) design where the first FQS saves dyntick-idle snapshots and
the second FQS compares them. This results in long and unnecessary latency
for synchronize_rcu() on idle systems (two FQS waits of ~3ms each at
HZ=1000) even when a single FQS wait would have sufficed.
Investigation showed that the GP kthread's CPU is often the holdout
CPU after the first FQS: it cannot be detected as "idle" because it is
actively running the FQS scan in the GP kthread.
Therefore, at the end of rcu_gp_init(), immediately report a quiescent
state for the GP kthread's CPU using rcu_qs() + rcu_report_qs_rdp(). The
GP kthread cannot be in an RCU read-side critical section while running
GP initialization, so this is safe and results in significant latency
improvements.
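A hedged sketch of the resulting logic at the end of rcu_gp_init();
the exact locking and call sequence in the patch may differ:

  /* Sketch: irq and rcu_node locking details elided. */
  rcu_qs();                                   /* Record a QS for this CPU... */
  rcu_report_qs_rdp(this_cpu_ptr(&rcu_data)); /* ...and report it upward. */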
The following tests were performed:
(1) synchronize_rcu() benchmarking
100 synchronize_rcu() calls with 32 CPUs, 10 runs each (default fqs
jiffies settings):
Baseline (without fix):
| Run | Mean | Min | Max |
|-----|-----------|----------|-----------|
| 1 | 10.088 ms | 9.989 ms | 18.848 ms |
| 2 | 10.064 ms | 9.982 ms | 16.470 ms |
| 3 | 10.051 ms | 9.988 ms | 15.113 ms |
| 4 | 10.125 ms | 9.929 ms | 22.411 ms |
| 5 | 8.695 ms | 5.996 ms | 15.471 ms |
| 6 | 10.157 ms | 9.977 ms | 25.723 ms |
| 7 | 10.102 ms | 9.990 ms | 20.224 ms |
| 8 | 8.050 ms | 5.985 ms | 10.007 ms |
| 9 | 10.059 ms | 9.978 ms | 15.934 ms |
| 10 | 10.077 ms | 9.984 ms | 17.703 ms |
With fix:
| Run | Mean | Min | Max |
|-----|----------|----------|-----------|
| 1 | 6.027 ms | 5.915 ms | 8.589 ms |
| 2 | 6.032 ms | 5.984 ms | 9.241 ms |
| 3 | 6.010 ms | 5.986 ms | 7.004 ms |
| 4 | 6.076 ms | 5.993 ms | 10.001 ms |
| 5 | 6.084 ms | 5.893 ms | 10.250 ms |
| 6 | 6.034 ms | 5.908 ms | 9.456 ms |
| 7 | 6.051 ms | 5.993 ms | 10.000 ms |
| 8 | 6.057 ms | 5.941 ms | 10.001 ms |
| 9 | 6.016 ms | 5.927 ms | 7.540 ms |
| 10 | 6.036 ms | 5.993 ms | 9.579 ms |
Summary:
- Mean latency: 9.75 ms -> 6.04 ms (38% improvement)
- Max latency: 25.72 ms -> 10.25 ms (60% improvement)
(2) Bridge setup/teardown latency (Uladzislau Rezki)
x86_64 with 64 CPUs, 100 iterations of bridge add/configure/delete:
real time
1 - default: 24.221s
2 - this patch: 20.754s (14% faster)
3 - this patch + wake_from_gp: 15.895s (34% faster)
4 - wake_from_gp only: 18.947s (22% faster)
Per-synchronize_rcu() latency (in usec):
|        | 1       | 2       | 3     | 4     |
|--------|---------|---------|-------|-------|
| median | 37249.5 | 31540.5 | 15765 | 22480 |
| min    | 7881    | 7918    | 9803  | 7857  |
| max    | 63651   | 55639   | 31861 | 32040 |
This patch combined with rcu_normal_wake_from_gp reduces bridge
setup/teardown time from 24 seconds to 16 seconds.
(3) CPU overhead verification (Uladzislau Rezki)
System CPU time across 5 runs showed no measurable increase:
default: 1.698s - 1.937s
this patch: 1.667s - 1.930s
Conclusion: variations are within noise, no CPU overhead regression.
(4) rcutorture
Tested TREE and SRCU configurations - no regressions.
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Samir M <samir@linux.ibm.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
When init_srcu_struct*() is used to initialize an srcu structure, the
structure's ->srcu_sup and sda fields are allocated with GFP_KERNEL.
Similarly, when SRCU_SIZING_INIT is set, the srcu_sup's ->node array
can also be allocated with GFP_KERNEL; there is no need to always use
GFP_ATOMIC.
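A minimal sketch of the idea, assuming a gfp_flags parameter is
threaded into init_srcu_struct_nodes(); details are illustrative:

  /* Initialization can sleep, so GFP_KERNEL suffices; GFP_ATOMIC is
   * needed only when the node array must be built from a context that
   * cannot sleep (e.g. when sizing on the fly). */
  static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
  {
          ssp->srcu_sup->node = kcalloc(rcu_num_nodes,
                                        sizeof(*ssp->srcu_sup->node),
                                        gfp_flags);
          if (!ssp->srcu_sup->node)
                  return false;
          /* ... remaining combining-tree setup ... */
          return true;
  }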
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Lack of parentheses causes the ->exp_current() function, for example,
srcu_expedite_current(), to be called only once in four billion times
instead of the intended once in 256 times. This commit therefore adds
the needed parentheses.
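The commit does not quote the offending line, but the 1-in-4-billion
behavior matches a classic C precedence pitfall; a standalone
illustration (hypothetical, not the actual rcutorture code):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t x = 0x12345600;        /* low byte is zero */

          /* Intended: true about once in 256 random values. */
          printf("%d\n", !(x & 0xff));    /* prints 1 */

          /* Buggy: "!x & 0xff" parses as "(!x) & 0xff", which is true
           * only when all 32 bits of x are zero -- about 1 in 4 billion. */
          printf("%d\n", !x & 0xff);      /* prints 0 */
          return 0;
  }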
Reported-by: Chris Mason <clm@meta.com>
Reported-by: Joel Fernandes <joelagnelf@nvidia.com>
Fixes: 950063c6e8 ("rcutorture: Test srcu_expedite_current()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
If an expedited RCU CPU stall ends just at the stall-warning timeout,
the current code will print an expedited stall-warning message, but one
that doesn't identify any CPUs or tasks causing the stall. This is most
likely to happen for short-timeout stalls, for example, the 20-millisecond
timeouts that are sometimes used for small embedded devices. Needless to
say, these semi-empty stall-warning messages can be rather confusing.
One option would be to suppress the stall-warning message entirely in
this case, but the near-miss information can be quite valuable.
Detect this race condition and emit an "INFO: Expedited stall ended
before state dump start" message to clarify matters.
[boqun: Apply feedback from Borislav]
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
* rcu-torture.20260104a:
rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
rcutorture: Prevent concurrent kvm.sh runs on same source tree
torture: Include commit description in testid.txt
torture: Make config2csv.sh properly handle comments in .boot files
torture: Make kvm-series.sh give run numbers and totals
torture: Make kvm-series.sh give build numbers and totals
torture: Parallelize kvm-series.sh guest-OS execution
rcutorture: Add context checks to rcu_torture_timer()
This commit adds irq, NMI, and softirq context checks to the
rcu_torture_timer() function. Just because you are paranoid does not
mean that they are not out to get you... ;-)
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
This commit adds a ->exp_current member to the tasks_tracing_ops structure
to test the rcu_tasks_trace_expedite_current() function.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
When expressing RCU Tasks Trace in terms of SRCU-fast, it was
necessary to keep a nesting count and per-CPU srcu_ctr structure
pointer in the task_struct structure, which is slow to access.
An alternative is to instead provide rcu_read_lock_tasks_trace() and
rcu_read_unlock_tasks_trace(), which match the underlying SRCU-fast
semantics and thus avoid the task_struct accesses.
When all callers have switched to the new API, the previous
rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
The rcu_read_{,un}lock_{,tasks_}trace() functions need to use smp_mb()
only if invoked where RCU is not watching, that is, from locations where
a call to rcu_is_watching() would return false. In architectures that
define the ARCH_WANTS_NO_INSTR Kconfig option, use of noinstr and friends
ensures that tracing happens only where RCU is watching, so those
architectures can dispense entirely with the read-side calls to smp_mb().
Other architectures include these read-side calls by default, but in many
installations there might be either larger than average tolerance for
risk, prohibition of removing tracing on a running system, or careful
review and approval of tracing removal. Such installations can
build their kernels with CONFIG_TASKS_TRACE_RCU_NO_MB=y to avoid those
read-side calls to smp_mb(), thus accepting responsibility for run-time
removal of tracing from code regions that RCU is not watching.
Those wishing to disable read-side memory barriers for an entire
architecture can select this TASKS_TRACE_RCU_NO_MB Kconfig option,
hence the polarity.
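A hedged sketch of the conditional read-side barrier this implies;
the helper name and Kconfig plumbing are illustrative:

  /* Order the reader with updaters only when the kernel cannot
   * guarantee that tracing happens exclusively where RCU is watching. */
  static inline void rcu_tasks_trace_read_mb(void)
  {
          if (!IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR) &&
              !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_NO_MB))
                  smp_mb();       /* Pairs with update-side ordering. */
  }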
[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Moving the rcu_tasks_trace_srcu_struct structure instance out
from under the CONFIG_TASKS_RCU_GENERIC Kconfig option permits
the CONFIG_TASKS_TRACE_RCU Kconfig option to stop enabling this
CONFIG_TASKS_RCU_GENERIC Kconfig option. This commit also therefore
makes it so.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Now that RCU Tasks Trace has been re-implemented in terms of SRCU-fast,
the ->trc_ipi_to_cpu, ->trc_blkd_cpu, ->trc_blkd_node, ->trc_holdout_list,
and ->trc_reader_special task_struct fields are no longer used.
In addition, the rcu_tasks_trace_qs(), rcu_tasks_trace_qs_blkd(),
exit_tasks_rcu_finish_trace(), rcu_spawn_tasks_trace_kthread(),
show_rcu_tasks_trace_gp_kthread(), rcu_tasks_trace_get_gp_data(),
rcu_tasks_trace_torture_stats_print(), and get_rcu_tasks_trace_gp_kthread()
functions and all the other functions that they invoke are no longer used.
Also, the TRC_NEED_QS and TRC_NEED_QS_CHECKED CPP macros are no longer used.
Neither are the rcu_tasks_trace_lazy_ms and rcu_task_ipi_delay rcupdate
module parameters and the TASKS_TRACE_RCU_READ_MB Kconfig option.
This commit therefore removes all of them.
[ paulmck: Apply Alexei Starovoitov feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
This commit saves more than 500 lines of RCU code by re-implementing
RCU Tasks Trace in terms of SRCU-fast. Follow-up work will remove
more code that does not cause problems by its presence, but that is no
longer required.
This variant places smp_mb() in rcu_read_{,un}lock_trace(), and in the
same place that srcu_read_{,un}lock() would put them. These smp_mb()
calls will be removed on common-case architectures in a later commit.
In the meantime, they serve to enforce ordering between the underlying
srcu_read_{,un}lock_fast() markers and the intervening critical section,
even on architectures that permit attaching tracepoints on regions of
code not watched by RCU. Such architectures defeat SRCU-fast's use of
implicit single-instruction, interrupts-disabled, and atomic-operation
RCU read-side critical sections, which have no effect when RCU is not
watching. The aforementioned later commit will insert these smp_mb()
calls only on architectures that have not used noinstr to prevent
attaching tracepoints to code where RCU is not watching.
[ paulmck: Apply kernel test robot, Boqun Feng, and Zqiang feedback. ]
[ paulmck: Split out Tiny SRCU fixes per Andrii Nakryiko feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
SRCU
----
- Properly handle SRCU readers within IRQ disabled sections in tiny SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture.
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write side
(using full RCU grace period ordering instead of simply full
ordering) as compared to "traditional" non-fast SRCU. But those
srcu-fast flavours are going to be optimized in two different
ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next cycle.
Only the new interfaces are added for now, along with related
torture and scalability test code.
- Create/document/debug/torture new proper initializers for SRCU-fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows the proper write-side ordering (either full ordering or
full RCU grace-period ordering) to be used right away, without waiting
for the read side to indicate which to use. This also optimizes the
read side by moving flavour debug checks under a debug config and by
removing a costly RmW operation on the readers' first call.
- Make some diagnostic functions tracing safe.
REFSCALE
--------
Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code.
MISCELLANEOUS
-------------
- In order to prepare the layout for nohz_full work deferral to
user exit, the context tracking state must shrink the counter
of transitions to/from RCU not watching. The only possible hazard
is to trigger wrap-around more easily, slightly delaying grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption.
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent KCSAN warnings.
In recent discussions, we also concluded that all those WRITE_ONCE()
and READ_ONCE() on list APIs deserve appropriate comments. Something
to be expected for the next cycle.
- Provide a script to apply several configs to several commits with torture.
- Allow torture to reuse a build directory in order to save needless
rebuild time.
- Various cleanups.
Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for SRCU-fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows the proper write-side ordering (either full ordering or
full RCU grace-period ordering) to be used right away, without
waiting for the read side to indicate which to use.
This also optimizes the read side by moving flavour debug checks
under a debug config and by removing a costly RmW operation on the
readers' first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellaneous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, slightly delaying grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent KCSAN
warnings. In recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numeric arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
This commit adds refscale readers based on srcu_read_lock_fast_updown()
and srcu_read_unlock_fast_updown()
("refscale.scale_type=srcu-fast-updown"). On my x86 laptop, these are
about 2.2ns per pair.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit updates the initialization for the "srcu-fast" scale
type to use DEFINE_STATIC_SRCU_FAST() when reader_flavor is equal to
SRCU_READ_FLAVOR_FAST.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit causes rcutorture's srcu_torture_init() and
srcud_torture_init() functions to announce on the console log
which variant of SRCU is being tortured, for example: "torture:
srcud_torture_init fast SRCU".
[ paulmck: Apply feedback from kernel test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit creates an SRCU-fast-updown API, including
DEFINE_SRCU_FAST_UPDOWN(), DEFINE_STATIC_SRCU_FAST_UPDOWN(),
__init_srcu_struct_fast_updown(), init_srcu_struct_fast_updown(),
srcu_read_lock_fast_updown(), srcu_read_unlock_fast_updown(),
__srcu_read_lock_fast_updown(), and __srcu_read_unlock_fast_updown().
These are initially identical to their SRCU-fast counterparts, but both
SRCU-fast and SRCU-fast-updown will be optimized in different directions
by later commits. SRCU-fast will lack any sort of srcu_down_read() and
srcu_up_read() APIs, which will enable extremely efficient NMI safety.
For its part, SRCU-fast-updown will not be NMI safe, which will enable
reasonably efficient implementations of srcu_down_read_fast() and
srcu_up_read_fast().
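An illustrative use of the semaphore-like property described above,
assuming the srcu_ctr cookie type used by SRCU-fast (names are
hypothetical):

  DEFINE_STATIC_SRCU_FAST_UPDOWN(my_srcu);

  static struct srcu_ctr __percpu *cookie;

  static void enter_reader(void)          /* task context */
  {
          cookie = srcu_read_lock_fast_updown(&my_srcu);
  }

  static void exit_reader(void)           /* e.g. a timer handler */
  {
          srcu_read_unlock_fast_updown(&my_srcu, cookie);
  }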
This API fork happens to meet two different future use cases.
* SRCU-fast will become the reimplementation basis for RCU-TASK-TRACE
for consolidation. Since RCU-TASK-TRACE must be NMI safe, SRCU-fast
must be as well.
* SRCU-fast-updown will be needed for uretprobes code in order to get
rid of the read-side memory barriers while still allowing entering the
reader at task level while exiting it in a timer handler.
This commit also adds rcutorture tests for the new APIs. This
(annoyingly) needs to be in the same commit for bisectability. With this
commit, the 0x8 value tests SRCU-fast-updown. However, most SRCU-fast
testing will be via the RCU Tasks Trace wrappers.
[ paulmck: Apply s/0x8/0x4/ missing change per Boqun Feng feedback. ]
[ paulmck: Apply Akira Yokosawa feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
set_tsk_need_resched(current) requires a matching call to
set_preempt_need_resched() to work correctly outside of the scheduler.
Provide set_need_resched_current() which wraps this correctly and replace
all the open coded instances.
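A sketch of the wrapper as described; the lockdep assertion is an
assumption about the final form:

  static __always_inline void set_need_resched_current(void)
  {
          lockdep_assert_irqs_disabled();
          set_tsk_need_resched(current);
          set_preempt_need_resched();
  }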
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251116174750.665769842@linutronix.de
Some kernel configurations prohibit invoking local_bh_enable() while
interrupts are disabled. However, refscale disables interrupts to reduce
OS noise during the tests, which results in splats. This commit therefore
adds an ->enable_irqs flag to the ref_scale_ops structure, and refrains
from disabling interrupts when that flag is set. This flag is set for
the "bh" and "incpercpubh" scale_type module-parameter values.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds refscale readers based on READ_ONCE() and WRITE_ONCE()
that are unprotected (can lose counts, "refscale.scale_type=incpercpu"),
preempt-disabled ("refscale.scale_type=incpercpupreempt"),
bh-disabled ("refscale.scale_type=incpercpubh"), and irq-disabled
("refscale.scale_type=incpercpuirqsave"). On my x86 laptop, these are
about 4.3ns, 3.8ns, and 7.3ns per pair, respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds refscale readers based on paired this_cpu_inc()
operations ("refscale.scale_type=percpuinc"). On my x86 laptop,
these are about 4.5ns per pair.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds refscale readers based on preempt_disable() and
preempt_enable() ("refscale.scale_type=preempt"). On my x86 laptop, these
are about 2.8ns.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds refscale readers based on local_bh_disable() and
local_bh_enable() ("refscale.scale_type=bh"). On my x86 laptop, these
are about 4.9ns.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds refscale readers based on local_irq_disable() and
local_irq_enable() ("refscale.scale_type=irq") and on local_irq_save()
and local_irq_restore ("refscale.scale_type=irqsave"). On my x86 laptop,
these are about 2.8ns and 7.5ns per enable/disable pair, respectively.
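A hedged sketch of one such reader pair in refscale's style (the
ref_scale_ops wiring is elided):

  /* Measure the cost of a local_irq_save()/local_irq_restore() pair. */
  static void ref_irqsave_section(const int nloops)
  {
          unsigned long flags;
          int i;

          for (i = nloops; i >= 0; i--) {
                  local_irq_save(flags);
                  local_irq_restore(flags);
          }
  }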
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds the SRCU_READ_FLAVOR_FAST_UPDOWN=0x8 macro
and adjusts rcutorture to make use of it. In this commit, both
SRCU_READ_FLAVOR_FAST=0x4 and the new SRCU_READ_FLAVOR_FAST_UPDOWN
test SRCU-fast. When SRCU-fast-updown is added, the new
SRCU_READ_FLAVOR_FAST_UPDOWN macro will test it when passed to the
rcutorture.reader_flavor module parameter.
The old SRCU_READ_FLAVOR_FAST macro's value changed from 0x8 to 0x4.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
The rcu_lockdep_current_cpu_online(), rcu_read_lock_sched_held(),
rcu_read_lock_held(), rcu_read_lock_bh_held(), and
rcu_read_lock_any_held() functions are used by tracing-related code
paths, so putting traces on them is unlikely to make anyone happy.
This commit therefore marks them all "notrace".
Reported-by: Leon Hwang <leon.hwang@linux.dev>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit removes a harmless but potentially confusing invocation of
rcutorture_one_extend() within rcu_torture_one_read(). The immediately
preceding call to rcu_torture_one_read_start() already does this cleanup,
and the other call to rcu_torture_one_read_start() already relies on this.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds CONFIG_PROVE_RCU=y checking to enforce the new rule that
srcu_struct structures passed to srcu_read_lock_fast() and other SRCU-fast
read-side markers be either initialized with init_srcu_struct_fast()
or defined using either DEFINE_SRCU_FAST() or DEFINE_STATIC_SRCU_FAST().
This will enable removal of the non-debug read-side checks from
srcu_read_lock_fast() and friends, which on my laptop provides a 25%
speedup (which admittedly amounts to about half a nanosecond, but when
tracing fastpaths...).
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit updates the initialization for the "srcu" and "srcud" torture
types to use DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast(),
respectively, when reader_flavor is equal to SRCU_READ_FLAVOR_FAST.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit causes the srcu_readers_unlock_idx() function to take the
srcu_struct structure's ->srcu_reader_flavor field into account. This
ensures that structures defined via DEFINE_SRCU_FAST() or initialized via
init_srcu_struct_fast() have their grace periods use synchronize_srcu()
or synchronize_srcu_expedited() instead of smp_mb(), even before the
first SRCU reader has been entered.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit creates DEFINE_SRCU_FAST() and DEFINE_STATIC_SRCU_FAST()
macros that are similar to DEFINE_SRCU() and DEFINE_STATIC_SRCU(),
but which create srcu_struct structures that are usable only by readers
initiated by srcu_read_lock_fast() and friends.
This commit does make DEFINE_SRCU_FAST() available to modules, in which
case the per-CPU srcu_data structures are not created at compile time, but
rather at module-load time. This means that the ->srcu_reader_flavor
field of the srcu_data structure is not available. Therefore,
this commit instead creates an ->srcu_reader_flavor field in the
srcu_struct structure, adds arguments to the DEFINE_SRCU()-related
macros to initialize this new field, and extends the checks in the
__srcu_check_read_flavor() function to include this new field.
This commit also allows dynamically allocated srcu_struct structures
to be marked for SRCU-fast readers. It does so by defining a new
init_srcu_struct_fast() function that marks the specified srcu_struct
structure for use by srcu_read_lock_fast() and friends.
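Illustrative usage of the two initialization styles (a sketch; error
handling abbreviated and names hypothetical):

  DEFINE_STATIC_SRCU_FAST(my_static_srcu);        /* compile time */

  static int my_init(void)                        /* run time */
  {
          struct srcu_struct *ssp = kzalloc(sizeof(*ssp), GFP_KERNEL);

          if (!ssp || init_srcu_struct_fast(ssp))
                  return -ENOMEM;
          /* Readers may now use srcu_read_lock_fast(ssp) and friends. */
          return 0;
  }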
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit adds a ->exp_current member to the rcu_torture_ops structure
to test the srcu_expedite_current() function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
This commit creates an srcu_expedite_current() function that expedites
the current (and possibly the next) SRCU grace period for the specified
srcu_struct structure. This functionality will be inherited by RCU
Tasks Trace courtesy of its mapping to SRCU fast.
If the current SRCU grace period is already waiting, that wait will
complete before the expediting takes effect. If there is no SRCU grace
period in flight, this function might well create one.
[ paulmck: Apply Zqiang feedback for PREEMPT_RT use. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
The current Tiny SRCU implementation of srcu_read_unlock() awakens
the grace-period processing when exiting the outermost SRCU read-side
critical section. However, not all Linux-kernel configurations and
contexts permit swake_up_one() to be invoked while interrupts are
disabled, and this can result in indefinitely extended SRCU grace periods.
This commit therefore only invokes swake_up_one() when interrupts are
enabled, and introduces polling to the grace-period workqueue handler.
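A hedged sketch of the described change in Tiny SRCU's
__srcu_read_unlock(); the complementary polling lives in the
grace-period workqueue handler:

  int newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;

  WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
  /* Wake GP processing only where swake_up_one() is safe; otherwise
   * the workqueue handler's polling will notice the reader's exit. */
  if (!newval && READ_ONCE(ssp->srcu_gp_waiting) && in_task() &&
      !irqs_disabled())
          swake_up_one(&ssp->srcu_wq);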
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Zqiang <qiang.zhang@linux.dev>
Closes: https://lore.kernel.org/oe-lkp/202508261642.b15eefbb-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
A later commit will reduce the size of the RCU watching counter to free up
some bits for another purpose. Paul suggested adding a config option to
test the extreme case where the counter is reduced to its minimum usable
width for rcutorture to poke at, so do that.
Make it only configurable under RCU_EXPERT. While at it, add a comment to
explain the layout of context_tracking->state.
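A hedged sketch of the layout that comment documents (widths and
macro names illustrative; see include/linux/context_tracking_state.h
for the real thing):

  /*
   * context_tracking->state:
   *
   *   bits [0 .. 1]   CT_STATE_* (kernel / idle / user / guest)
   *   bits [2 .. 31]  counter of transitions to/from "RCU not watching"
   *
   * CONFIG_RCU_DYNTICKS_TORTURE shrinks the counter to its minimum
   * usable width so that rcutorture exercises wrap-around.
   */
  #define CT_STATE_MASK   (BIT(2) - 1)    /* low bits: context state */
  #define CT_RCU_WATCHING BIT(2)          /* counter LSB, above the state */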
Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Unify guest entry code for KVM and MSHV (Sean Christopherson)
- Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
(Nam Cao)
- Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
(Mukesh Rathor)
- Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)
- Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
Kumar T S M)
- Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)
* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hyperv: Remove the spurious null directive line
MAINTAINERS: Mark hyperv_fb driver Obsolete
fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
Drivers: hv: Make CONFIG_HYPERV bool
Drivers: hv: Add CONFIG_HYPERV_VMBUS option
Drivers: hv: vmbus: Fix typos in vmbus_drv.c
Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
x86/hyperv: Switch to msi_create_parent_irq_domain()
mshv: Use common "entry virt" APIs to do work in root before running guest
entry: Rename "kvm" entry code assets to "virt" to genericize APIs
entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
mshv: Handle NEED_RESCHED_LAZY before transferring to guest
x86/hyperv: Add kexec/kdump support on Azure CVMs
Drivers: hv: Simplify data structures for VMBus channel close message
Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
mshv: Add support for a new parent partition configuration
clocksource: hyper-v: Skip unnecessary checks for the root partition
hyperv: Add missing field to hv_output_map_device_interrupt
This pull request contains the following branches, non-octopus merged:
Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions.
- Fix RCU documentation formatting and typos.
- Replace dead Ottawa Linux Symposium links in RTFP.txt.
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks.
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler().
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly.
- Make initial set of changes to accommodate upcoming system_percpu_wq
changes.
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
including adding guards.
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast().
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers.
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed().
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary.
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels without
doing non-debug kernels.
- Fix a number of false-positive diagnostics that were being triggered
by rcutorture starting before boot completed. Running multiple
near-CPU-bound rcutorture processes when there is only the boot CPU
is after all a bit excessive.
- Substitute kcalloc() for kzalloc().
- Remove a redundant kfree() and NULL out kfree()ed objects.
Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Paul McKenney:
"Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions
- Fix RCU documentation formatting and typos
- Replace dead Ottawa Linux Symposium links in RTFP.txt
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler()
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly
- Make initial set of changes to accommodate upcoming
system_percpu_wq changes
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in
tracing, including adding guards
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast()
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed()
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels
without doing non-debug kernels
- Fix a number of false-positive diagnostics that were being
triggered by rcutorture starting before boot completed. Running
multiple near-CPU-bound rcutorture processes when there is only the
boot CPU is after all a bit excessive
- Substitute kcalloc() for kzalloc()
- Remove a redundant kfree() and NULL out kfree()ed objects"
* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
rcu: WQ_UNBOUND added to sync_wq workqueue
rcu: WQ_PERCPU added to alloc_workqueue users
rcu: replace use of system_wq with system_percpu_wq
refperf: Set reader_tasks to NULL after kfree()
refperf: Remove redundant kfree() after torture_stop_kthread()
srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
rculist: move list_for_each_rcu() to where it belongs
refscale: Use kcalloc() instead of kzalloc()
rcutorture: Use kcalloc() instead of kzalloc()
docs: rcu: Replace multiple dead OLS links in RTFP.txt
doc: Fix typo in RCU's torture.rst documentation
Documentation: RCU: Retitle toctree index
Documentation: RCU: Reduce toctree depth
Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
doc: Add RCU guards to checklist.rst
doc: Update whatisRCU.rst for recent RCU API additions
rcutorture: Delay forward-progress testing until boot completes
...
Rename the "kvm" entry code files and Kconfigs to use generic "virt"
nomenclature so that the code can be reused by other hypervisors (or
rather, their root/dom0 partition drivers), without incorrectly suggesting
the code somehow relies on and/or involves KVM.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Torture-test updates:
* rcutorture: Fix jitter.sh spin time
* torture: Add --do-normal parameter to torture.sh help text
* torture: Announce kernel boot status at torture-test startup
* rcutorture: Suppress "Writer stall state" reports during boot
* rcutorture: Delay rcutorture readers and writers until boot completes
* torture: Delay CPU-hotplug operations until boot completes
* rcutorture: Delay forward-progress testing until boot completes
* rcutorture,refscale: Use kcalloc() instead of kzalloc()
* refperf: Remove redundant kfree() after torture_stop_kthread()
* refperf: Set reader_tasks to NULL after kfree()
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (a per-CPU wq), while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same
applies to schedule_work(), which uses system_wq, and queue_work(),
which again makes use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds the WQ_UNBOUND flag to sync_wq, to make explicit that
this workqueue can be unbound and does not benefit from per-CPU work.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (a per-CPU wq), while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same
applies to schedule_work(), which uses system_wq, and queue_work(),
which again makes use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This patch adds a new WQ_PERCPU flag to explicitly request the use of
the per-CPU behavior. Both flags coexist for one release cycle to allow
callers to transition their calls.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
All existing users have been updated accordingly.
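Illustrative before/after for a typical caller under this scheme
(queue names hypothetical):

  /* Before: per-CPU behavior was the unstated default. */
  wq = alloc_workqueue("my_wq", 0, 0);

  /* After: per-CPU behavior is requested explicitly... */
  wq = alloc_workqueue("my_wq", WQ_PERCPU, 0);

  /* ...while unbound queues continue to opt in via WQ_UNBOUND. */
  wq_ub = alloc_workqueue("my_wq_ub", WQ_UNBOUND, 0);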
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (a per-CPU wq), while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same
applies to schedule_work(), which uses system_wq, and queue_work(),
which again makes use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq is a per-CPU workqueue, yet nothing in its name indicates that
CPU affinity constraint, which is very often not required by users. Make
it clear by adding a system_percpu_wq.
The old wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>