Evaluating the next expiry time of all clock bases is cache line expensive
as the expiry time of the first expiring timer is not cached in the base
and requires to access the timer itself, which is definitely in a different
cache line.
It's way more efficient to keep track of the expiry time on enqueue and
dequeue operations as the relevant data is already in the cache at that
point.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.404839710@kernel.org
Most times there is no change between hrtimer_interrupt() deferring the rearm
and the invocation of hrtimer_rearm_deferred(). In those cases it's a pointless
exercise to re-evaluate the next expiring timer.
Cache the required data and use it if nothing changed.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.338569372@kernel.org
Currently hrtimer_interrupt() runs expired timers, which can re-arm
themselves, after which it computes the next expiration time and
re-programs the hardware.
However, things like HRTICK, a highres timer driving preemption, cannot
re-arm itself at the point of running, since the next task has not been
determined yet. The schedule() in the interrupt return path will switch to
the next task, which then causes a new hrtimer to be programmed.
This then results in reprogramming the hardware at least twice, once after
running the timers, and once upon selecting the new task.
Notably, *both* events happen in the interrupt.
By pushing the hrtimer reprogram all the way into the interrupt return
path, it runs after schedule() picks the new task and the double reprogram
can be avoided.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.273488269@kernel.org
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.
That's obviously correct, but in the case that a expired timer sets
NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is
enabled then schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.
That can be avoided by deferring the rearming to the return from interrupt
path and if the return results in a immediate schedule() invocation then it
can be deferred until the end of schedule(), which avoids multiple rearms
and re-evaluation of the timer wheel.
Add the rearm checks to the existing sched_hrtick_enter/exit() functions,
which already handle the batched rearm of the hrtick timer.
For now this is just placing empty stubs at the right places which are all
optimized out by the compiler until the guard condition becomes true.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.208580085@kernel.org
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.
That's obviously correct, but in the case that a expired timer sets
NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is
enabled then schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.
That can be avoided by deferring the rearming to the return from interrupt
path and if the return results in a immediate schedule() invocation then it
can be deferred until the end of schedule(), which avoids multiple rearms
and re-evaluation of the timer wheel.
In case that the return from interrupt ends up handling softirqs before
reaching the rearm conditions in the return to user entry code functions, a
deferred rearm has to be handled before softirq handling enables interrupts
as soft interrupt handling can be long and would therefore introduce hard
to diagnose latencies to the timer interrupt.
Place the for now empty stub call right before invoking the softirq
handling routine.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.142854488@kernel.org
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.
That's obviously correct, but in the case that a expired timer sets
NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is
enabled then schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.
That can be avoided by deferring the rearming to the return from interrupt
path and if the return results in a immediate schedule() invocation then it
can be deferred until the end of schedule(), which avoids multiple rearms
and re-evaluation of the timer wheel.
As this is only relevant for interrupt to user return split the work masks
up and hand them in as arguments from the relevant exit to user functions,
which allows the compiler to optimize the deferred handling out for the
syscall exit to user case.
Add the rearm checks to the approritate places in the exit to user loop and
the interrupt return to kernel path, so that the rearming is always
guaranteed.
In the return to user space path this is handled in the same way as
TIF_RSEQ to avoid extra instructions in the fast path, which are truly
hurtful for device interrupt heavy work loads as the extra instructions and
conditionals while benign at first sight accumulate quickly into measurable
regressions. The return from syscall path is completely unaffected due to
the above mentioned split so syscall heavy workloads wont have any extra
burden.
For now this is just placing empty stubs at the right places which are all
optimized out by the compiler until the actual functionality is in place.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.066469985@kernel.org
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.
That's obviously correct, but in the case that a expired timer set
NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is
enabled then schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.
That can be avoided by deferring the rearming to the return from interrupt
path and if the return results in a immediate schedule() invocation then it
can be deferred until the end of schedule().
To make this correct the affected code parts need to be made aware of this.
Provide empty stubs for the deferred rearming mechanism, so that the
relevant code changes for entry, softirq and scheduler can be split up into
separate changes independent of the actual enablement in the hrtimer code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.000891171@kernel.org
The upcoming deferred rearming scheme has the same effect as the deferred
rearming when the hrtimer interrupt is executing. So it can reuse the
in_hrtirq flag, but when it gets deferred beyond the hrtimer interrupt
path, then the name does not make sense anymore.
Rename it to deferred_rearm upfront to keep the actual functional change
separate from the mechanical rename churn.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.935623347@kernel.org
Rework hrtimer_interrupt() such that reprogramming is split out into an
independent function at the end of the interrupt.
This prepares for reprogramming getting delayed beyond the end of
hrtimer_interrupt().
Notably, this changes the hang handling to always wait 100ms instead of
trying to keep it proportional to the actual delay. This simplifies the
state, also this really shouldn't be happening.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.870639266@kernel.org
As the base switch can be avoided completely when the base stays the same
the remove/enqueue handling can be more streamlined.
Split it out into a separate function which handles both in one go which is
way more efficient and makes the code simpler to follow.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.737600486@kernel.org
The decision to keep a timer which is associated to the local CPU on that
CPU does not take NOHZ information into account. As a result there are a
lot of hrtimer base switch invocations which end up not switching the base
and stay on the local CPU. That's just work for nothing and can be further
improved.
If the local CPU is part of the NOISE housekeeping mask, then check:
1) Whether the local CPU has the tick running, which means it is
either not idle or already expecting a timer soon.
2) Whether the tick is stopped and need_resched() is set, which
means the CPU is about to exit idle.
This reduces the amount of hrtimer base switch attempts, which end up on
the local CPU anyway, significantly and prepares for further optimizations.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.673473029@kernel.org
The decision whether to keep timers on the local CPU or on the CPU they are
associated to is suboptimal and causes the expensive switch_hrtimer_base()
mechanism to be invoked more than necessary. This is especially true for
pinned timers.
Rewrite the decision logic so that the current base is kept if:
1) The callback is running on the base
2) The timer is associated to the local CPU and the first expiring timer as
that allows to optimize for reprogramming avoidance
3) The timer is associated to the local CPU and pinned
4) The timer is associated to the local CPU and timer migration is
disabled.
Only #2 was covered by the original code, but especially #3 makes a
difference for high frequency rearming timers like the scheduler hrtick
timer. If timer migration is disabled, then #4 avoids most of the base
switches.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.607935269@kernel.org
All 'u8' flags are true booleans, so make it entirely clear that these can
only contain true or false.
This is especially true for hrtimer::state, which has a historical leftover
of using the state with bitwise operations. That was used in the early
hrtimer implementation with several bits, but then converted to a boolean
state. But that conversion missed to replace the bit OR and bit check
operations all over the place, which creates suboptimal code. As of today
'state' is a misnomer because it's only purpose is to reflect whether the
timer is enqueued into the RB-tree or not. Rename it to 'is_queued' and
make all operations on it boolean.
This reduces text size from 8926 to 8732 bytes.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.542427240@kernel.org
Use bool for the various flags as that creates better code in the hot path.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.475262618@kernel.org
As this code has some major surgery ahead, clean up coding style and bring
comments up to date.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.342740952@kernel.org
hrtimer_start() when invoked with an already armed timer traces like:
<comm>-.. [032] d.h2. 5.002263: hrtimer_cancel: hrtimer= ....
<comm>-.. [032] d.h1. 5.002263: hrtimer_start: hrtimer= ....
Which is incorrect as the timer doesn't get canceled. Just the expiry time
changes. The internal dequeue operation which is required for that is not
really interesting for trace analysis. But it makes it tedious to keep real
cancellations and the above case apart.
Remove the cancel tracing in hrtimer_start() and add a 'was_armed'
indicator to the hrtimer start tracepoint, which clearly indicates what the
state of the hrtimer is when hrtimer_start() is invoked:
<comm>-.. [032] d.h1. 6.200103: hrtimer_start: hrtimer= .... was_armed=0
<comm>-.. [032] d.h1. 6.200558: hrtimer_start: hrtimer= .... was_armed=1
Fixes: c6a2a17702 ("hrtimer: Add tracepoint for hrtimers")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.208491877@kernel.org
The debug object coverage in hrtimer_start_range_ns() happens too late to
do anything useful. Implement the init assert assertion part and invoke
that early in hrtimer_start_range_ns().
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.143098153@kernel.org
Some clockevent devices are coupled to the system clocksource by
implementing a less than or equal comparator which compares the programmed
absolute expiry time against the underlying time counter.
The timekeeping core provides a function to convert and absolute
CLOCK_MONOTONIC based expiry time to a absolute clock cycles time which can
be directly fed into the comparator. That spares two time reads in the next
event progamming path, one to convert the absolute nanoseconds time to a
delta value and the other to convert the delta value back to a absolute
time value suitable for the comparator.
Provide a new clocksource callback which takes the absolute cycle value and
wire it up in clockevents_program_event(). Similar to clocksources allow
architectures to inline the rearm operation.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.010425428@kernel.org
Some architectures have clockevent devices which are coupled to the system
clocksource by implementing a less than or equal comparator which compares
the programmed absolute expiry time against the underlying time
counter. Well known examples are TSC/TSC deadline timer and the S390 TOD
clocksource/comparator.
While the concept is nice it has some downsides:
1) The clockevents core code is strictly based on relative expiry times
as that's the most common case for clockevent device hardware. That
requires to convert the absolute expiry time provided by the caller
(hrtimers, NOHZ code) to a relative expiry time by reading and
substracting the current time.
The clockevent::set_next_event() callback must then read the counter
again to convert the relative expiry back into a absolute one.
2) The conversion factors from nanoseconds to counter clock cycles are
set up when the clockevent is registered. When NTP applies corrections
then the clockevent conversion factors can deviate from the
clocksource conversion substantially which either results in timers
firing late or in the worst case early. The early expiry then needs to
do a reprogam with a short delta.
In most cases this is papered over by the fact that the read in the
set_next_event() callback happens after the read which is used to
calculate the delta. So the tendency is that timers expire mostly
late.
All of this can be avoided by providing support for these devices in the
core code:
1) The timekeeping core keeps track of the last update to the clocksource
by storing the base nanoseconds and the corresponding clocksource
counter value. That's used to keep the conversion math for reading the
time within 64-bit in the common case.
This information can be used to avoid both reads of the underlying
clocksource in the clockevents reprogramming path:
delta = expiry - base_ns;
cycles = base_cycles + ((delta * clockevent::mult) >> clockevent::shift);
The resulting cycles value can be directly used to program the
comparator.
2) As #1 does not longer provide the "compensation" through the second
read the deviation of the clocksource and clockevent conversions
caused by NTP become more prominent.
This can be cured by letting the timekeeping core compute and store
the reverse conversion factors when the clocksource cycles to
nanoseconds factors are modified by NTP:
CS::MULT (1 << NS_TO_CYC_SHIFT)
--------------- = ----------------------
(1 << CS:SHIFT) NS_TO_CYC_MULT
Ergo: NS_TO_CYC_MULT = (1 << (CS::SHIFT + NS_TO_CYC_SHIFT)) / CS::MULT
The NS_TO_CYC_SHIFT value is calculated when the clocksource is
installed so that it aims for a one hour maximum sleep time.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.944763521@kernel.org
On some architectures clocksource::read() boils down to a single
instruction, so the indirect function call is just a massive overhead
especially with speculative execution mitigations in effect.
Allow architectures to enable conditional inlining of that read to avoid
that by:
- providing a static branch to switch to the inlined variant
- disabling the branch before clocksource changes
- enabling the branch after a clocksource change, when the clocksource
indicates in a feature flag that it is the one which provides the
inlined variant
This is intentionally not a static call as that would only remove the
indirect call, but not the rest of the overhead.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.675151545@kernel.org
The only real usecase for this is the hrtimer based broadcast device.
No point in using two different feature flags for this.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.609049777@kernel.org
The sequence of cancel and start is inefficient. It has to do the timer
lock/unlock twice and in the worst case has to reprogram the underlying
clock event device twice.
The reason why it is done this way is the usage of hrtimer_forward_now(),
which requires the timer to be inactive.
But that can be completely avoided as the forward can be done on a variable
and does not need any of the overrun accounting provided by
hrtimer_forward_now().
Implement a trivial forwarding mechanism and replace the cancel/reprogram
sequence with hrtimer_start(..., new_expiry).
For the non high resolution case the timer is not actually armed, but used
for storage so that code checking for expiry times can unconditially look
it up in the timer. So it is safe for that case to set the new expiry time
directly.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.542178086@kernel.org
The hrtick timer is frequently rearmed before expiry and most of the time
the new expiry is past the armed one. As this happens on every context
switch it becomes expensive with scheduling heavy work loads especially in
virtual machines as the "hardware" reprogamming implies a VM exit.
hrtimer now provide a lazy rearm mode flag which skips the reprogamming if:
1) The timer was the first expiring timer before the rearm
2) The new expiry time is farther out than the armed time
This avoids a massive amount of reprogramming operations of the hrtick
timer for the price of eventually taking the alredy armed interrupt for
nothing.
Mark the hrtick timer accordingly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.475409346@kernel.org
The hrtick timer is frequently rearmed before expiry and most of the time
the new expiry is past the armed one. As this happens on every context
switch it becomes expensive with scheduling heavy work loads especially in
virtual machines as the "hardware" reprogamming implies a VM exit.
Add a lazy rearm mode flag which skips the reprogamming if:
1) The timer was the first expiring timer before the rearm
2) The new expiry time is farther out than the armed time
This avoids a massive amount of reprogramming operations of the hrtick
timer for the price of eventually taking the alredy armed interrupt for
nothing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.408524456@kernel.org
Tiny adjustments to the hrtick expiry time below 5 microseconds are just
causing extra work for no real value. Filter them out when restarting the
hrtick.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.340593047@kernel.org
schedule() provides several mechanisms to update the hrtick timer:
1) When the next task is picked
2) When the balance callbacks are invoked before rq::lock is released
Each of them can result in a first expiring timer and cause a reprogram of
the clock event device.
Solve this by deferring the rearm to the end of schedule() right before
releasing rq::lock by setting a flag on entry which tells hrtick_start() to
cache the runtime constraint in rq::hrtick_delay without touching the timer
itself.
Right before releasing rq::lock evaluate the flags and either rearm or
cancel the hrtick timer.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.273068659@kernel.org
Use the static branch based variant and thereby avoid following three
pointers.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.203610956@kernel.org
The scheduler evaluates this via hrtimer_is_hres_active() every time it has
to update HRTICK. This needs to follow three pointers, which is expensive.
Provide a static branch based mechanism to avoid that.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.136503358@kernel.org
Much like hrtimer_reprogram(), skip programming if the cpu_base is running
the hrtimer interrupt.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260224163429.069535561@kernel.org
The clock of the hrtick and deadline timers is known to be CLOCK_MONOTONIC.
No point in looking it up via hrtimer_cb_get_time().
Just use ktime_get() directly.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.001511662@kernel.org
Since the tick causes hard preemption, the hrtick should too.
Letting the hrtick do lazy preemption completely defeats the purpose, since
it will then still be delayed until a old tick and be dependent on
CONFIG_HZ.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163428.933894105@kernel.org
hrtick_update() was needed when the slice depended on nr_running, all that
code is gone. All that remains is starting the hrtick when nr_running
becomes more than 1.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163428.866374835@kernel.org
The nominal duration for an EEVDF task to run is until its deadline. At
which point the deadline is moved ahead and a new task selection is done.
Try and predict the time 'lost' to higher scheduling classes. Since this is
an estimate, the timer can be both early or late. In case it is early
task_tick_fair() will take the !need_resched() path and restarts the timer.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260224163428.798198874@kernel.org
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
- Fix possible dereference of uninitialized pointer
When validating the persistent ring buffer on boot up, if the first
validation fails, a reference to "head_page" is performed in the
error path, but it skips over the initialization of that variable.
Move the initialization before the first validation check.
- Fix use of event length in validation of persistent ring buffer
On boot up, the persistent ring buffer is checked to see if it is
valid by several methods. One being to walk all the events in the
memory location to make sure they are all valid. The length of the
event is used to move to the next event. This length is determined
by the data in the buffer. If that length is corrupted, it could
possibly make the next event to check located at a bad memory location.
Validate the length field of the event when doing the event walk.
- Fix function graph on archs that do not support use of ftrace_ops
When an architecture defines HAVE_DYNAMIC_FTRACE_WITH_ARGS, it means
that its function graph tracer uses the ftrace_ops of the function
tracer to call its callbacks. This allows a single registered callback
to be called directly instead of checking the callback's meta data's
hash entries against the function being traced.
For architectures that do not support this feature, it must always
call the loop function that tests each registered callback (even if
there's only one). The loop function tests each callback's meta data
against its hash of functions and will call its callback if the
function being traced is in its hash map.
The issue was that there was no check against this and the direct
function was being called even if the architecture didn't support it.
This meant that if function tracing was enabled at the same time
as a callback was registered with the function graph tracer, its
callback would be called for every function that the function tracer
also traced, even if the callback's meta data only wanted to be
called back for a small subset of functions.
Prevent the direct calling for those architectures that do not support
it.
- Fix references to trace_event_file for hist files
The hist files used event_file_data() to get a reference to the
associated trace_event_file the histogram was attached to. This
would return a pointer even if the trace_event_file is about to
be freed (via RCU). Instead it should use the event_file_file()
helper that returns NULL if the trace_event_file is marked to be
freed so that no new references are added to it.
- Wake up hist poll readers when an event is being freed
When polling on a hist file, the task is only awoken when a hist
trigger is triggered. This means that if an event is being freed
while there's a task waiting on its hist file, it will need to wait
until the hist trigger occurs to wake it up and allow the freeing
to happen. Note, the event will not be completely freed until all
references are removed, and a hist poller keeps a reference. But
it should still be woken when the event is being freed.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaZd4ExQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqX5AP4powfnNnRfLSKH9idhTp+ltmJ+9roy
L7kWTr/z20S2VQEAk331PNZ32uZu+/ZUETYpgtEx4SbRGZFehTBv1ddjfw4=
=TWj5
-----END PGP SIGNATURE-----
Merge tag 'trace-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix possible dereference of uninitialized pointer
When validating the persistent ring buffer on boot up, if the first
validation fails, a reference to "head_page" is performed in the
error path, but it skips over the initialization of that variable.
Move the initialization before the first validation check.
- Fix use of event length in validation of persistent ring buffer
On boot up, the persistent ring buffer is checked to see if it is
valid by several methods. One being to walk all the events in the
memory location to make sure they are all valid. The length of the
event is used to move to the next event. This length is determined by
the data in the buffer. If that length is corrupted, it could
possibly make the next event to check located at a bad memory
location.
Validate the length field of the event when doing the event walk.
- Fix function graph on archs that do not support use of ftrace_ops
When an architecture defines HAVE_DYNAMIC_FTRACE_WITH_ARGS, it means
that its function graph tracer uses the ftrace_ops of the function
tracer to call its callbacks. This allows a single registered
callback to be called directly instead of checking the callback's
meta data's hash entries against the function being traced.
For architectures that do not support this feature, it must always
call the loop function that tests each registered callback (even if
there's only one). The loop function tests each callback's meta data
against its hash of functions and will call its callback if the
function being traced is in its hash map.
The issue was that there was no check against this and the direct
function was being called even if the architecture didn't support it.
This meant that if function tracing was enabled at the same time as a
callback was registered with the function graph tracer, its callback
would be called for every function that the function tracer also
traced, even if the callback's meta data only wanted to be called
back for a small subset of functions.
Prevent the direct calling for those architectures that do not
support it.
- Fix references to trace_event_file for hist files
The hist files used event_file_data() to get a reference to the
associated trace_event_file the histogram was attached to. This would
return a pointer even if the trace_event_file is about to be freed
(via RCU). Instead it should use the event_file_file() helper that
returns NULL if the trace_event_file is marked to be freed so that no
new references are added to it.
- Wake up hist poll readers when an event is being freed
When polling on a hist file, the task is only awoken when a hist
trigger is triggered. This means that if an event is being freed
while there's a task waiting on its hist file, it will need to wait
until the hist trigger occurs to wake it up and allow the freeing to
happen. Note, the event will not be completely freed until all
references are removed, and a hist poller keeps a reference. But it
should still be woken when the event is being freed.
* tag 'trace-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Wake up poll waiters for hist files when removing an event
tracing: Fix checking of freed trace_event_file for hist files
fgraph: Do not call handlers direct when not using ftrace_ops
tracing: ring-buffer: Fix to check event length before using
ring-buffer: Fix possible dereference of uninitialized pointer
The event_hist_poll() function attempts to verify whether an event file is
being removed, but this check may not occur or could be unnecessarily
delayed. This happens because hist_poll_wakeup() is currently invoked only
from event_hist_trigger() when a hist command is triggered. If the event
file is being removed, no associated hist command will be triggered and a
waiter will be woken up only after an unrelated hist command is triggered.
Fix the issue by adding a call to hist_poll_wakeup() in
remove_event_file_dir() after setting the EVENT_FILE_FL_FREED flag. This
ensures that a task polling on a hist file is woken up and receives
EPOLLERR.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-3-petr.pavlu@suse.com
Fixes: 1bd13edbbe ("tracing/hist: Add poll(POLLIN) support on hist file")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The event_hist_open() and event_hist_poll() functions currently retrieve
a trace_event_file pointer from a file struct by invoking
event_file_data(), which simply returns file->f_inode->i_private. The
functions then check if the pointer is NULL to determine whether the event
is still valid. This approach is flawed because i_private is assigned when
an eventfs inode is allocated and remains set throughout its lifetime.
Instead, the code should call event_file_file(), which checks for
EVENT_FILE_FL_FREED. Using the incorrect access function may result in the
code potentially opening a hist file for an event that is being removed or
becoming stuck while polling on this file.
Correct the access method to event_file_file() in both functions.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-2-petr.pavlu@suse.com
Fixes: 1bd13edbbe ("tracing/hist: Add poll(POLLIN) support on hist file")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The function graph tracer was modified to us the ftrace_ops of the
function tracer. This simplified the code as well as allowed more features
of the function graph tracer.
Not all architectures were converted over as it required the
implementation of HAVE_DYNAMIC_FTRACE_WITH_ARGS to implement. For those
architectures, it still did it the old way where the function graph tracer
handle was called by the function tracer trampoline. The handler then had
to check the hash to see if the registered handlers wanted to be called by
that function or not.
In order to speed up the function graph tracer that used ftrace_ops, if
only one callback was registered with function graph, it would call its
function directly via a static call.
Now, if the architecture does not support the use of using ftrace_ops and
still has the ftrace function trampoline calling the function graph
handler, then by doing a direct call it removes the check against the
handler's hash (list of functions it wants callbacks to), and it may call
that handler for functions that the handler did not request calls for.
On 32bit x86, which does not support the ftrace_ops use with function
graph tracer, it shows the issue:
~# trace-cmd start -p function -l schedule
~# trace-cmd show
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) * 11898.94 us | schedule();
3) # 1783.041 us | schedule();
1) | schedule() {
------------------------------------------
1) bash-8369 => kworker-7669
------------------------------------------
1) | schedule() {
------------------------------------------
1) kworker-7669 => bash-8369
------------------------------------------
1) + 97.004 us | }
1) | schedule() {
[..]
Now by starting the function tracer is another instance:
~# trace-cmd start -B foo -p function
This causes the function graph tracer to trace all functions (because the
function trace calls the function graph tracer for each on, and the
function graph trace is doing a direct call):
~# trace-cmd show
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
1) 1.669 us | } /* preempt_count_sub */
1) + 10.443 us | } /* _raw_spin_unlock_irqrestore */
1) | tick_program_event() {
1) | clockevents_program_event() {
1) 1.044 us | ktime_get();
1) 6.481 us | lapic_next_event();
1) + 10.114 us | }
1) + 11.790 us | }
1) ! 181.223 us | } /* hrtimer_interrupt */
1) ! 184.624 us | } /* __sysvec_apic_timer_interrupt */
1) | irq_exit_rcu() {
1) 0.678 us | preempt_count_sub();
When it should still only be tracing the schedule() function.
To fix this, add a macro FGRAPH_NO_DIRECT to be set to 0 when the
architecture does not support function graph use of ftrace_ops, and set to
1 otherwise. Then use this macro to know to allow function graph tracer to
call the handlers directly or not.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://patch.msgid.link/20260218104244.5f14dade@gandalf.local.home
Fixes: cc60ee813b ("function_graph: Use static_call and branch to optimize entry function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Check the event length before adding it for accessing next index in
rb_read_data_buffer(). Since this function is used for validating
possibly broken ring buffers, the length of the event could be broken.
In that case, the new event (e + len) can point a wrong address.
To avoid invalid memory access at boot, check whether the length of
each event is in the possible range before using it.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5f3b6e839f ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177123421541.142205.9414352170164678966.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There is a pointer head_page in rb_meta_validate_events() which is not
initialized at the beginning of a function. This pointer can be dereferenced
if there is a failure during reader page validation. In this case the control
is passed to "invalid" label where the pointer is dereferenced in a loop.
To fix the issue initialize orig_head and head_page before calling
rb_validate_buffer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260213100130.2013839-1-d.dulov@aladdin.ru
Closes: https://lore.kernel.org/r/202406130130.JtTGRf7W-lkp@intel.com/
Fixes: 5f3b6e839f ("ring-buffer: Validate boot range memory events")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmXR6QACgkQ6rmadz2v
bTqVjg/+PZPMKGBMfF5uWk74LWYQIt01ePfHH2QeA4DsOwNK+9Q1+jmCLRPa/diL
Ds//ZIEMatmtdd1eO5aHyGXE1sBsJ02LfKOhsPukQyzD/FtZ4BmQpzpG2mK5o1M5
NAH6wxY+6Tr8UlXQtoTF1FFXSa6Y0vQmkyXofOoUgSBAxTPMGVQnWw4bq7mUAX9A
G6/TnPDgGbNLejPCmu8mERCkqRjIGAgjBUItVeiHbdxymtzjHcrH7nwxuP59djR9
1AhMrJnyV+s7iEMkAKGkE6NOID73R/YQEqmvD1eX0AWvqdR8+4lOHT0KPU039JqT
RQV5JgXSfeEkdUtyvqQJZdiJinjFLOwp4CGcX+DKcvUpAKmLx8q3ihPiuWk8+JOV
fnosXQIeQ7B9EuTvoNoNTfvU/MuV8vWd3/1kQc+KGXhzk944Ypb29zywGEoGZarU
eb7YRtUIsXBo7H2K1juqTvj72jyhG83cbZxE5+pR2gv87yGgUvt0r0u0FvzkQf2c
Fq671n6UaOd+0ZYey7YG9bIc+WMTbsjqVY0f7+3Rcl3Q68we0HHdqiPoF637/a1r
lS6TZLRDJmykDPp03db97UcQml6RKJiyTnTNQVUF4Uk+VF1KTreXK/D8fD401gxZ
GWuLt/bjq8l/EVJOhzpaO0JDejmZVRaOQUX8t9DwstS+FFbSyBs=
=mb+y
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
Hung)
- Fix a potential use-after-free of BTF object (Anton Protopopov)
- Add feature detection to libbpf and avoid moving arena global
variables on older kernels (Emil Tsalapatis)
- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
(Ihor Solodrai)
- Fix truncated netlink dumps in bpftool (Jakub Kicinski)
- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
Dwivedi)
- Remove hexdump dependency while building bpf selftests (Matthieu
Baerts)
- Complete fsession support in BPF trampolines on riscv (Menglong Dong)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Remove hexdump dependency
libbpf: Remove extern declaration of bpf_stream_vprintk()
selftests/bpf: Use vmlinux.h in test_xdp_meta
bpftool: Fix truncated netlink dumps
libbpf: Delay feature gate check until object prepare time
libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
bpf: Add a map/btf from a fd array more consistently
selftests/bpf: Fix map_kptr grace period wait
selftests/bpf: enable fsession_test on riscv64
selftests/bpf: Adjust selftest due to function rename
bpf, riscv: add fsession support for trampolines
bpf: Fix a potential use-after-free of BTF object
bpf, riscv: introduce emit_store_stack_imm64() for trampoline
libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
libbpf: Add gating for arena globals relocation feature
Total patches: 7
Reviews/patch: 0.57
Reviewed rate: 42%
- The 2 patch series "two fixes in kho_populate()" from Ran Xiaokai
fixes a couple of not-major issues in the kexec handover code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaKBAAKCRDdBJ7gKXxA
jpB1AP9UpNzT63aGDnB6G8pgekSdK/I2gypZI3cS7MpBPorRUgEAhcClc2//zWGK
0Wz1rxh3sWIE/pzd/yOEsv+7oQHeDQA=
=oUp2
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:
- "two fixes in kho_populate()" fixes a couple of not-major issues in
the kexec handover code (Ran Xiaokai)
- misc singletons
* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/group_cpus: handle const qualifier from clusters allocation type
kho: remove unnecessary WARN_ON(err) in kho_populate()
kho: fix missing early_memunmap() call in kho_populate()
scripts/gdb: implement x86_page_ops in mm.py
objpool: fix the overestimation of object pooling metadata size
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
delayacct: fix build regression on accounting tool
Total patches: 36
Reviews/patch: 1.77
Reviewed rate: 83%
- The 2 patch series "mm/vmscan: fix demotion targets checks in
reclaim/demotion" from Bing Jiao fixes a couple of issues in the
demotion code - pages were failed demotion and were finding themselves
demoted into disallowed nodes.
- The 11 patch series "Remove XA_ZERO from error recovery of dup_mmap()"
from Liam Howlett fixes a rare mapledtree race and performs a number of
cleanups.
- The 13 patch series "mm: add bitmap VMA flag helpers and convert all
mmap_prepare to use them" from Lorenzo Stoakes implements a lot of
cleanups following on from the conversion of the VMA flags into a
bitmap.
- The 5 patch series "support batch checking of references and unmapping
for large folios" from Baolin Wang implements batching to greatly
improve the performance of reclaiming clean file-backed large folios.
- The 3 patch series "selftests/mm: add memory failure selftests" from
Miaohe Lin does as claimed.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaIEQAKCRDdBJ7gKXxA
jj73AQCQDwLoipDiQRGyjB5BDYydymWuDoiB1tlDPHfYAP3b/QD/UQtVlOEXqwM3
naOKs3NQ1pwnfhDaQMirGw2eAnJ1SQY=
=6Iif
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
* Removed macros from proc handler converters
Replace the proc converter macros with "regular" functions. Though it is more
verbose than the macro version, it helps when debugging and better aligns with
coding-style.rst.
* General cleanup
Remove superfluous ctl_table forward declarations. Const qualify the
memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add
missing kernel doc to proc_dointvec_conv.
* Testing
This series was run through sysctl selftests/kunit test suite in
x86_64. And went into linux-next after rc4, giving it a good 3 weeks of
testing
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmmUabYACgkQupfNUreW
QU8y2Qv/d2y35uQPRDh0HKWKWXJy41C2RJzd/rFCWJPCwo150whTSHIHkWYnu76g
10QblBXQmXi9TVqFnJ7Il7PWgqkMPjzA13tfT9eXNWU8j2OB/mcVKNl9X4wm/jWi
QxtGmBsIQ/nxb2pUzMCykzgfc5mLi2NQ8qhZ5bOnq7UW3zdYmzEqx+tRdvIacyIk
adComi5v8xUDqyEbVFaBovuX2WHQkPyBMnD64nwWG93JpNG/+9PxGzv/DNUXY11Y
epVOfSoKdJbSLjYoHEPEhT0aHjSydq3QHru7uF6wzKOFTfHej/XkXXbUnFXPO2Pn
c5J0u/HziYG5eN2QTqGfrhECZYuCFPemtUozltbcgGebkl1wKH+k9K5vsCaz/mhk
ihUC3mui++W/n9B9HJRYh1XeEpk6C1pWERCOx27XFZ25fSek2YO6ZWkT0q+gceC0
t4+eIFSGJ3OzheJgHNK9XhTMWiQPmHyA6brXYGx4WeRvJFLpVddPF7k3Z89zIAu/
Fut7FGTH
=0Z+I
-----END PGP SIGNATURE-----
Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove macros from proc handler converters
Replace the proc converter macros with "regular" functions. Though it
is more verbose than the macro version, it helps when debugging and
better aligns with coding-style.rst.
- General cleanup
Remove superfluous ctl_table forward declarations. Const qualify the
memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
Add missing kernel doc to proc_dointvec_conv.
- Testing
This series was run through sysctl selftests/kunit test suite in
x86_64. And went into linux-next after rc4, giving it a good 3 weeks
of testing
* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
sysctl: Replace unidirectional INT converter macros with functions
sysctl: Add kernel doc to proc_douintvec_conv
sysctl: Replace UINT converter macros with functions
sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
sysctl: clarify proc_douintvec_minmax doc
sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
sysctl: Remove unused ctl_table forward declarations
loadpin: Implement custom proc_handler for enforce
alloc_tag: move memory_allocation_profiling_sysctls into .rodata
sysctl: Add missing kernel-doc for proc_dointvec_conv