Commit Graph

6678 Commits (f9db1fc56281b96fe8748632b3894de970a8a850)

Author SHA1 Message Date
Steven Rostedt f75340d73c tracing: Remove unnecessary "goto out" that simply returns ret is trigger code
There's several functions that have "goto out;" where the label out is just:

 out:
	return ret;

Simplify the code by just doing the return in the location and removing
all the out labels and jumps.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://lore.kernel.org/20250507145456.121186494@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:11 -04:00
Miaoqian Lin c5dd28e7fb tracing: Fix error handling in event_trigger_parse()
According to trigger_data_alloc() doc, trigger_data_free() should be
used to free an event_trigger_data object. This fixes a mismatch introduced
when kzalloc was replaced with trigger_data_alloc without updating
the corresponding deallocation calls.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://lore.kernel.org/20250507145455.944453325@goodmis.org
Link: https://lore.kernel.org/20250318112737.4174-1-linmq006@gmail.com
Fixes: e1f187d09e ("tracing: Have existing event_command.parse() implementations use helpers")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
[ SDR: Changed event_trigger_alloc/free() to trigger_data_alloc/free() ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:11 -04:00
Steven Rostedt f2947c4b7d tracing: Rename event_trigger_alloc() to trigger_data_alloc()
The function event_trigger_alloc() creates an event_trigger_data
descriptor and states that it needs to be freed via event_trigger_free().
This is incorrect, it needs to be freed by trigger_data_free() as
event_trigger_free() adds ref counting.

Rename event_trigger_alloc() to trigger_data_alloc() and state that it
needs to be freed via trigger_data_free(). This naming convention
was introducing bugs.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://lore.kernel.org/20250507145455.776436410@goodmis.org
Fixes: 86599dbe2c ("tracing: Add helper functions to simplify event_command.parse() callback handling")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:11 -04:00
Devaansh Kumar 73207746d3 tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf
strncpy() is deprecated for NUL-terminated destination buffers and must
be replaced by strscpy().

See issue: https://github.com/KSPP/linux/issues/90

Link: https://lore.kernel.org/20250507133837.19640-1-devaanshk840@gmail.com
Signed-off-by: Devaansh Kumar <devaanshk840@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:11 -04:00
Steven Rostedt 6e3b3acaf4 tracing: Remove unused buffer_page field from trace_array_cpu structure
The trace_array_cpu had a "buffer_page" field that was originally going to
be used as a backup page for the ring buffer. But the ring buffer has its
own way of reusing pages and this field was never used.

Remove it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.738849456@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt c4a80c0615 tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer
The irqsoff tracer uses the per CPU "disabled" field to prevent corruption
of the accounting when it starts to trace interrupts disabled, but there's
a slight race that could happen if for some reason it was called twice.
Use atomic_inc_return() instead.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.567884756@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt 90633c34c3 tracing: Convert the per CPU "disabled" counter to local from atomic
The per CPU "disabled" counter is used for the latency tracers and stack
tracers to make sure that their accounting isn't messed up by an NMI or
interrupt coming in and affecting the same CPU data. But the counter is an
atomic_t type. As it only needs to synchronize against the current CPU,
switch it over to local_t type.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.394925376@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt cf64792f0a tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field
The branch tracer currently checks the per CPU "disabled" field to know if
tracing is enabled or not for the CPU. As the "disabled" value is not used
anymore to turn of tracing generically, use tracing_tracer_is_on_cpu()
instead.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.224658526@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt 092a38565e ring-buffer: Add ring_buffer_record_is_on_cpu()
Add the function ring_buffer_record_is_on_cpu() that returns true if the
ring buffer for a give CPU is writable and false otherwise.

Also add tracer_tracing_is_on_cpu() to return if the ring buffer for a
given CPU is writeable for a given trace_array.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.059853898@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt 969043af15 tracing: Do not use per CPU array_buffer.data->disabled for cpumask
The per CPU "disabled" value was the original way to disable tracing when
the tracing subsystem was first created. Today, the ring buffer
infrastructure has its own way to disable tracing. In fact, things have
changed so much since 2008 that many things ignore the disable flag.

Do not bother setting the per CPU disabled flag of the array_buffer data
to use to determine what CPUs can write to the buffer and only rely on the
ring buffer code itself to disabled it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212235.885452497@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt f62e3de375 ftrace: Do not disabled function graph based on "disabled" field
The per CPU "disabled" value was the original way to disable tracing when
the tracing subsystem was first created. Today, the ring buffer
infrastructure has its own way to disable tracing. In fact, things have
changed so much since 2008 that many things ignore the disable flag.

Do not bother disabling the function graph tracer if the per CPU disabled
field is set. Just record as normal. If tracing is disabled in the ring
buffer it will not be recorded.

Also, when tracing is enabled again, it will not drop the return call of
the function.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212235.715752008@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:19:10 -04:00
Steven Rostedt a9839d2048 tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled
The per CPU "disabled" value was the original way to disable tracing when
the tracing subsystem was first created. Today, the ring buffer
infrastructure has its own way to disable tracing. In fact, things have
changed so much since 2008 that many things ignore the disable flag.

The kdb_ftdump() function iterates over all the current tracing CPUs and
increments the "disabled" counter before doing the dump, and decrements it
afterward.

As the disabled flag can be ignored, doing this today is not reliable.
Instead, simply call tracer_tracing_off() and then tracer_tracing_on() to
disable and then enabled the entire ring buffer in one go!

Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Thompson <danielt@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/20250505212235.549033722@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:18:47 -04:00
Steven Rostedt 6ba3e0533f tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
The per CPU "disabled" value was the original way to disable tracing when
the tracing subsystem was first created. Today, the ring buffer
infrastructure has its own way to disable tracing. In fact, things have
changed so much since 2008 that many things ignore the disable flag.

The ftrace_dump_one() function iterates over all the current tracing CPUs and
increments the "disabled" counter before doing the dump, and decrements it
afterward.

As the disabled flag can be ignored, doing this today is not reliable.
Instead use the new tracer_tracing_disable() that calls into the ring
buffer code to do the disabling.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212235.381188238@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:18:40 -04:00
Steven Rostedt dbecef68ad tracing: Add tracer_tracing_disable/enable() functions
Allow a tracer to disable writing to its buffer for a temporary amount of
time and re-enable it.

The tracer_tracing_disable() will disable writing to the trace array
buffer, and requires a tracer_tracing_enable() to re-enable it.

The difference between tracer_tracing_disable() and tracer_tracing_off()
is that the disable version can nest, and requires as many enable() calls
as disable() calls to re-enable the buffer. Where as the off() function
can be called multiple times and only requires a singe tracer_tracing_on()
to re-enable the buffer.

Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Thompson <danielt@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/20250505212235.210330010@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-09 15:18:21 -04:00
Feng Yang ee971630f2 bpf: Allow some trace helpers for all prog types
if it works under NMI and doesn't use any context-dependent things,
should be fine for any program type. The detailed discussion is in [1].

[1] https://lore.kernel.org/all/CAEf4Bza6gK3dsrTosk6k3oZgtHesNDSrDd8sdeQ-GiS6oJixQg@mail.gmail.com/

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20250506061434.94277-2-yangfeng59949@163.com
2025-05-09 10:37:10 -07:00
Steven Rostedt 1577683a92 tracing: Just use this_cpu_read() to access ignore_pid
The ignore_pid boolean on the per CPU data descriptor is updated at
sched_switch when a new task is scheduled in. If the new task is to be
ignored, it is set to true, otherwise it is set to false. The current task
should always have the correct value as it is updated when the task is
scheduled in.

Instead of breaking up the read of this value, which requires preemption
to be disabled, just use this_cpu_read() which gives a snapshot of the
value. Since the value will always be correct for a given task (because
it's updated at sched switch) it doesn't need preemption disabled.

This will also allow trace events to be called with preemption enabled.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212235.038958766@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:09 -04:00
Steven Rostedt c638ebd823 ftrace: Do not bother checking per CPU "disabled" flag
The per CPU "disabled" value was the original way to disable tracing when
the tracing subsystem was first created. Today, the ring buffer
infrastructure has its own way to disable tracing. In fact, things have
changed so much since 2008 that many things ignore the disable flag.

There's no reason for the function tracer to check it, if tracing is
disabled, the ring buffer will not record the event anyway.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212234.868972758@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:09 -04:00
Steven Rostedt 6936298393 tracing/mmiotrace: Remove reference to unused per CPU data pointer
The mmiotracer referenced the per CPU array_buffer->data descriptor but
never actually used it. Remove the references to it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212234.696945463@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:09 -04:00
Tomas Glozar 17f89102fe tracing/osnoise: Allow arbitrarily long CPU string
Allocate kernel memory for processing CPU string
(/sys/kernel/tracing/osnoise/cpus) also in osnoise_cpus_write to allow
the writing of a CPU string of an arbitrary length.

This replaces the 256-byte buffer, which is insufficient with the rising
number of CPUs. For example, if I wanted to measure on every even CPU
on a system with 256 CPUs, the string would be 456 characters long.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250425091839.343289-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:09 -04:00
Steven Rostedt a54665ab7c ftrace: Comment that ftrace_func_mapper is freed with free_ftrace_hash()
The structure ftrace_func_mapper only contains a single field and that is
a ftrace_hash. It is used to abstract it out from a normal hash to control
users of how it gets modified.

The freeing of a ftrace_func_mapper structure is:

  free_ftrace_hash(&mapper->hash);

Without context, this looks like a bug. It should be commented that it is
not a bug and it is freed this way.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250416165420.5c717420@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:08 -04:00
Ilya Leoshkevich 761ef34228 ftrace: Expose call graph depth as unsigned int
Depth is stored as int because the code uses negative values to break
out of iterations. But what is recorded is always zero or positive. So
expose it as unsigned int instead of int.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250412221847.17310-3-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:08 -04:00
Steven Rostedt 88cefd99ee ftrace: Show subops in enabled_functions
The function graph infrastructure uses subops of the function tracer.
These are not shown in enabled_functions. Add a "subops:" section to the
enabled_functions line to show what functions are attached via subops. If
the subops is from the function_graph infrastructure, then show the entry
and return callbacks that are attached.

Here's an example of the output:

schedule_on_each_cpu (1)                tramp: 0xffffffffc03ef000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60     subops: {ent:trace_graph_entry+0x0/0x20 ret:trace_graph_return+0x0/0x150}

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250410153830.5d97f108@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-08 09:36:08 -04:00
Al Viro 2dbf6e0df4 kill vfs_submount()
The last remaining user of vfs_submount() (tracefs) is easy to convert
to fs_context_for_submount(); do that and bury that thing, along with
SB_SUBMOUNT

Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-06 12:49:07 -04:00
Steven Rostedt 54c53dfdb6 tracing: Add common_comm to histograms
If one wants to trace the name of the task that wakes up a process and
pass that to the synthetic events, there's nothing currently that lets the
synthetic events do that. Add a "common_comm" to the histogram logic that
allows histograms save the current->comm as a variable that can be passed
through and added to a synthetic event:

 # cd /sys/kernel/tracing
 # echo 's:wake_lat char[] waker; char[] wakee; u64 delta;' >> dynamic_events
 # echo 'hist:keys=pid:comm=common_comm:ts=common_timestamp.usecs if !(common_flags & 0x18)' > events/sched/sched_waking/trigger
 # echo 'hist:keys=next_pid:wake_comm=$comm:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,$wake_comm,next_comm,$delta)' > events/sched/sched_switch/trigger

The above will create a synthetic trace event that will save both the name
of the waker and the wakee but only if the wakeup did not happen in a hard
or soft interrupt context.

The "common_comm" is used to save the task->comm at the time of the
initial event and is passed via the "comm" variable to the second event,
and that is saved as the "waker" field in the "wake_lat" synthetic event.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250407154912.3c6c6246@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:37:03 -04:00
Steven Rostedt 7ab0fc61ce tracing: Move histogram trigger variables from stack to per CPU structure
The histogram trigger has three somewhat large arrays on the kernel stack:

	unsigned long entries[HIST_STACKTRACE_DEPTH];
	u64 var_ref_vals[TRACING_MAP_VARS_MAX];
	char compound_key[HIST_KEY_SIZE_MAX];

Checking the function event_hist_trigger() stack frame size, it currently
uses 816 bytes for its stack frame due to these variables!

Instead, allocate a per CPU structure that holds these arrays for each
context level (normal, softirq, irq and NMI). That is, each CPU will have
4 of these structures. This will be allocated when the first histogram
trigger is enabled and freed when the last is disabled. When the
histogram callback triggers, it will request this structure. The request
will disable preemption, get the per CPU structure at the index of the
per CPU variable, and increment that variable.

The callback will use the arrays in this structure to perform its work and
then release the structure. That in turn will simply decrement the per CPU
index and enable preemption.

Moving the variables from the kernel stack to the per CPU structure brings
the stack frame of event_hist_trigger() down to just 112 bytes.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://lore.kernel.org/20250407123851.74ea8d58@gandalf.local.home
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:36:11 -04:00
Steven Rostedt 872a0d90c1 tracing: Always use memcpy() in histogram add_to_key()
The add_to_key() function tests if the key is a string or some data. If
it's a string it does some further calculations of the string size (still
truncating it to the max size it can be), and calls strncpy().

If the key isn't as string it calls memcpy(). The interesting point is
that both use the exact same parameters:

                strncpy(compound_key + key_field->offset, (char *)key, size);
        } else
                memcpy(compound_key + key_field->offset, key, size);

As strncpy() is being used simply as a memcpy() for a string, and since
strncpy() is deprecated, just call memcpy() for both memory and string
keys.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/20250403210637.1c477d4a@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:35:34 -04:00
Steven Rostedt 3e4b37160b tracing: Show preempt and irq events callsites from the offsets in field print
When the "fields" option is set in a trace instance, it ignores the "print fmt"
portion of the trace event and just prints the raw fields defined by the
TP_STRUCT__entry() of the TRACE_EVENT() macro.

The preempt_disable/enable and irq_disable/enable events record only the
caller offset from _stext to save space in the ring buffer. Even though
the "fields" option only prints the fields, it also tries to print what
they represent too, which includes function names.

Add a check in the output of the event field printing to see if the field
name is "caller_offs" or "parent_offs" and then print the function at the
offset from _stext of that field.

Instead of just showing:

  irq_disable: caller_offs=0xba634d (12215117) parent_offs=0x39d10e2 (60625122)

Show:

  irq_disable: caller_offs=trace_hardirqs_off.part.0+0xad/0x130 0xba634d (12215117) parent_offs=_raw_spin_lock_irqsave+0x62/0x70 0x39d10e2 (60625122)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250506105131.4b6089a9@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:34:52 -04:00
Steven Rostedt dc6a49d4cd tracing: Adjust addresses for printing out fields
Add adjustments to the values of the "fields" output if the buffer is a
persistent ring buffer to adjust the addresses to both the kernel core and
kernel modules if they match a module in the persistent memory and that
module is also loaded.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250325185619.54b85587@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:32:29 -04:00
Steven Rostedt 00d872dd54 tracing: Only return an adjusted address if it matches the kernel address
The trace_adjust_address() will take a given address and examine the
persistent ring buffer to see if the address matches a module that is
listed there. If it does not, it will just adjust the value to the core
kernel delta. But if the address was for something that was not part of
the core kernel text or data it should not be adjusted.

Check the result of the adjustment and only return the adjustment if it
lands in the current kernel text or data. If not, return the original
address.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250506102300.0ba2f9e0@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:31:45 -04:00
Steven Rostedt 531ee10b43 tracing: Show function names when possible when listing fields
When the "fields" option is enabled, the "print fmt" of the trace event is
ignored and only the fields are printed. But some fields contain function
pointers. Instead of just showing the hex value in this case, show the
function name when possible:

Instead of having:

 # echo 1 > options/fields
 # cat trace
 [..]
  kmem_cache_free: call_site=0xffffffffa9afcf31 (-1448095951) ptr=0xffff888124452910 (-131386736039664) name=kmemleak_object

Have it output:

  kmem_cache_free: call_site=rcu_do_batch+0x3d1/0x14a0 (-1768960207) ptr=0xffff888132ea5ed0 (854220496) name=kmemleak_object

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250325213919.624181915@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:30:56 -04:00
Steven Rostedt e3223e1e9a tracing: Update function trace addresses with module addresses
Now that module addresses are saved in the persistent ring buffer, their
addresses can be used to adjust the address in the persistent ring buffer
to the address of the module that is currently loaded.

Instead of blindly using the text_delta that only works for core kernel
code, call the trace_adjust_address() that will see if the address matches
an address saved in the persistent ring buffer, and then uses that against
the matching module if it is loaded.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250506111648.5df7f3ec@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-06 11:30:17 -04:00
Christoph Hellwig eeadd68e2a block: remove bounce buffering support
The block layer bounce buffering support is unused now, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250505081138.3435992-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05 13:22:39 -06:00
Al Viro 006ff7498f saner calling conventions for ->d_automount()
Currently the calling conventions for ->d_automount() instances have
an odd wart - returned new mount to be attached is expected to have
refcount 2.

That kludge is intended to make sure that mark_mounts_for_expiry() called
before we get around to attaching that new mount to the tree won't decide
to take it out.  finish_automount() drops the extra reference after it's
done with attaching mount to the tree - or drops the reference twice in
case of error.  ->d_automount() instances have rather counterintuitive
boilerplate in them.

There's a much simpler approach: have mark_mounts_for_expiry() skip the
mounts that are yet to be mounted.  And to hell with grabbing/dropping
those extra references.  Makes for simpler correctness analysis, at that...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-05 13:42:49 -04:00
Steven Rostedt 0a8f11f856 tracing: Do not take trace_event_sem in print_event_fields()
On some paths in print_event_fields() it takes the trace_event_sem for
read, even though it should always be held when the function is called.

Remove the taking of that mutex and add a lockdep_assert_held_read() to
make sure the trace_event_sem is held when print_event_fields() is called.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250501224128.0b1f0571@batman.local.home
Fixes: 80a76994b2 ("tracing: Add "fields" option to show raw trace event fields")
Reported-by: syzbot+441582c1592938fccf09@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6813ff5e.050a0220.14dd7d.001b.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 22:44:52 -04:00
Steven Rostedt 1be8e54a1e tracing: Fix trace_adjust_address() when there is no modules in scratch area
The function trace_adjust_address() is used to map addresses of modules
stored in the persistent memory and are also loaded in the current boot to
return the current address for the module.

If there's only one module entry, it will simply use that, otherwise it
performs a bsearch of the entry array to find the modules to offset with.

The issue is if there are no modules in the array. The code does not
account for that and ends up referencing the first element in the array
which does not exist and causes a crash.

If nr_entries is zero, exit out early as if this was a core kernel
address.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250501151909.65910359@gandalf.local.home
Fixes: 35a380ddbc ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 16:06:55 -04:00
Colin Ian King 3c1d9cfa84 ftrace: Fix NULL memory allocation check
The check for a failed memory location is incorrectly checking
the wrong level of pointer indirection by checking !filter_hash
rather than !*filter_hash.  Fix this.

Cc: asami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250422221335.89896-1-colin.i.king@gmail.com
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 15:46:19 -04:00
Jeongjun Park f5178c41bb tracing: Fix oob write in trace_seq_to_buffer()
syzbot reported this bug:
==================================================================
BUG: KASAN: slab-out-of-bounds in trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
BUG: KASAN: slab-out-of-bounds in tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
Write of size 4507 at addr ffff888032b6b000 by task syz.2.320/7260

CPU: 1 UID: 0 PID: 7260 Comm: syz.2.320 Not tainted 6.15.0-rc1-syzkaller-00301-g3bde70a2c827 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xc3/0x670 mm/kasan/report.c:521
 kasan_report+0xe0/0x110 mm/kasan/report.c:634
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189
 __asan_memcpy+0x3c/0x60 mm/kasan/shadow.c:106
 trace_seq_to_buffer kernel/trace/trace.c:1830 [inline]
 tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822
 ....
==================================================================

It has been reported that trace_seq_to_buffer() tries to copy more data
than PAGE_SIZE to buf. Therefore, to prevent this, we should use the
smaller of trace_seq_used(&iter->seq) and PAGE_SIZE as an argument.

Link: https://lore.kernel.org/20250422113026.13308-1-aha310510@gmail.com
Reported-by: syzbot+c8cd2d2c412b868263fb@syzkaller.appspotmail.com
Fixes: 3c56819b14 ("tracing: splice support for tracing_pipe")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-01 15:24:15 -04:00
Feng Yang 6aca583f90 bpf: Streamline allowed helpers between tracing and base sets
Many conditional checks in switch-case are redundant
with bpf_base_func_proto and should be removed.

Regarding the permission checks bpf_base_func_proto:
The permission checks in bpf_prog_load (as outlined below)
ensure that the trace has both CAP_BPF and CAP_PERFMON capabilities,
thus enabling the use of corresponding prototypes
in bpf_base_func_proto without adverse effects.
bpf_prog_load
	......
	bpf_cap = bpf_token_capable(token, CAP_BPF);
	......
	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
	    type != BPF_PROG_TYPE_CGROUP_SKB &&
	    !bpf_cap)
		goto put_token;
	......
	if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
		goto put_token;
	......

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20250423073151.297103-1-yangfeng59949@163.com
2025-04-23 10:52:16 -07:00
Alexei Starovoitov 5709be4c35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-21 08:04:38 -07:00
Steven Rostedt a8c5b0ed89 tracing: Fix filter string testing
The filter string testing uses strncpy_from_kernel/user_nofault() to
retrieve the string to test the filter against. The if() statement was
incorrect as it considered 0 as a fault, when it is only negative that it
faulted.

Running the following commands:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

Would produce nothing, but with the fix it will produce something like:

      ls-1192    [007] .....  8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)

Link: https://lore.kernel.org/all/CAEf4BzbVPQ=BjWztmEwBPRKHUwNfKBkS3kce-Rzka6zvbQeVpg@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250417183003.505835fb@gandalf.local.home
Fixes: 77360f9bbc ("tracing: Add test for user space strings when filtering on string pointers")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 22:16:56 -04:00
Ilya Leoshkevich 3b4e87e6a5 ftrace: Fix type of ftrace_graph_ent_entry.depth
ftrace_graph_ent.depth is int, but ftrace_graph_ent_entry.depth is
unsigned long. This confuses trace-cmd on 64-bit big-endian systems and
makes it print a huge amount of spaces. Fix this by using unsigned int,
which has a matching size, instead.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250412221847.17310-2-iii@linux.ibm.com
Fixes: ff5c9c576e ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:19:15 -04:00
Menglong Dong 92f1d3b401 ftrace: fix incorrect hash size in register_ftrace_direct()
The maximum of the ftrace hash bits is made fls(32) in
register_ftrace_direct(), which seems illogical. So, we fix it by making
the max hash bits FTRACE_HASH_MAX_BITS instead.

Link: https://lore.kernel.org/20250413014444.36724-1-dongml2@chinatelecom.cn
Fixes: d05cb47066 ("ftrace: Fix modification of direct_function hash while in use")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:16:51 -04:00
Steven Rostedt c45c585dde ftrace: Free ftrace hashes after they are replaced in the subops code
The subops processing creates new hashes when adding and removing subops.
There were some places that the old hashes that were replaced were not
freed and this caused some memory leaks.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417135939.245b128d@gandalf.local.home
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:16:07 -04:00
Steven Rostedt 08275e59a7 ftrace: Reinitialize hash to EMPTY_HASH after freeing
There's several locations that free a ftrace hash pointer but may be
referenced again. Reset them to EMPTY_HASH so that a u-a-f bug doesn't
happen.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417110933.20ab718b@gandalf.local.home
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:15:28 -04:00
Steven Rostedt 31d1139956 ftrace: Initialize variables for ftrace_startup/shutdown_subops()
The reworking to fix and simplify the ftrace_startup_subops() and the
ftrace_shutdown_subops() made it possible for the filter_hash and
notrace_hash variables to be used uninitialized in a way that the compiler
did not catch it.

Initialize both filter_hash and notrace_hash to the EMPTY_HASH as that is
what they should be if they never are used.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250417104017.3aea66c2@gandalf.local.home
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Fixes: 0ae6b8ce20 ("ftrace: Fix accounting of subop hashes")
Closes: https://lore.kernel.org/all/1db64a42-626d-4b3a-be08-c65e47333ce2@linux.ibm.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17 15:15:05 -04:00
Linus Torvalds 7cdabafc00 tracing fixes for v6.15
- Hide get_vm_area() from MMUless builds
 
   The function get_vm_area() is not defined when CONFIG_MMU is not defined.
   Hide that function within #ifdef CONFIG_MMU.
 
 - Fix output of synthetic events when they have dynamic strings
 
   The print fmt of the synthetic event's format file use to have "%.*s" for
   dynamic size strings even though the user space exported arguments had
   only __get_str() macro that provided just a nul terminated string. This
   was fixed so that user space could parse this properly. But the reason
   that it had "%.*s" was because internally it provided the maximum size of
   the string as one of the arguments. The fix that replaced "%.*s" with "%s"
   caused the trace output (when the kernel reads the event) to write
   "(efault)" as it would now read the length of the string as "%s".
 
   As the string provided is always nul terminated, there's no reason for the
   internal code to use "%.*s" anyway. Just remove the length argument to
   match the "%s" that is now in the format.
 
 - Fix the ftrace subops hash logic of the manager ops hash
 
   The function_graph uses the ftrace subops code. The subops code is a way
   to have a single ftrace_ops registered with ftrace to determine what
   functions will call the ftrace_ops callback. More than one user of
   function graph can register a ftrace_ops with it. The function graph
   infrastructure will then add this ftrace_ops as a subops with the main
   ftrace_ops it registers with ftrace. This is because the functions will
   always call the function graph callback which in turn calls the subops
   ftrace_ops callbacks.
 
   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will update
   the main ftrace_ops hash to include the functions it wants. This is the
   logic that was broken.
 
   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" were all the
   functions in the filter_hash but not in the notrace_hash are attached by
   ftrace. The original logic would have the main ftrace_ops filter_hash be a
   union of all the subops filter_hashes and the main notrace_hash would be a
   intersect of all the subops filter hashes. But this was incorrect because
   the notrace hash depends on the filter_hash it is associated to and not
   the union of all filter_hashes.
 
   Instead, when a subops is added, just include all the functions of the
   subops hash that are in its filter_hash but not in its notrace_hash. The
   main subops hash should not use its notrace hash, unless all of its subops
   hashes have an empty filter_hash (which means to attach to all functions),
   and then, and only then, the main ftrace_ops notrace hash can be the
   intersect of all the subops hashes.
 
   This not only fixes the bug, but also simplifies the code.
 
 - Add a selftest to better test the subops filtering
 
   Add a selftest that would catch the bug fixed by the above change.
 
 - Fix extra newline printed in function tracing with retval
 
   The function parameter code changed the output logic slightly and called
   print_graph_retval() and also printed a newline. The print_graph_retval()
   also prints a newline which caused blank lines to be printed in the
   function graph tracer when retval was added. This caused one of the
   selftests to fail if retvals were enabled. Instead remove the new line
   output from print_graph_retval() and have the callers always print the
   new line so that it doesn't have to do special logic if it calls
   print_graph_retval() or not.
 
 - Fix out-of-bound memory access in the runtime verifier
 
   When rv_is_container_monitor() is called on the last entry on the link
   list it references the next entry, which is the list head and causes an
   out-of-bound memory access.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ/rXQxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoj7AQC0C2awpJSUIRj91qjPtMYuNUE3AVpB
 EEZEkt19LfE//gEA1fOx3Cors/LrY9dthn/3LMKL23vo9c4i0ffhs2X+1gE=
 =XJL5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Hide get_vm_area() from MMUless builds

   The function get_vm_area() is not defined when CONFIG_MMU is not
   defined. Hide that function within #ifdef CONFIG_MMU.

 - Fix output of synthetic events when they have dynamic strings

   The print fmt of the synthetic event's format file use to have "%.*s"
   for dynamic size strings even though the user space exported
   arguments had only __get_str() macro that provided just a nul
   terminated string. This was fixed so that user space could parse this
   properly.

   But the reason that it had "%.*s" was because internally it provided
   the maximum size of the string as one of the arguments. The fix that
   replaced "%.*s" with "%s" caused the trace output (when the kernel
   reads the event) to write "(efault)" as it would now read the length
   of the string as "%s".

   As the string provided is always nul terminated, there's no reason
   for the internal code to use "%.*s" anyway. Just remove the length
   argument to match the "%s" that is now in the format.

 - Fix the ftrace subops hash logic of the manager ops hash

   The function_graph uses the ftrace subops code. The subops code is a
   way to have a single ftrace_ops registered with ftrace to determine
   what functions will call the ftrace_ops callback. More than one user
   of function graph can register a ftrace_ops with it. The function
   graph infrastructure will then add this ftrace_ops as a subops with
   the main ftrace_ops it registers with ftrace. This is because the
   functions will always call the function graph callback which in turn
   calls the subops ftrace_ops callbacks.

   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will
   update the main ftrace_ops hash to include the functions it wants.
   This is the logic that was broken.

   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
   all the functions in the filter_hash but not in the notrace_hash are
   attached by ftrace. The original logic would have the main ftrace_ops
   filter_hash be a union of all the subops filter_hashes and the main
   notrace_hash would be a intersect of all the subops filter hashes.
   But this was incorrect because the notrace hash depends on the
   filter_hash it is associated to and not the union of all
   filter_hashes.

   Instead, when a subops is added, just include all the functions of
   the subops hash that are in its filter_hash but not in its
   notrace_hash. The main subops hash should not use its notrace hash,
   unless all of its subops hashes have an empty filter_hash (which
   means to attach to all functions), and then, and only then, the main
   ftrace_ops notrace hash can be the intersect of all the subops
   hashes.

   This not only fixes the bug, but also simplifies the code.

 - Add a selftest to better test the subops filtering

   Add a selftest that would catch the bug fixed by the above change.

 - Fix extra newline printed in function tracing with retval

   The function parameter code changed the output logic slightly and
   called print_graph_retval() and also printed a newline. The
   print_graph_retval() also prints a newline which caused blank lines
   to be printed in the function graph tracer when retval was added.
   This caused one of the selftests to fail if retvals were enabled.
   Instead remove the new line output from print_graph_retval() and have
   the callers always print the new line so that it doesn't have to do
   special logic if it calls print_graph_retval() or not.

 - Fix out-of-bound memory access in the runtime verifier

   When rv_is_container_monitor() is called on the last entry on the
   link list it references the next entry, which is the list head and
   causes an out-of-bound memory access.

* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix out-of-bound memory access in rv_is_container_monitor()
  ftrace: Do not have print_graph_retval() add a newline
  tracing/selftest: Add test to better test subops filtering of function graph
  ftrace: Fix accounting of subop hashes
  ftrace: Properly merge notrace hashes
  tracing: Do not add length to print format in synthetic events
  tracing: Hide get_vm_area() from MMUless builds
2025-04-12 15:37:40 -07:00
Nam Cao 8d7861ac50 rv: Fix out-of-bound memory access in rv_is_container_monitor()
When rv_is_container_monitor() is called on the last monitor in
rv_monitors_list, KASAN yells:

  BUG: KASAN: global-out-of-bounds in rv_is_container_monitor+0x101/0x110
  Read of size 8 at addr ffffffff97c7c798 by task setup/221

  The buggy address belongs to the variable:
   rv_monitors_list+0x18/0x40

This is due to list_next_entry() is called on the last entry in the list.
It wraps around to the first list_head, and the first list_head is not
embedded in struct rv_monitor_def.

Fix it by checking if the monitor is last in the list.

Cc: stable@vger.kernel.org
Cc: Gabriele Monaco <gmonaco@redhat.com>
Fixes: cb85c660fc ("rv: Add option for nested monitors and include sched")
Link: https://lore.kernel.org/e85b5eeb7228bfc23b8d7d4ab5411472c54ae91b.1744355018.git.namcao@linutronix.de
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-12 12:13:30 -04:00
Steven Rostedt 485acd207d ftrace: Do not have print_graph_retval() add a newline
The retval and retaddr options for function_graph tracer will add a
comment at the end of a function for both leaf and non leaf functions that
looks like:

               __wake_up_common(); /* ret=0x1 */

               } /* pick_next_task_fair ret=0x0 */

The function print_graph_retval() adds a newline after the "*/". But if
that's not called, the caller function needs to make sure there's a
newline added.

This is confusing and when the function parameters code was added, it
added a newline even when calling print_graph_retval() as the fact that
the print_graph_retval() function prints a newline isn't obvious.

This caused an extra newline to be printed and that made it fail the
selftests when the retval option was set, as the selftests were not
expecting blank lines being injected into the trace.

Instead of having print_graph_retval() print a newline, just have the
caller always print the newline regardless if it calls print_graph_retval()
or not. This not only fixes this bug, but it also simplifies the code.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250411133015.015ca393@gandalf.local.home
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/ccc40f2b-4b9e-4abd-8daf-d22fce2a86f0@sirena.org.uk/
Fixes: ff5c9c576e ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-12 12:13:30 -04:00
Steven Rostedt 0ae6b8ce20 ftrace: Fix accounting of subop hashes
The function graph infrastructure uses ftrace to hook to functions. It has
a single ftrace_ops to manage all the users of function graph. Each
individual user (tracing, bpf, fprobes, etc) has its own ftrace_ops to
track the functions it will have its callback called from. These
ftrace_ops are "subops" to the main ftrace_ops of the function graph
infrastructure.

Each ftrace_ops has a filter_hash and a notrace_hash that is defined as:

  Only trace functions that are in the filter_hash but not in the
  notrace_hash.

If the filter_hash is empty, it means to trace all functions.
If the notrace_hash is empty, it means do not disable any function.

The function graph main ftrace_ops needs to be a superset containing all
the functions to be traced by all the subops it has. The algorithm to
perform this merge was incorrect.

When the first subops was added to the main ops, it simply made the main
ops a copy of the subops (same filter_hash and notrace_hash).

When a second ops was added, it joined the new subops filter_hash with the
main ops filter_hash as a union of the two sets. The intersect between the
new subops notrace_hash and the main ops notrace_hash was created as the
new notrace_hash of the main ops.

The issue here is that it would then start tracing functions than no
subops were tracing. For example if you had two subops that had:

subops 1:

  filter_hash = '*sched*' # trace all functions with "sched" in it
  notrace_hash = '*time*' # except do not trace functions with "time"

subops 2:

  filter_hash = '*lock*' # trace all functions with "lock" in it
  notrace_hash = '*clock*' # except do not trace functions with "clock"

The intersect of '*time*' functions with '*clock*' functions could be the
empty set. That means the main ops will be tracing all functions with
'*time*' and all "*clock*" in it!

Instead, modify the algorithm to be a bit simpler and correct.

First, when adding a new subops, even if it's the first one, do not add
the notrace_hash if the filter_hash is not empty. Instead, just add the
functions that are in the filter_hash of the subops but not in the
notrace_hash of the subops into the main ops filter_hash. There's no
reason to add anything to the main ops notrace_hash.

The notrace_hash of the main ops should only be non empty iff all subops
filter_hashes are empty (meaning to trace all functions) and all subops
notrace_hashes include the same functions.

That is, the main ops notrace_hash is empty if any subops filter_hash is
non empty.

The main ops notrace_hash only has content in it if all subops
filter_hashes are empty, and the content are only functions that intersect
all the subops notrace_hashes. If any subops notrace_hash is empty, then
so is the main ops notrace_hash.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/20250409152720.216356767@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-11 16:02:08 -04:00
Andy Chiu 04a80a34c2 ftrace: Properly merge notrace hashes
The global notrace hash should be jointly decided by the intersection of
each subops's notrace hash, but not the filter hash.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250408160258.48563-1-andybnac@gmail.com
Fixes: 5fccc7552c ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Andy Chiu <andybnac@gmail.com>
[ fixed removing of freeing of filter_hash ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-11 15:14:54 -04:00
Tao Chen a76116f422 bpf: Check link_create.flags parameter for multi_uprobe
The link_create.flags are currently not used for multi-uprobes, so return
-EINVAL if it is set, same as for other attach APIs.

We allow target_fd to have an arbitrary value for multi-uprobe, though,
as there are existing users (libbpf) relying on this.

Fixes: 89ae89f53d ("bpf: Add multi uprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-2-chen.dylane@linux.dev
2025-04-09 16:28:51 -07:00
Tao Chen 243911982a bpf: Check link_create.flags parameter for multi_kprobe
The link_create.flags are currently not used for multi-kprobes, so return
-EINVAL if it is set, same as for other attach APIs.

We allow target_fd, on the other hand, to have an arbitrary value for
multi-kprobe, as there are existing users (libbpf) relying on this.

Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-1-chen.dylane@linux.dev
2025-04-09 16:28:22 -07:00
Steven Rostedt e1a453a57b tracing: Do not add length to print format in synthetic events
The following causes a vsnprintf fault:

  # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
  # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
  # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger

Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:

          <idle>-0       [001] d..5.   193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
    sshd-session-879     [001] d..5.   193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
          <idle>-0       [002] d..5.   193.811198: wake_lat: wakee=(efault)bashdelta=91
            bash-880     [002] d..5.   193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
          <idle>-0       [001] d..5.   193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
    sshd-session-879     [001] d..5.   193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50

The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb4 ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-09 11:34:21 -04:00
Joel Granados 67049b53e0 stack_tracer: move sysctl registration to kernel/trace/trace_stack.c
Move stack_tracer_enabled into trace_stack_sysctl_table. This is part of
a greater effort to move ctl tables into their respective subsystems
which will reduce the merge conflicts in kernel/sysctl.c.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-04-09 13:32:16 +02:00
Joel Granados dd293df639 tracing: Move trace sysctls into trace.c
Move trace ctl tables into their own const array in
kernel/trace/trace.c. The sysctl table register is called with
subsys_initcall placing if after its original place in proc_root_init.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-09 13:32:16 +02:00
Linus Torvalds bec7dcbc24 Probes fixes for v6.14:
- fprobe: Fix to remove fprobe_hlist_node when module unloading
 
   When a fprobe target module is removed, the fprobe_hlist_node
   should be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.
 
 - fprobe: Fix to lock module while registering fprobe
 
  The module containing the function to be probeed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.
 
 - fprobe-events: Fix possible UAF on modules
 
   Basically as same as above, but in the fprobe-events layer we also
   need to get module reference counter when we find the tracepoint
   in the module.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmf0kJ8bHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+bEIALiFuYBn2y26OJnfaRnW
 rgSC2JupswEVg7HwsN5kA/x1ypXl9SPfJGjbiHLUTq9+4KGOBTmY+k5/OpVO+Qkh
 3nYKOkZxKRTglA7hRSTH0rxDV1eobps4nv/xkPjprugcjCGU54+4yb9Hq7Kyflpa
 o8p+VS/0VOJ9f3Iy9a9JRfu9qE7Qzz9USCj4N64WMgx/qczPe27twqFEaUpTf1VW
 Sw9twtKnqGs9hNE2QmhlzUBuq6gOZMXkjH6t1U4pMWBGB51JqZ5ZBhC4kL/5XEIZ
 bEau82El5qdieQC2B7c0RxldceKa4t4QUlJDalZGKpxvTXrCw9rFyv0dRe2cXnKm
 Yo0=
 =I+MO
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - fprobe: remove fprobe_hlist_node when module unloading

   When a fprobe target module is removed, the fprobe_hlist_node should
   be removed from the fprobe's hash table to prevent reusing
   accidentally if another module is loaded at the same address.

 - fprobe: lock module while registering fprobe

   The module containing the function to be probeed is locked using a
   reference counter until the fprobe registration is complete, which
   prevents use after free.

 - fprobe-events: fix possible UAF on modules

   Basically as same as above, but in the fprobe-events layer we also
   need to get module reference counter when we find the tracepoint in
   the module.

* tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fprobe: Cleanup fprobe hash when module unloading
  tracing: fprobe events: Fix possible UAF on modules
  tracing: fprobe: Fix to lock module while registering fprobe
2025-04-08 12:51:34 -07:00
Masami Hiramatsu (Google) a3dc2983ca tracing: fprobe: Cleanup fprobe hash when module unloading
Cleanup fprobe address hash table on module unloading because the
target symbols will be disappeared when unloading module and not
sure the same symbol is mapped on the same address.

Note that this is at least disables the fprobes if a part of target
symbols on the unloaded modules. Unlike kprobes, fprobe does not
re-enable the probe point by itself. To do that, the caller should
take care register/unregister fprobe when loading/unloading modules.
This simplifies the fprobe state managememt related to the module
loading/unloading.

Link: https://lore.kernel.org/all/174343534473.843280.13988101014957210732.stgit@devnote2/

Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-08 08:46:25 +09:00
Steven Rostedt 4808595a99 tracing: Hide get_vm_area() from MMUless builds
The function get_vm_area() is not defined for non-MMU builds and causes a
build error if it is used. Hide the map_pages() function around a:

 #ifdef CONFIG_MMU

to keep it from being compiled when CONFIG_MMU is not set.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250407120111.2ccc9319@gandalf.local.home
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/4f8ece8b-8862-4f7c-8ede-febd28f8a9fe@roeck-us.net/
Fixes: 394f3f02de ("tracing: Use vmap_page_range() to map memmap ring buffer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-07 12:02:11 -04:00
Linus Torvalds 6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.

     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
2025-04-03 16:09:29 -07:00
Linus Torvalds 41677970ad tracing fixes for 6.15
- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled
 
   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is enabled.
   In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same configs
   as the function argument tracing requires. Just have the function
   argument tracing depend on PROBE_EVENTS_BTF_ARGS.
 
 - Free module_delta for persistent ring buffer instance
 
   When an instance holds the persistent ring buffer, it allocates
   a helper array to hold the deltas between where modules are loaded
   on the last boot and the current boot. This array needs to be freed
   when the instance is freed.
 
 - Add cond_resched() to loop in ftrace_graph_set_hash()
 
   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This
   loop is known to trigger soft lockup warnings and requires a
   cond_resched(). The loop in ftrace_graph_set_hash() was missing it.
 
 - Fix the event format verifier to include "%*p.." arguments
 
   To prevent events from dereferencing stale pointers that can
   happen if a trace event uses a dereferece pointer to something
   that was not copied into the ring buffer and can be freed by the
   time the trace is read, a verifier is called. At boot or module
   load, the verifier scans the print format string for pointers
   that can be dereferenced and it checks the arguments to make sure
   they do not contain something that can be freed. The "%*p" was
   not handled, which would add another argument and cause the verifier
   to not only not verify this pointer, but it will look at the wrong
   argument for every pointer after that.
 
 - Fix mcount sorttable building for different endian type target
 
   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianess of the file and the host is used
   to determine if the bytes need to be swapped when calculations are
   done. A change was made to the sorting of the mcount_loc that read
   the values from the ELF file into an array and the swap happened
   on the filling of the array. But one of the calculations of the
   array still did the swap when it did not need to. This caused building
   on a little endian machine for a big endian target to not find
   the mcount function in the 'nm' table and it zeroed it out, causing
   there to be no functions available to trace.
 
 - Add goto out_unlock jump to rv_register_monitor() on error path
 
   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+2tyBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjPYAPwJDti6nHTqheFwIa1WzJ3yC2tKRYKt
 1E5PYW/2Ct5NmwEAqgg3TvJppXHymVdutLghhGFnlBnyTWMI+KIhparSBw8=
 =NFM5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled

   The tracing of arguments in the function tracer depends on some
   functions that are only defined when PROBE_EVENTS_BTF_ARGS is
   enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
   configs as the function argument tracing requires. Just have the
   function argument tracing depend on PROBE_EVENTS_BTF_ARGS.

 - Free module_delta for persistent ring buffer instance

   When an instance holds the persistent ring buffer, it allocates a
   helper array to hold the deltas between where modules are loaded on
   the last boot and the current boot. This array needs to be freed when
   the instance is freed.

 - Add cond_resched() to loop in ftrace_graph_set_hash()

   The hash functions in ftrace loop over every function that can be
   enabled by ftrace. This can be 50,000 functions or more. This loop is
   known to trigger soft lockup warnings and requires a cond_resched().
   The loop in ftrace_graph_set_hash() was missing it.

 - Fix the event format verifier to include "%*p.." arguments

   To prevent events from dereferencing stale pointers that can happen
   if a trace event uses a dereferece pointer to something that was not
   copied into the ring buffer and can be freed by the time the trace is
   read, a verifier is called. At boot or module load, the verifier
   scans the print format string for pointers that can be dereferenced
   and it checks the arguments to make sure they do not contain
   something that can be freed. The "%*p" was not handled, which would
   add another argument and cause the verifier to not only not verify
   this pointer, but it will look at the wrong argument for every
   pointer after that.

 - Fix mcount sorttable building for different endian type target

   When modifying the ELF file to sort the mcount_loc table in the
   sorttable.c code, the endianess of the file and the host is used to
   determine if the bytes need to be swapped when calculations are done.
   A change was made to the sorting of the mcount_loc that read the
   values from the ELF file into an array and the swap happened on the
   filling of the array. But one of the calculations of the array still
   did the swap when it did not need to. This caused building on a
   little endian machine for a big endian target to not find the mcount
   function in the 'nm' table and it zeroed it out, causing there to be
   no functions available to trace.

 - Add goto out_unlock jump to rv_register_monitor() on error path

   One of the error paths in rv_register_monitor() just returned the
   error when it should have jumped to the out_unlock label to release
   the mutex.

* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix missing unlock on double nested monitors return path
  scripts/sorttable: Fix endianness handling in build-time mcount sort
  tracing: Verify event formats that have "%*p.."
  ftrace: Add cond_resched() to ftrace_graph_set_hash()
  tracing: Free module_delta on freeing of persistent ring buffer
  ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
2025-04-03 09:52:44 -07:00
Linus Torvalds af54a3a151 more printk changes for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmftIx0ACgkQUqAMR0iA
 lPJKUxAArJaWC7EXtDtd8u8Rl/CYpIEaMdPd7V+XA5sqdUyjkSJI+jRswonpOsNX
 Zn9pbGMds1LNBXm1NO9039+2TrPSJFTCGK6OgeCJC17/O31wnnm0LhZiU+JElgfi
 iQI5fdTnc3sB37bsjkvEUr9HFizRxY2fHHMWZ8ngiLfkKfki4ET+1u/yf7CraRk1
 6+LK9mM/WyytP6gYaSlL5YYVYs9fNcR/ND6IQgpfIN15/fOAOXWbMB1jE2iDRzqt
 MQUD4+DTYQYmeS6jQ4ToZdx3Ql9NwcP2nJnA5fxXeqPFHc/SgRS6KqOPQgQUD4tV
 N4q6ozLPlzDFeHVHMhPz/PzlSEn0zC1ZX87xXCUAilnkJpbEujcPxf44R/3RHu3d
 y7kmCRj0RwgHpLIwzLH5POrF4il9/wVlyZFRaYBPMkj09l0WBwYvfMhlnzvAtCP8
 pRKqHkjJ1FOWQFJyn98ONqcCmm2pZ8XKW2enikAhISVXcptI/1lIQ6IIpRdTjte1
 r60CbiJ7UFL+TrVqsWBuqWQRi5u5HykPkZiCL/YYXzZmrl3zLO+0ti9YzEU8Yrzd
 K1VAB/1aK/MDrTgOI+VaqlPq79uJBwtbrflgFhFBKAKsqTpBcsZUv9/1KHthnqXV
 Y84SsY2XpoGtjn58mU6eEc+8lLTOTDVXs+ZZL4/M3maW7ygNiYY=
 =Biv4
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull more printk updates from Petr Mladek:

 - Silence warnings about candidates for ‘gnu_print’ format attribute

* tag 'printk-for-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: Silence false positive GCC warning for va_format()
  vsnprintf: Drop unused const char fmt * in va_format()
  vsnprintf: Mark binary printing functions with __printf() attribute
  tracing: Mark binary printing functions with __printf() attribute
  seq_file: Mark binary printing functions with __printf() attribute
  seq_buf: Mark binary printing functions with __printf() attribute
2025-04-02 10:05:55 -07:00
Steven Rostedt e4d4b8670c ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
Some architectures do not have data cache coherency between user and
kernel space. For these architectures, the cache needs to be flushed on
both the kernel and user addresses so that user space can see the updates
the kernel has made.

Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range() which
takes the virtual address and does the work for those architectures that
need it.

This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.

Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3aQ6SA@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/20250402144953.920792197@goodmis.org
Fixes: 117c39200d ("ring-buffer: Introducing ring-buffer mapping functions");
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt 394f3f02de tracing: Use vmap_page_range() to map memmap ring buffer
The code to map the physical memory retrieved by memmap currently
allocates an array of pages to cover the physical memory and then calls
vmap() to map it to a virtual address. Instead of using this temporary
array of struct page descriptors, simply use vmap_page_range() that can
directly map the contiguous physical memory to a virtual address.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.754618481@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:27 -04:00
Steven Rostedt 34ea8fa084 tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
The reserve_mem kernel command line option may pass back a physical
address, but the memory is still part of the normal memory just like
using memblock_alloc() would be. This means that the physical memory
returned by the reserve_mem command line option can be converted directly
to virtual memory by simply using phys_to_virt().

When freeing the buffer there's no need to call vunmap() anymore as the
memory allocated by reserve_mem is freed by the call to
reserve_mem_release_by_name().

Because the persistent ring buffer can also be allocated via the memmap
option, which *is* different than normal memory as it cannot be added back
to the buddy system, it must be treated differently. It still needs to be
virtually mapped to have access to it. It also can not be freed nor can it
ever be memory mapped to user space.

Create a new trace_array flag called TRACE_ARRAY_FL_MEMMAP which gets set
if the buffer is created by the memmap option, and this will prevent the
buffer from being memory mapped by user space.

Also increment the ref count for memmap'ed buffers so that they can never
be freed.

Link: https://lore.kernel.org/all/Z-wFszhJ_9o4dc8O@kernel.org/

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.583750106@goodmis.org
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Steven Rostedt c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Masami Hiramatsu (Google) dd941507a9 tracing: fprobe events: Fix possible UAF on modules
Commit ac91052f0a ("tracing: tprobe-events: Fix leakage of module
refcount") moved try_module_get() from __find_tracepoint_module_cb()
to find_tracepoint() caller, but that introduced a possible UAF
because the module can be unloaded before try_module_get(). In this
case, the module object should be freed too. Thus, try_module_get()
does not only fail but may access to the freed object.

To avoid that, try_module_get() in __find_tracepoint_module_cb()
again.

Link: https://lore.kernel.org/all/174342990779.781946.9138388479067729366.stgit@devnote2/

Fixes: ac91052f0a ("tracing: tprobe-events: Fix leakage of module refcount")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:58 +09:00
Masami Hiramatsu (Google) d24fa977ee tracing: fprobe: Fix to lock module while registering fprobe
Since register_fprobe() does not get the module reference count while
registering fgraph filter, if the target functions (symbols) are in
modules, those modules can be unloaded when registering fprobe to
fgraph.

To avoid this issue, get the reference counter of module for each
symbol, and put it after register the fprobe.

Link: https://lore.kernel.org/all/174330568792.459674.16874380163991113156.stgit@devnote2/

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250325130628.3a9e234c@gandalf.local.home/
Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-04-02 23:18:57 +09:00
Gabriele Monaco fc0585c7fa rv: Fix missing unlock on double nested monitors return path
RV doesn't support nested monitors having children monitors themselves
and exits with the EINVAL code. However, it returns without unlocking
the rv_interface_lock.

Unlock the lock before returning from the initialisation function.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250402071351.19864-2-gmonaco@redhat.com
Fixes: cb85c660fc ("rv: Add option for nested monitors and include sched")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202503310200.UBXGitB4-lkp@intel.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
Steven Rostedt ea8d7647f9 tracing: Verify event formats that have "%*p.."
The trace event verifier checks the formats of trace events to make sure
that they do not point at memory that is not in the trace event itself or
in data that will never be freed. If an event references data that was
allocated when the event triggered and that same data is freed before the
event is read, then the kernel can crash by reading freed memory.

The verifier runs at boot up (or module load) and scans the print formats
of the events and checks their arguments to make sure that dereferenced
pointers are safe. If the format uses "%*p.." the verifier will ignore it,
and that could be dangerous. Cover this case as well.

Also add to the sample code a use case of "%*pbl".

Link: https://lore.kernel.org/all/bcba4d76-2c3f-4d11-baf0-02905db953dd@oracle.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Link: https://lore.kernel.org/20250327195311.2d89ec66@gandalf.local.home
Reported-by: Libo Chen <libo.chen@oracle.com>
Reviewed-by: Libo Chen <libo.chen@oracle.com>
Tested-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:26 -04:00
zhoumin 42ea22e754 ftrace: Add cond_resched() to ftrace_graph_set_hash()
When the kernel contains a large number of functions that can be traced,
the loop in ftrace_graph_set_hash() may take a lot of time to execute.
This may trigger the softlockup watchdog.

Add cond_resched() within the loop to allow the kernel to remain
responsive even when processing a large number of functions.

This matches the cond_resched() that is used in other locations of the
code that iterates over all functions that can be traced.

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Link: https://lore.kernel.org/tencent_3E06CE338692017B5809534B9C5C03DA7705@qq.com
Signed-off-by: zhoumin <teczm@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt 2c9ee74a6d tracing: Free module_delta on freeing of persistent ring buffer
If a persistent ring buffer is created, a "module_delta" array is also
allocated to hold the module deltas of loaded modules that match modules
in the scratch area. If this buffer gets freed, the module_delta array is
not freed and causes a memory leak.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250401124525.1f9ac02a@gandalf.local.home
Fixes: 35a380ddbc ("tracing: Show last module text symbols in the stacktrace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:51:25 -04:00
Steven Rostedt b81ff11c21 ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
The option PROBE_EVENTS_BTF_ARGS enables the functions
btf_find_func_proto() and btf_get_func_param() which are used by the
function argument tracing code. The option FUNCTION_TRACE_ARGS was
dependent on the same configs that PROBE_EVENTS_BTF_ARGS was dependent on,
but it was also dependent on PROBE_EVENTS_BTF_ARGS. In fact, if
PROBE_EVENTS_BTF_ARGS is supported then FUNCTION_TRACE_ARGS is supported.

Just make FUNCTION_TRACE_ARGS depend on PROBE_EVENTS_BTF_ARGS.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250401113601.17fa1129@gandalf.local.home
Fixes: 533c20b062 ("ftrace: Add print_function_args()")
Closes: https://lore.kernel.org/all/DB9PR08MB75820599801BAD118D123D7D93AD2@DB9PR08MB7582.eurprd08.prod.outlook.com/
Reported-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Christian Loehle <Christian.Loehle@arm.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 09:50:56 -04:00
Linus Torvalds 46d29f23a7 ring-buffer updates for v6.15
- Restructure the persistent memory to have a "scratch" area
 
   Instead of hard coding the KASLR offset in the persistent memory
   by the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should
   not be part of that.
 
   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.
 
 - Allow where modules are loaded to be saved in the new scratch pad
 
   Save the addresses of modules when they are loaded into the persistent
   memory scratch pad.
 
 - A new module_for_each_mod() helper function was created
 
   With the acknowledgement of the module maintainers a new module helper
   function was created to iterate over all the currently loaded modules.
   This has a callback to be called for each module. This is needed for
   when tracing is started in the persistent buffer and the currently loaded
   modules need to be saved in the scratch area.
 
 - Expose the last boot information where the kernel and modules were loaded
 
   The last_boot_info file is updated to print out the addresses of where
   the kernel "_text" location was loaded from a previous boot, as well
   as where the modules are loaded. If the buffer is recording the current
   boot, it only prints "# Current" so that it does not expose the KASLR
   offset of the currently running kernel.
 
 - Allow the persistent ring buffer to be released (freed)
 
   To have this in production environments, where the kernel command line can
   not be changed easily, the ring buffer needs to be freed when it is not
   going to be used. The memory for the buffer will always be allocated at
   boot up, but if the system isn't going to enable tracing, the memory needs
   to be freed. Allow it to be freed and added back to the kernel memory
   pool.
 
 - Allow stack traces to print the function names in the persistent buffer
 
   Now that the modules are saved in the persistent ring buffer, if the same
   modules are loaded, the printing of the function names will examine the
   saved modules. If the module is found in the scratch area and is also
   loaded, then it will do the offset shift and use kallsyms to display the
   function name. If the address is not found, it simply displays the address
   from the previous boot in hex.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+cUERQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrAsAQCFt2nfzxoe3wtF5EqIT1VHp/8bQVjG
 gBe8B6ouboreogD/dS7yK8MRy24ZAmObGwYG0RbVicd50S7P8Rf7+823ng8=
 =OJKk
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Restructure the persistent memory to have a "scratch" area

   Instead of hard coding the KASLR offset in the persistent memory by
   the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should not
   be part of that.

   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.

 - Allow where modules are loaded to be saved in the new scratch pad

   Save the addresses of modules when they are loaded into the
   persistent memory scratch pad.

 - A new module_for_each_mod() helper function was created

   With the acknowledgement of the module maintainers a new module
   helper function was created to iterate over all the currently loaded
   modules. This has a callback to be called for each module. This is
   needed for when tracing is started in the persistent buffer and the
   currently loaded modules need to be saved in the scratch area.

 - Expose the last boot information where the kernel and modules were
   loaded

   The last_boot_info file is updated to print out the addresses of
   where the kernel "_text" location was loaded from a previous boot, as
   well as where the modules are loaded. If the buffer is recording the
   current boot, it only prints "# Current" so that it does not expose
   the KASLR offset of the currently running kernel.

 - Allow the persistent ring buffer to be released (freed)

   To have this in production environments, where the kernel command
   line can not be changed easily, the ring buffer needs to be freed
   when it is not going to be used. The memory for the buffer will
   always be allocated at boot up, but if the system isn't going to
   enable tracing, the memory needs to be freed. Allow it to be freed
   and added back to the kernel memory pool.

 - Allow stack traces to print the function names in the persistent
   buffer

   Now that the modules are saved in the persistent ring buffer, if the
   same modules are loaded, the printing of the function names will
   examine the saved modules. If the module is found in the scratch area
   and is also loaded, then it will do the offset shift and use kallsyms
   to display the function name. If the address is not found, it simply
   displays the address from the previous boot in hex.

* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Use _text and the kernel offset in last_boot_info
  tracing: Show last module text symbols in the stacktrace
  ring-buffer: Remove the unused variable bmeta
  tracing: Skip update_last_data() if cleared and remove active check for save_mod()
  tracing: Initialize scratch_size to zero to prevent UB
  tracing: Fix a compilation error without CONFIG_MODULES
  tracing: Freeable reserved ring buffer
  mm/memblock: Add reserved memory release function
  tracing: Update modules to persistent instances when loaded
  tracing: Show module names and addresses of last boot
  tracing: Have persistent trace instances save module addresses
  module: Add module_for_each_mod() function
  tracing: Have persistent trace instances save KASLR offset
  ring-buffer: Add ring_buffer_meta_scratch()
  ring-buffer: Add buffer meta data for persistent ring buffer
  ring-buffer: Use kaslr address instead of text delta
  ring-buffer: Fix bytes_dropped calculation issue
2025-03-31 13:37:22 -07:00
Linus Torvalds 01d5b167dc Modules changes for 6.15-rc1
- Use RCU instead of RCU-sched
 
   The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable()
   in the module code and its users has been replaced with just
   rcu_read_lock().
 
 - The rest of changes are smaller fixes and updates.
 
 The changes have been on linux-next for at least 2 weeks, with the RCU
 cleanup present for 2 months. One performance problem was reported with the
 RCU change when KASAN + lockdep were enabled, but it was effectively
 addressed by the already merged ee57ab5a32 ("locking/lockdep: Disable
 KASAN instrumentation of lockdep.c").
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmfmwrsUHHBldHIucGF2
 bHVAc3VzZS5jb20ACgkQumpXJwqY6prWxgf/S7Pvdywm10vJ6fooYa+GxXNMwhyh
 XRjZ4m9gjeTNf2KLwX0XHv0XZeFHOmHfjd3iI+pS6CXZnCFTN9J3XPLYsrTxXUb6
 U6zzLf8Zsz8TzeI4dgvSBsZln7oICSACkAgdJCq23hpNKeaeRo91dgiZaIwyZJG3
 FekqSFtP7pYhfFoNkrFKysqbgl1+RWWZ79L2qRJA0bPzVFlvRUuh6cOHQw+8RMqf
 BYLwnArjTkW8AcXpxIXSiwphDHVZ81B96xoplavyoprA5FDpv1W+8y4DtxdWFn+1
 QVWCs/ZV3KrwXWpZev625w3fIOOIXILqRINOzLfvXTw+1xFS3TzSQEpVeg==
 =4OKc
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull modules updates from Petr Pavlu:

 - Use RCU instead of RCU-sched

   The mix of rcu_read_lock(), rcu_read_lock_sched() and
   preempt_disable() in the module code and its users has
   been replaced with just rcu_read_lock()

 - The rest of changes are smaller fixes and updates

* tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits)
  MAINTAINERS: Update the MODULE SUPPORT section
  module: Remove unnecessary size argument when calling strscpy()
  module: Replace deprecated strncpy() with strscpy()
  params: Annotate struct module_param_attrs with __counted_by()
  bug: Use RCU instead RCU-sched to protect module_bug_list.
  static_call: Use RCU in all users of __module_text_address().
  kprobes: Use RCU in all users of __module_text_address().
  bpf: Use RCU in all users of __module_text_address().
  jump_label: Use RCU in all users of __module_text_address().
  jump_label: Use RCU in all users of __module_address().
  x86: Use RCU in all users of __module_address().
  cfi: Use RCU while invoking __module_address().
  powerpc/ftrace: Use RCU in all users of __module_text_address().
  LoongArch: ftrace: Use RCU in all users of __module_text_address().
  LoongArch/orc: Use RCU in all users of __module_address().
  arm64: module: Use RCU in all users of __module_text_address().
  ARM: module: Use RCU in all users of __module_text_address().
  module: Use RCU in all users of __module_text_address().
  module: Use RCU in all users of __module_address().
  module: Use RCU in search_module_extables().
  ...
2025-03-30 15:44:36 -07:00
Linus Torvalds fa593d0f96 bpf-next-6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
 bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
 D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
 rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
 MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
 bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
 QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
 wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
 9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
 lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
 IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
 Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
 =aN/V
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "For this merge window we're splitting BPF pull request into three for
  higher visibility: main changes, res_spin_lock, try_alloc_pages.

  These are the main BPF changes:

   - Add DFA-based live registers analysis to improve verification of
     programs with loops (Eduard Zingerman)

   - Introduce load_acquire and store_release BPF instructions and add
     x86, arm64 JIT support (Peilin Ye)

   - Fix loop detection logic in the verifier (Eduard Zingerman)

   - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)

   - Add kfunc for populating cpumask bits (Emil Tsalapatis)

   - Convert various shell based tests to selftests/bpf/test_progs
     format (Bastien Curutchet)

   - Allow passing referenced kptrs into struct_ops callbacks (Amery
     Hung)

   - Add a flag to LSM bpf hook to facilitate bpf program signing
     (Blaise Boscaccy)

   - Track arena arguments in kfuncs (Ihor Solodrai)

   - Add copy_remote_vm_str() helper for reading strings from remote VM
     and bpf_copy_from_user_task_str() kfunc (Jordan Rome)

   - Add support for timed may_goto instruction (Kumar Kartikeya
     Dwivedi)

   - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)

   - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
     local storage (Martin KaFai Lau)

   - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)

   - Allow retrieving BTF data with BTF token (Mykyta Yatsenko)

   - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
     (Song Liu)

   - Reject attaching programs to noreturn functions (Yafang Shao)

   - Introduce pre-order traversal of cgroup bpf programs (Yonghong
     Song)"

* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
  selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
  bpf: Fix out-of-bounds read in check_atomic_load/store()
  libbpf: Add namespace for errstr making it libbpf_errstr
  bpf: Add struct_ops context information to struct bpf_prog_aux
  selftests/bpf: Sanitize pointer prior fclose()
  selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
  selftests/bpf: test_xdp_vlan: Rename BPF sections
  bpf: clarify a misleading verifier error message
  selftests/bpf: Add selftest for attaching fexit to __noreturn functions
  bpf: Reject attaching fexit/fmod_ret to __noreturn functions
  bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
  bpf: Make perf_event_read_output accessible in all program types.
  bpftool: Using the right format specifiers
  bpftool: Add -Wformat-signedness flag to detect format errors
  selftests/bpf: Test freplace from user namespace
  libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
  bpf: Return prog btf_id without capable check
  bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
  bpf, x86: Fix objtool warning for timed may_goto
  bpf: Check map->record at the beginning of check_and_free_fields()
  ...
2025-03-30 12:43:03 -07:00
Steven Rostedt 028a58ec15 tracing: Use _text and the kernel offset in last_boot_info
Instead of using kaslr_offset() just record the location of "_text". This
makes it possible for user space to use either the System.map or
/proc/kallsyms as what to map all addresses to functions with.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250326220304.38dbedcd@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google) 35a380ddbc tracing: Show last module text symbols in the stacktrace
Since the previous boot trace buffer can include module text address in
the stacktrace. As same as the kernel text address, convert the module
text address using the module address information.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174282689201.356346.17647540360450727687.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Jiapeng Chong de48d7fff7 ring-buffer: Remove the unused variable bmeta
Variable bmeta is not effectively used, so delete it.

kernel/trace/ring_buffer.c:1952:27: warning: variable ‘bmeta’ set but not used.

Link: https://lore.kernel.org/20250317015524.3902-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=19524
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google) 486fbcb380 tracing: Skip update_last_data() if cleared and remove active check for save_mod()
If the last boot data is already cleared, there is no reason to update it
again. Skip if the TRACE_ARRAY_FL_LAST_BOOT is cleared.
Also, for calling save_mod() when module loading, we don't need to check
the trace is active or not because any module address can be on the
stacktrace.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174165660328.1173316.15529357882704817499.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Steven Rostedt 5dbeb56bb9 tracing: Initialize scratch_size to zero to prevent UB
In allocate_trace_buffer() the following code:

  buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
				      tr->range_addr_start,
				      tr->range_addr_size,
				      struct_size(tscratch, entries, 128));

  tscratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
  setup_trace_scratch(tr, tscratch, scratch_size);

Has undefined behavior if ring_buffer_alloc_range() fails because
"scratch_size" is not initialize. If the allocation fails, then
buf->buffer will be NULL. The ring_buffer_meta_scratch() will return
NULL immediately if it is passed a NULL buffer and it will not update
scratch_size. Then setup_trace_scratch() will return immediately if
tscratch is NULL.

Although there's no real issue here, but it is considered undefined
behavior to pass an uninitialized variable to a function as input, and
UBSan may complain about it.

Just initialize scratch_size to zero to make the code defined behavior and
a little more robust.

Link: https://lore.kernel.org/all/44c5deaa-b094-4852-90f9-52f3fb10e67a@stanley.mountain/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google) f00c9201f9 tracing: Fix a compilation error without CONFIG_MODULES
There are some code which depends on CONFIG_MODULES. #ifdef
to enclose it.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174230515367.2909896.8132122175220657625.stgit@mhiramat.tok.corp.google.com
Fixes: dca91c1c5468 ("tracing: Have persistent trace instances save module addresses")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:56 -04:00
Masami Hiramatsu (Google) fb6d03238e tracing: Freeable reserved ring buffer
Make the ring buffer on reserved memory to be freeable. This allows us
to free the trace instance on the reserved memory without changing
cmdline and rebooting. Even if we can not change the kernel cmdline
for security reason, we can release the reserved memory for the ring
buffer as free (available) memory.

For example, boot kernel with reserved memory;
"reserve_mem=20M:2M:trace trace_instance=boot_mapped^traceoff@trace"

~ # free
              total        used        free      shared  buff/cache   available
Mem:        1995548       50544     1927568       14964       17436     1911480
Swap:             0           0           0
~ # rmdir /sys/kernel/tracing/instances/boot_mapped/
[   23.704023] Freeing reserve_mem:trace memory: 20476K
~ # free
              total        used        free      shared  buff/cache   available
Mem:        2016024       41844     1956740       14968       17440     1940572
Swap:             0           0           0

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/173989134814.230693.18199312930337815629.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:28 -04:00
Steven Rostedt 5f3719f697 tracing: Update modules to persistent instances when loaded
When a module is loaded and a persistent buffer is actively tracing, add
it to the list of modules in the persistent memory.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.469844721@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt 1bd25a6f71 tracing: Show module names and addresses of last boot
Add the last boot module's names and addresses to the last_boot_info file.
This only shows the module information from a previous boot. If the buffer
is started and is recording the current boot, this file still will only
show "current".

  ~# cat instances/boot_mapped/last_boot_info
  10c00000		[kernel]
  ffffffffc00ca000	usb_serial_simple
  ffffffffc00ae000	usbserial
  ffffffffc008b000	bfq

  ~# echo function > instances/boot_mapped/current_tracer
  ~# cat instances/boot_mapped/last_boot_info
  # Current

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.299186021@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt fd39e48fe8 tracing: Have persistent trace instances save module addresses
For trace instances that are mapped to persistent memory, have them use
the scratch area to save the currently loaded modules. This will allow
where the modules have been loaded on the next boot so that their
addresses can be deciphered by using where they were loaded previously.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.129741650@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt b65334825f tracing: Have persistent trace instances save KASLR offset
There's no reason to save the KASLR offset for the ring buffer itself.
That is used by the tracer. Now that the tracer has a way to save data in
the persistent memory of the ring buffer, have the tracing infrastructure
take care of the saving of the KASLR offset.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.792722274@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt 4af0a9c518 ring-buffer: Add ring_buffer_meta_scratch()
Now that there's one meta data at the start of the persistent memory used by
the ring buffer, allow the caller to request some memory right after that
data that it can use as its own persistent memory.

Also fix some white space issues with ring_buffer_alloc().

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.619631731@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:27 -04:00
Steven Rostedt 4009cc31e7 ring-buffer: Add buffer meta data for persistent ring buffer
Instead of just having a meta data at the first page of each sub buffer
that has duplicate data, add a new meta page to the entire block of memory
that holds the duplicate data and remove it from the sub buffer meta data.

This will open up the extra memory in this first page to be used by the
tracer for its own persistent data.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.446351513@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:26 -04:00
Steven Rostedt bcba8d4dbe ring-buffer: Use kaslr address instead of text delta
Instead of saving off the text and data pointers and using them to compare
with the current boot's text and data pointers, just save off the KASLR
offset. Then that can be used to figure out how to read the previous boots
buffer.

The last_boot_info will now show this offset, but only if it is for a
previous boot:

  ~# cat instances/boot_mapped/last_boot_info
  39000000	[kernel]

  ~# echo function > instances/boot_mapped/current_tracer
  ~# cat instances/boot_mapped/last_boot_info
  # Current

If the KASLR offset saved is for the current boot, the last_boot_info will
show the value of "current".

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.274956504@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:26 -04:00
Feng Yang c73f0b6964 ring-buffer: Fix bytes_dropped calculation issue
The calculation of bytes-dropped and bytes_dropped_nested is reversed.
Although it does not affect the final calculation of total_dropped,
it should still be modified.

Link: https://lore.kernel.org/20250223070106.6781-1-yangfeng59949@163.com
Fixes: 6c43e554a2 ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:26 -04:00
Andy Shevchenko 196a062641 tracing: Mark binary printing functions with __printf() attribute
Binary printing functions are using printf() type of format, and compiler
is not happy about them as is:

kernel/trace/trace.c:3292:9: error: function ‘trace_vbprintk’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
kernel/trace/trace_seq.c:182:9: error: function ‘trace_seq_bprintf’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]

Fix the compilation errors by adding __printf() attribute.

While at it, move existing __printf() attributes from the implementations
to the declarations. IT also fixes incorrect attribute parameters that are
used for trace_array_printk().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250321144822.324050-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-28 13:37:11 +01:00
Linus Torvalds a7e135fe59 Probes updates for v6.15:
- probe-events: Add comments about entry data storing code to clarify
   where and how the entry data is stored for function return events.
 
 - probe-events: Log error for exceeding the number of arguments to help
   user to identify error reason via tracefs/error_log file.
 
 - selftests/ftrace: Improve the ftracetest to add followngs.
   . Expand the tprobe event test to check if it can correctly find
     the wrong format tracepoint name.
   . Add new syntax error test to check whether error_log correctly
     indicates a wrong character in the tracepoint name.
   . Add a new dynamic events argument limitation test case which checks
     max number of probe arguments.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmflTlobHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8ba9UIALzzZQxzUhJYO/B/XaCz
 KTuSrU2594cLr3nCrmCfL3UHUCr5IcjXKCfUdrgzE9mckWF+nVRXWwbp29KpOQky
 fzU9Ardbr7ksGAYFk4My+P/BeYa7vh9LwofXzWlJibANVxWvq66+GfKnWnh1P8Bl
 /zjov61DAEQmDfXNGZ2oTmKnYYMoPkJU4voRvAEUgiP02SrF04UUa00uC7hmZJJV
 aI5b6TatE5FDEmAoHWh0cf04HoyevRyLVOQd5GSswH/oWpyhs90M7a9WENHamBx6
 RXEZ3xYyuH9zBSvYVeRn2H1eHnGCR4RMctZiRGXF8EdLWo7RXRQ1Hp9CgvwDrU8Z
 IL0=
 =vZ3e
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - probe-events: Add comments about entry data storing code to clarify
   where and how the entry data is stored for function return events.

 - probe-events: Log error for exceeding the number of arguments to help
   user to identify error reason via tracefs/error_log file.

 - Improve the ftracetest selftests:
    - Expand the tprobe event test to check if it can correctly find the
      wrong format tracepoint name.
    - Add new syntax error test to check whether error_log correctly
      indicates a wrong character in the tracepoint name.
    - Add a new dynamic events argument limitation test case which
      checks max number of probe arguments.

* tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: probe-events: Add comments about entry data storing code
  selftests/ftrace: Add dynamic events argument limitation test case
  selftests/ftrace: Add new syntax error test
  selftests/ftrace: Expand the tprobe event test to check wrong format
  tracing: probe-events: Log error for exceeding the number of arguments
2025-03-27 19:31:34 -07:00
Linus Torvalds 744fab2d9f tracing updates for v6.15:
- Add option traceoff_after_boot
 
   In order to debug kernel boot, it sometimes is helpful to enable tracing
   via the kernel command line. Unfortunately, by the time the login prompt
   appears, the trace is overwritten by the init process and other user space
   start up applications. Adding a "traceoff_after_boot" will disable tracing
   when the kernel passes control to init which will allow developers to be
   able to see the traces that occurred during boot.
 
 - Clean up the mmflags macros that display the GFP flags in trace events
 
   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in the
   process it also adds some flags that were missed before.
 
 - Removed some dead code and scripts/draw_functrace.py
 
   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.
 
 - Constify struct event_trigger_ops
 
   The event_trigger_ops is just a structure that has function pointers that
   are assigned when the variables are created. These variables should all be
   constants.
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+V9IhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4RAP9JhE3n69pGuOVaJTN/LGLr2Axl59n4
 KqZSZS1nUM76/gD6AxYpR7nxyxgJ7VjNkLptS9tSjJVdPDxGAl0v3eO04w4=
 =SU30
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add option traceoff_after_boot

   In order to debug kernel boot, it sometimes is helpful to enable
   tracing via the kernel command line. Unfortunately, by the time the
   login prompt appears, the trace is overwritten by the init process
   and other user space start up applications.

   Adding a "traceoff_after_boot" will disable tracing when the kernel
   passes control to init which will allow developers to be able to see
   the traces that occurred during boot.

 - Clean up the mmflags macros that display the GFP flags in trace
   events

   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in
   the process it also adds some flags that were missed before.

 - Removed some dead code and scripts/draw_functrace.py

   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.

 - Constify struct event_trigger_ops

   The event_trigger_ops is just a structure that has function pointers
   that are assigned when the variables are created. These variables
   should all be constants.

 - Other minor clean ups and fixes

* tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Replace strncpy with memcpy for fixed-length substring copy
  tracing: Fix synth event printk format for str fields
  tracing: Do not use PERF enums when perf is not defined
  tracing: Ensure module defining synth event cannot be unloaded while tracing
  tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
  tracing/osnoise: Fix possible recursive locking for cpus_read_lock()
  tracing: Align synth event print fmt
  tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format
  tracepoint: Print the function symbol when tracepoint_debug is set
  tracing: Constify struct event_trigger_ops
  scripts/tracing: Remove scripts/tracing/draw_functrace.py
  tracing: Update MAINTAINERS file to include tracepoint.c
  tracing/user_events: Slightly simplify user_seq_show()
  tracing/user_events: Don't use %pK through printk
  tracing: gfp: Remove duplication of recording GFP flags
  tracing: Remove orphaned event_trace_printk
  ring-buffer: Fix typo in comment about header page pointer
  tracing: Add traceoff_after_boot option
2025-03-27 16:22:12 -07:00
Linus Torvalds 88221ac0d5 Latency tracing changes for v6.15:
- Add some trace events to osnoise and timerlat sample generation
 
   This adds more information to the osnoise and timerlat tracers as well as
   allows BPF programs to be attached to these locations to extract even more
   data.
 
 - Fix to DECLARE_TRACE_CONDITION() macro
 
   It wasn't used but now will be and it happened to be broken causing the
   build to fail.
 
 - Add scheduler specification monitors to runtime verifier (RV)
 
   This is a continuation of Daniel Bristot's work.
 
   RV allows monitors to run and react concurrently. Running the cumulative
   model is equivalent to running single components using the same
   reactors, with the advantage that it's easier to point out which
   specification failed in case of error.
 
   This update introduces nested monitors to RV, in short, the sysfs
   monitor folder will contain a monitor named sched, which is nothing but
   an empty container for other monitors. Controlling the sched monitor
   (enable, disable, set reactors) controls all nested monitors.
 
   The following scheduling monitors are added:
 
   * sco: scheduling context operations
       Monitor to ensure sched_set_state happens only in thread context
   * tss: task switch while scheduling
       Monitor to ensure sched_switch happens only in scheduling context
   * snroc: set non runnable on its own context
       Monitor to ensure set_state happens only in the respective task's context
   * scpd: schedule called with preemption disabled
       Monitor to ensure schedule is called with preemption disabled
   * snep: schedule does not enable preempt
       Monitor to ensure schedule does not enable preempt
   * sncid: schedule not called with interrupt disabled
       Monitor to ensure schedule is not called with interrupt disabled
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+QhuxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qg62AP9bkeNDbiCuAqjZGddV09Hw26wC3yum
 kQoNSebD8G52rQEA9GDjK37xGzYwW/fJokhJVTV39qfub6inAJE5dS6WeQY=
 =8Ikv
 -----END PGP SIGNATURE-----

Merge tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull latency tracing updates from Steven Rostedt:

 - Add some trace events to osnoise and timerlat sample generation

   This adds more information to the osnoise and timerlat tracers as
   well as allows BPF programs to be attached to these locations to
   extract even more data.

 - Fix to DECLARE_TRACE_CONDITION() macro

   It wasn't used but now will be and it happened to be broken causing
   the build to fail.

 - Add scheduler specification monitors to runtime verifier (RV)

   This is a continuation of Daniel Bristot's work.

   RV allows monitors to run and react concurrently. Running the
   cumulative model is equivalent to running single components using the
   same reactors, with the advantage that it's easier to point out which
   specification failed in case of error.

   This update introduces nested monitors to RV, in short, the sysfs
   monitor folder will contain a monitor named sched, which is nothing
   but an empty container for other monitors. Controlling the sched
   monitor (enable, disable, set reactors) controls all nested monitors.

   The following scheduling monitors are added:

     - sco: scheduling context operations
       Monitor to ensure sched_set_state happens only in thread context

     - tss: task switch while scheduling
       Monitor to ensure sched_switch happens only in scheduling context

     - snroc: set non runnable on its own context
       Monitor to ensure set_state happens only in the respective task's context

     - scpd: schedule called with preemption disabled
       Monitor to ensure schedule is called with preemption disabled

     - snep: schedule does not enable preempt
       Monitor to ensure schedule does not enable preempt

     - sncid: schedule not called with interrupt disabled
       Monitor to ensure schedule is not called with interrupt disabled

* tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rv: Allow rv list to filter for container
  Documentation/rv: Add docs for the sched monitors
  verification/dot2k: Add support for nested monitors
  tools/rv: Add support for nested monitors
  rv: Add scpd, snep and sncid per-cpu monitors
  rv: Add snroc per-task monitor
  rv: Add sco and tss per-cpu monitors
  rv: Add option for nested monitors and include sched
  sched: Add sched tracepoints for RV task model
  rv: Add license identifiers to monitor files
  tracing: Fix DECLARE_TRACE_CONDITION
  trace/osnoise: Add trace events for samples
2025-03-27 16:03:52 -07:00
Linus Torvalds 31eb415bf6 ftrace changes for v6.15:
- Record function parameters for function and function graph tracers
 
   An option has been added to function tracer (func-args) and the function
   graph tracer (funcgraph-args) that when set, the tracers will record the
   registers that hold the arguments into each function event. On reading of
   the trace, it will use BTF to print those arguments. Most archs support up
   to 6 arguments (depending on the complexity of the arguments) and those
   are printed. If a function has more arguments then what was recorded, the
   output will end with " ... )".
 
     Example of function graph tracer:
 
     6)              | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) {
     6)              |   consume_skb(skb = 0x8887c100) {
     6)              |     skb_release_head_state(skb = 0x8887c100) {
     6)  0.178 us    |       sock_wfree(skb = 0x8887c100)
     6)  0.627 us    |     }
 
 - The rest of the changes are minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+M9vRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrBGAP9NTn4Lpci69n8FJGfz+UKUEKbzOOeg
 QrkIZIxbm8N56gEA0RaUionWjt1znZXUdBTsE3h+en/Ik0//VZytQcmJBwM=
 =fxuG
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace updates from Steven Rostedt:

 - Record function parameters for function and function graph tracers

   An option has been added to function tracer (func-args) and the
   function graph tracer (funcgraph-args) that when set, the tracers
   will record the registers that hold the arguments into each function
   event. On reading of the trace, it will use BTF to print those
   arguments. Most archs support up to 6 arguments (depending on the
   complexity of the arguments) and those are printed.

   If a function has more arguments then what was recorded, the output
   will end with " ... )".

   Example of function graph tracer:

	6)              | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) {
	6)              |   consume_skb(skb = 0x8887c100) {
	6)              |     skb_release_head_state(skb = 0x8887c100) {
	6)  0.178 us    |       sock_wfree(skb = 0x8887c100)
	6)  0.627 us    |     }

 - The rest of the changes are minor clean ups and fixes

* tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Use hashtable.h for event_hash
  tracing: Fix use-after-free in print_graph_function_flags during tracer switching
  function_graph: Remove the unused variable func
  ftrace: Add arguments to function tracer
  ftrace: Have funcgraph-args take affect during tracing
  ftrace: Add support for function argument to graph tracer
  ftrace: Add print_function_args()
  ftrace: Have ftrace_free_filter() WARN and exit if ops is active
  fgraph: Correct typo in ftrace_return_to_handler comment
2025-03-27 15:57:29 -07:00
Linus Torvalds dd161f74f8 tracing and sorttable updates for 6.15:
- Implement arm64 build time sorting of the mcount location table
 
   When gcc is used to build arm64, the mcount_loc section is all zeros in
   the vmlinux elf file. The addresses are stored in the Elf_Rela location.
   To sort at build time, an array is allocated and the addresses are added
   to it via the content of the mcount_loc section as well as he Elf_Rela
   data. After sorting, the information is put back into the Elf_Rela which
   now has the section sorted.
 
 - Make sorting of mcount location table for arm64 work with clang as well
 
   When clang is used, the mcount_loc section contains the addresses, unlike
   the gcc build. An array is still created and the sorting works for both
   methods.
 
 - Remove weak functions from the mcount_loc section
 
   Have the sorttable code pass in the data of functions defined via nm -S
   which shows the functions as well as their sizes. Using this information
   the sorttable code can determine if a function in the mcount_loc section
   was weak and overridden. If the function is not found, it is set to be
   zero. On boot, when the mcount_loc section is read and the ftrace table is
   created, if the address in the mcount_loc is not in the kernel core text
   then it is removed and not added to the ftrace_filter_functions (the
   functions that can be attached by ftrace callbacks).
 
 - Update and fix the reporting of how much data is used for ftrace functions
 
   On boot, a report of how many pages were used by the ftrace table as well
   as how they were grouped (the table holds a list of sections that are
   groups of pages that were able to be allocated). The removing of the weak
   functions required the accounting to be updated.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+MnThQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qivsAQDhPOCaONai7rvHX9T1aOHGjdajZ7SI
 qoZgBOsc2ZUkoQD/U2M/m7Yof9aR4I+VFKtT5NsAwpfqPSOL/t/1j6UEOQ8=
 =45AV
 -----END PGP SIGNATURE-----

Merge tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing / sorttable updates from Steven Rostedt:

 - Implement arm64 build time sorting of the mcount location table

   When gcc is used to build arm64, the mcount_loc section is all zeros
   in the vmlinux elf file. The addresses are stored in the Elf_Rela
   location.

   To sort at build time, an array is allocated and the addresses are
   added to it via the content of the mcount_loc section as well as he
   Elf_Rela data. After sorting, the information is put back into the
   Elf_Rela which now has the section sorted.

 - Make sorting of mcount location table for arm64 work with clang as
   well

   When clang is used, the mcount_loc section contains the addresses,
   unlike the gcc build. An array is still created and the sorting works
   for both methods.

 - Remove weak functions from the mcount_loc section

   Have the sorttable code pass in the data of functions defined via
   'nm -S' which shows the functions as well as their sizes. Using this
   information the sorttable code can determine if a function in the
   mcount_loc section was weak and overridden. If the function is not
   found, it is set to be zero. On boot, when the mcount_loc section is
   read and the ftrace table is created, if the address in the
   mcount_loc is not in the kernel core text then it is removed and not
   added to the ftrace_filter_functions (the functions that can be
   attached by ftrace callbacks).

 - Update and fix the reporting of how much data is used for ftrace
   functions

   On boot, a report of how many pages were used by the ftrace table as
   well as how they were grouped (the table holds a list of sections
   that are groups of pages that were able to be allocated). The
   removing of the weak functions required the accounting to be updated.

* tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  scripts/sorttable: Allow matches to functions before function entry
  scripts/sorttable: Use normal sort if theres no relocs in the mcount section
  ftrace: Check against is_kernel_text() instead of kaslr_offset()
  ftrace: Test mcount_loc addr before calling ftrace_call_addr()
  ftrace: Have ftrace pages output reflect freed pages
  ftrace: Update the mcount_loc check of skipped entries
  scripts/sorttable: Zero out weak functions in mcount_loc table
  scripts/sorttable: Always use an array for the mcount_loc sorting
  scripts/sorttable: Have mcount rela sort use direct values
  arm64: scripts/sorttable: Implement sorting mcount_loc at boot for arm64
2025-03-27 15:44:34 -07:00
Masami Hiramatsu (Google) bb9c6020f4 tracing: probe-events: Add comments about entry data storing code
Add comments about entry data storing code to __store_entry_arg() and
traceprobe_get_entry_data_size(). These are a bit complicated because of
building the entry data storing code and scanning it.

This just add comments, no behavior change.

Link: https://lore.kernel.org/all/174061715004.501424.333819546601401102.stgit@devnote2/

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250226102223.586d7119@gandalf.local.home/
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-03-27 21:19:54 +09:00
Masami Hiramatsu (Google) 57faaa0480 tracing: probe-events: Log error for exceeding the number of arguments
Add error message when the number of arguments exceeds the limitation.

Link: https://lore.kernel.org/all/174055075075.4079315.10916648136898316476.stgit@mhiramat.tok.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-27 21:19:54 +09:00
Linus Torvalds 054570267d lsm/stable-6.15 PR 20250323
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmfgWgMUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNW5RAAvCDq5gBtY0aTNlULe637EVLSh+t8
 PkSzHzu/NlzU6BfjtwSm2fuML8welTGxSwUPxUzMCI91gPdkGeFktefavT3xa+QI
 BHWROn7fEJ/KmRZvngPeIkgLr5xhF5nBJmc/Jw71qem20zRzNgJnpzMX16d10Phx
 dxd2xOO1qM3bv6Z9RcIssZRGaN+PHngpWWg+0B69XuaBUso87S6NDyKNn1XPmvoz
 as96k+Wk/xAZGVEeCbs/+H5rBx6DLg+FfTRa06Oh4BFsqedpkDPxLrTgCJGJkA0H
 dsK6O/993zvjx0Jn4ZPoJ9n35S82BmkCsz4bGq1xVl6FYUiMcm3/8yO41wllS+w4
 j+RlTU/RIdB7n8EKyMMl1hj1stTvt3Bi9F5Cbf7ZEv0snfR00K4KVpi17jnFjUHv
 kpOiEtXZb/NGQip7UAuUq0PisfqbiO4jJurYHRetDgv1WCy6+C8ufM5t6I+cnvmG
 VG+dlxcW+rDIn6bLRVuGi9TJRsQ6eox9ipa+qEKNNiOXgftELcgT7m74nAS5m0uv
 n5rDa221nPXecEB0X7d6YUFk711lly90dbelNeLrmv1w6jl8L1PpS1oBaW+UzGu9
 46eGBd6pzu9otvK9WVyDEdotDOCrgH0sd7pTetqDhLJZ7KrGwyyqO2gD/JroUKcC
 lnxBQwPnat86iI8=
 =oxfV
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Various minor updates to the LSM Rust bindings

   Changes include marking trivial Rust bindings as inlines and comment
   tweaks to better reflect the LSM hooks.

 - Add LSM/SELinux access controls to io_uring_allowed()

   Similar to the io_uring_disabled sysctl, add a LSM hook to
   io_uring_allowed() to enable LSMs a simple way to enforce security
   policy on the use of io_uring. This pull request includes SELinux
   support for this new control using the io_uring/allowed permission.

 - Remove an unused parameter from the security_perf_event_open() hook

   The perf_event_attr struct parameter was not used by any currently
   supported LSMs, remove it from the hook.

 - Add an explicit MAINTAINERS entry for the credentials code

   We've seen problems in the past where patches to the credentials code
   sent by non-maintainers would often languish on the lists for
   multiple months as there was no one explicitly tasked with the
   responsibility of reviewing and/or merging credentials related code.

   Considering that most of the code under security/ has a vested
   interest in ensuring that the credentials code is well maintained,
   I'm volunteering to look after the credentials code and Serge Hallyn
   has also volunteered to step up as an official reviewer. I posted the
   MAINTAINERS update as a RFC to LKML in hopes that someone else would
   jump up with an "I'll do it!", but beyond Serge it was all crickets.

 - Update Stephen Smalley's old email address to prevent confusion

   This includes a corresponding update to the mailmap file.

* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  mailmap: map Stephen Smalley's old email addresses
  lsm: remove old email address for Stephen Smalley
  MAINTAINERS: add Serge Hallyn as a credentials reviewer
  MAINTAINERS: add an explicit credentials entry
  cred,rust: mark Credential methods inline
  lsm,rust: reword "destroy" -> "release" in SecurityCtx
  lsm,rust: mark SecurityCtx methods inline
  perf: Remove unnecessary parameter of security check
  lsm: fix a missing security_uring_allowed() prototype
  io_uring,lsm,selinux: add LSM hooks for io_uring_setup()
  io_uring: refactor io_uring_allowed()
2025-03-25 15:44:19 -07:00
Siddarth G e0344f9564 tracing: Replace strncpy with memcpy for fixed-length substring copy
checkpatch.pl reports the following warning:
WARNING: Prefer strscpy, strscpy_pad, or __nonstring over strncpy
(see: https://github.com/KSPP/linux/issues/90)

In synth_field_string_size(), replace strncpy() with memcpy() to copy 'len'
characters from 'start' to 'buf'. The code manually adds a NUL terminator
after the copy, making memcpy safe here.

Link: https://lore.kernel.org/20250325181232.38284-1-siddarthsgml@gmail.com
Signed-off-by: Siddarth G <siddarthsgml@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-25 17:44:32 -04:00