mirror-linux/kernel/trace
Steven Rostedt 90f9f5d64c tracing: Fix crash on synthetic stacktrace field usage
When creating a synthetic event based on an existing synthetic event that
had a stacktrace field and the new synthetic event used that field a
kernel crash occurred:

 ~# cd /sys/kernel/tracing
 ~# echo 's:stack unsigned long stack[];' > dynamic_events
 ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger
 ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger

The above creates a synthetic event that takes a stacktrace when a task
schedules out in a non-running state and passes that stacktrace to the
sched_switch event when that task schedules back in. It triggers the
"stack" synthetic event that has a stacktrace as its field (called "stack").

 ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events
 ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger
 ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger

The above makes another synthetic event called "syscall_stack" that
attaches the first synthetic event (stack) to the sys_exit trace event and
records the stacktrace from the stack event with the id of the system call
that is exiting.

When enabling this event (or using it in a historgram):

 ~# echo 1 > events/synthetic/syscall_stack/enable

Produces a kernel crash!

 BUG: unable to handle page fault for address: 0000000000400010
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.16.3-1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:trace_event_raw_event_synth+0x90/0x380
 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f
 RSP: 0018:ffffd2670388f958 EFLAGS: 00010202
 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0
 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50
 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010
 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90
 FS:  00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __tracing_map_insert+0x208/0x3a0
  action_trace+0x67/0x70
  event_hist_trigger+0x633/0x6d0
  event_triggers_call+0x82/0x130
  trace_event_buffer_commit+0x19d/0x250
  trace_event_raw_event_sys_exit+0x62/0xb0
  syscall_exit_work+0x9d/0x140
  do_syscall_64+0x20a/0x2f0
  ? trace_event_raw_event_sched_switch+0x12b/0x170
  ? save_fpregs_to_fpstate+0x3e/0x90
  ? _raw_spin_unlock+0xe/0x30
  ? finish_task_switch.isra.0+0x97/0x2c0
  ? __rseq_handle_notify_resume+0xad/0x4c0
  ? __schedule+0x4b8/0xd00
  ? restore_fpregs_from_fpstate+0x3c/0x90
  ? switch_fpu_return+0x5b/0xe0
  ? do_syscall_64+0x1ef/0x2f0
  ? do_fault+0x2e9/0x540
  ? __handle_mm_fault+0x7d1/0xf70
  ? count_memcg_events+0x167/0x1d0
  ? handle_mm_fault+0x1d7/0x2e0
  ? do_user_addr_fault+0x2c3/0x7f0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The reason is that the stacktrace field is not labeled as such, and is
treated as a normal field and not as a dynamic event that it is.

In trace_event_raw_event_synth() the event is field is still treated as a
dynamic array, but the retrieval of the data is considered a normal field,
and the reference is just the meta data:

// Meta data is retrieved instead of a dynamic array
  str_val = (char *)(long)var_ref_vals[val_idx];

// Then when it tries to process it:
  len = *((unsigned long *)str_val) + 1;

It triggers a kernel page fault.

To fix this, first when defining the fields of the first synthetic event,
set the filter type to FILTER_STACKTRACE. This is used later by the second
synthetic event to know that this field is a stacktrace. When creating
the field of the new synthetic event, have it use this FILTER_STACKTRACE
to know to create a stacktrace field to copy the stacktrace into.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23 13:34:21 -05:00
..
rv rv: Convert to use __free 2025-12-02 07:28:32 +01:00
Kconfig tracing updates for v6.19: 2025-12-05 09:51:37 -08:00
Makefile ftrace: Allow tracing of some of the tracing code 2025-11-26 15:13:30 -05:00
blktrace.c tracing updates for v6.19: 2025-12-05 09:51:37 -08:00
bpf_trace.c bpf: Fix verifier assumptions of bpf_d_path's output buffer 2025-12-10 01:34:04 -08:00
bpf_trace.h
error_report-traces.c
fgraph.c tracing: Fix typo in fpgraph.c 2025-12-05 15:43:39 -05:00
fprobe.c tracing fixes for v6.19: 2025-12-06 13:49:40 -08:00
ftrace.c ftrace: Do not over-allocate ftrace memory 2026-01-15 10:17:53 -05:00
ftrace_internal.h
kprobe_event_gen_test.c
pid_list.c trace/pid_list: optimize pid_list->lock contention 2025-11-13 15:15:54 -05:00
pid_list.h trace/pid_list: optimize pid_list->lock contention 2025-11-13 15:15:54 -05:00
power-traces.c PM: cpufreq: powernv/tracing: Move powernv_throttle trace event 2025-07-21 16:40:56 -04:00
preemptirq_delay_test.c kernel: trace: preemptirq_delay_test: use offstack cpu mask 2025-07-08 18:17:38 -04:00
rethook.c
ring_buffer.c ring-buffer: Avoid softlockup in ring_buffer_resize() during memory free 2026-01-07 14:52:22 -05:00
ring_buffer_benchmark.c tracing: Fix typo in ring_buffer_benchmark.c 2025-12-05 15:43:40 -05:00
rpm-traces.c
synth_event_gen_test.c
trace.c trace: ftrace_dump_on_oops[] is not exported, make it static 2026-01-07 14:52:22 -05:00
trace.h ftrace fixes for v6.19: 2025-12-05 10:13:04 -08:00
trace_benchmark.c
trace_benchmark.h
trace_boot.c
trace_branch.c tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field 2025-05-09 15:19:10 -04:00
trace_btf.c
trace_btf.h
trace_clock.c
trace_dynevent.c tracing: Report wrong dynamic event command 2025-11-10 19:26:14 -05:00
trace_dynevent.h tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_entries.h function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously 2025-11-26 15:13:30 -05:00
trace_eprobe.c Probes for v6.19 2025-12-05 10:55:47 -08:00
trace_event_perf.c
trace_events.c tracing: Drop unneeded assignment to soft_mode 2026-01-07 14:52:22 -05:00
trace_events_filter.c tracing: Fix typo in trace_events_filter.c 2025-12-05 15:43:40 -05:00
trace_events_filter_test.h
trace_events_hist.c tracing: Fix crash on synthetic stacktrace field usage 2026-01-23 13:34:21 -05:00
trace_events_inject.c
trace_events_synth.c tracing: Fix crash on synthetic stacktrace field usage 2026-01-23 13:34:21 -05:00
trace_events_trigger.c tracing: Fix typo in trace_events_trigger.c 2025-12-05 15:43:41 -05:00
trace_events_user.c tracing: Fix multiple typos in trace_events_user.c 2025-12-05 15:43:41 -05:00
trace_export.c
trace_fprobe.c tracing updates for v6.19: 2025-12-05 09:51:37 -08:00
trace_functions.c tracing: Have function tracer define options per instance 2025-11-12 09:59:54 -05:00
trace_functions_graph.c ftrace fixes for v6.19: 2025-12-05 10:13:04 -08:00
trace_hwlat.c tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu() 2025-07-08 18:17:29 -04:00
trace_irqsoff.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_kdb.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_kprobe.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_mmiotrace.c tracing/mmiotrace: Remove reference to unused per CPU data pointer 2025-05-08 09:36:09 -04:00
trace_nop.c
trace_osnoise.c tracing: Fix multiple typos in trace_osnoise.c 2025-12-05 15:43:41 -05:00
trace_output.c tracing updates for v6.19: 2025-12-05 09:51:37 -08:00
trace_output.h tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_preemptirq.c
trace_printk.c
trace_probe.c tracing fixes for v6.19: 2025-12-06 13:49:40 -08:00
trace_probe.h tracing: probes: Use __free() for trace_probe_log 2025-11-01 01:10:28 +09:00
trace_probe_kernel.h
trace_probe_tmpl.h
trace_recursion_record.c
trace_sched_switch.c tracing: Ensure optimized hashing works 2025-09-30 17:27:58 -04:00
trace_sched_wakeup.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_selftest.c
trace_selftest_dynamic.c
trace_seq.c tracing: Fix typo in trace_seq.c 2025-12-05 15:43:41 -05:00
trace_stack.c tracing updates for v6.16: 2025-05-29 21:04:36 -07:00
trace_stat.c
trace_stat.h
trace_synth.h
trace_syscalls.c tracing: Hide __NR_utimensat and _NR_mq_timedsend when not defined 2025-11-10 14:23:53 -05:00
trace_uprobe.c tracing: uprobe: eprobes: Allocate traceprobe_parse_context per probe 2025-11-01 01:10:29 +09:00
tracing_map.c tracing: Use vmalloc_array() to improve code 2025-09-23 09:31:58 -04:00
tracing_map.h