Commit Graph

50379 Commits (1c172febdf065375359b2b95156e476bfee30b60)

Author SHA1 Message Date
Linus Torvalds b83a8ff87a tracing fixes for v6.19:
- Fix a crash with passing a stacktrace between synthetic events
 
   A synthetic event is an event that combines two events into a single event
   that can display fields from both events as well as the time delta that
   took place between the events. It can also pass a stacktrace from the
   first event so that it can be displayed by the synthetic event (this is
   useful to get a stacktrace of a task scheduling out when blocked and
   recording the time it was blocked for).
 
   A synthetic event can also connect an existing synthetic event to another
   event. An issue was found that if the first synthetic event had a stacktrace
   as one of its fields, and that stacktrace field was passed to the new
   synthetic event to be displayed, it would crash the kernel. This was due to
   the stacktrace not being saved as a stacktrace but was still marked as one.
   When the stacktrace was read, it would try to read an array but instead read
   the integer metadata of the stacktrace and dereferenced a bad value.
 
   Fix this by saving the stacktrace field as a stracktrace.
 
 - Fix possible overflow in cmp_mod_entry() compare function
 
   A binary search is used to find a module address and if the addresses are
   greater than 2GB apart it could lead to truncation and cause a bad search
   result. Use normal compares instead of a subtraction between addresses to
   calculate the compare value.
 
 - Fix output of entry arguments in function graph tracer
 
   Depending on the configurations enabled, the entry can be two different
   types that hold the argument array. The macro FGRAPH_ENTRY_ARGS() is used
   to find the correct arguments from the given type. One location was missed
   and still referenced the arguments directly via entry->args and could
   produce the wrong value depending on how the kernel was configured.
 
 - Fix memory leak in scripts/tracepoint-update build tool
 
   If the array fails to allocate, the memory for the values needs to be
   freed and was not. Free the allocated values if the array failed to
   allocate.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaXUQLxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgsJAQDgtWH9DWUkJKgzXTkiOA0l8JArPOVf
 tCSMla2wWJA70QD/as2ptacYAFU9v1oxO5YIgsKOLFBF68ZUIhJtvXpqtAE=
 =JeC6
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix a crash with passing a stacktrace between synthetic events

   A synthetic event is an event that combines two events into a single
   event that can display fields from both events as well as the time
   delta that took place between the events. It can also pass a
   stacktrace from the first event so that it can be displayed by the
   synthetic event (this is useful to get a stacktrace of a task
   scheduling out when blocked and recording the time it was blocked
   for).

   A synthetic event can also connect an existing synthetic event to
   another event. An issue was found that if the first synthetic event
   had a stacktrace as one of its fields, and that stacktrace field was
   passed to the new synthetic event to be displayed, it would crash the
   kernel. This was due to the stacktrace not being saved as a
   stacktrace but was still marked as one. When the stacktrace was read,
   it would try to read an array but instead read the integer metadata
   of the stacktrace and dereferenced a bad value.

   Fix this by saving the stacktrace field as a stacktrace.

 - Fix possible overflow in cmp_mod_entry() compare function

   A binary search is used to find a module address and if the addresses
   are greater than 2GB apart it could lead to truncation and cause a
   bad search result. Use normal compares instead of a subtraction
   between addresses to calculate the compare value.

 - Fix output of entry arguments in function graph tracer

   Depending on the configurations enabled, the entry can be two
   different types that hold the argument array. The macro
   FGRAPH_ENTRY_ARGS() is used to find the correct arguments from the
   given type. One location was missed and still referenced the
   arguments directly via entry->args and could produce the wrong value
   depending on how the kernel was configured.

 - Fix memory leak in scripts/tracepoint-update build tool

   If the array fails to allocate, the memory for the values needs to be
   freed and was not. Free the allocated values if the array failed to
   allocate.

* tag 'trace-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  scripts/tracepoint-update: Fix memory leak in add_string() on failure
  function_graph: Fix args pointer mismatch in print_graph_retval()
  tracing: Avoid possible signed 64-bit truncation
  tracing: Fix crash on synthetic stacktrace field usage
2026-01-24 17:18:57 -08:00
Linus Torvalds 12a0094839 Misc fixes:
- Fix auxiliary timekeeper update & locking bug
 
  - Reduce the sensitivity of the clocksource watchdog, to
    fix false positive measurements that marked the
    TSC clocksource unstable.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0ksERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gvtRAAvLWNMb9YPW/65Gn8dkyMQUPzXBjEaq8A
 yP4L1b8EujoM6fSeQC0Y367hpn1GKHhEGZyj9ksRcU4dsU5XWlzPZr9QXCETmMuh
 ffTCvrUGI6d95685S+R1VplmhoCQAkerQAFcPQDAQd0QgfoEJO+hf2AWHrilnicu
 gCcZGDE/+gLAPYjR7LaRu7vb0W6VtwqhXvz8xTCGALmMlU84BDT3deLzCmujxtbF
 PvNaShBAtppm468Ln6HY2mk4mN5kWthPnonNF4n0zVYy8uAHLEEUERr/LndZ60Ua
 KlFgKukfoPXyJoU0M0umNcX6oXaRw7DeyNcPtJovZUwtfXyjkTPWrcfZ4sD3r37K
 QWjFqmbTCtj70vlUFP2RiHusOmNkuzcWKww5KdpA+HoeXEI4zcjhZq7zObyjDPIZ
 t0Cs5sZoWWpL7o53ikMjsO2Fe/zSDRaocYyImCWh2U+DdBn3/fh8a0pboXQakujx
 kjmuDrHaLXFNMI9h7NvlP143IW8g7AHUpu0piDGLVFFkZoNcII/8g7qawemQw8T9
 ZCUmL3oq1Zu0z3aGq9GRFz31ysVLXwDZdtY8CCuHxgVTuZQQnRNrLiNiTjZn75E/
 PY63jtSgKNJsAOTHJZ5hnyvcGb8w05anU0T7M38kTJFtiX4R6JaaDJVmj3eFG3g8
 es9cQ4gJGmo=
 =O1ly
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:

 - Fix auxiliary timekeeper update & locking bug

 - Reduce the sensitivity of the clocksource watchdog,
   to fix false positive measurements that marked the
   TSC clocksource unstable

* tag 'timers-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Reduce watchdog readout delay limit to prevent false positives
  timekeeping: Adjust the leap state for the correct auxiliary timekeeper
2026-01-24 09:36:03 -08:00
Linus Torvalds af5a3fae86 Miscellaneous scheduler fixes:
- Fix PELT clock synchronization bug when entering idle
 
  - Disable the NEXT_BUDDY feature, as during extensive testing
    Mel found that the negatives outweigh the positives.
 
  - Make wakeup preemption less aggressive, which resulted in
    an unreasonable increase in preemption frequency.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0kYYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j/Bw/+M5UK6EOPzLTs0Tj87X/kI7vXLN/uf9+K
 FpDtFNmoJnJKrQxPFfa8aT8OAz7zLfnjrGmRgxOG1fFkACMuB8s+TO4+i9lhzAsF
 gHOg6lDczi4K14mkTyiP/Bdaf3lZThghZitdRCBZFN4gjBpAh1rRI4ikQ0a3pwJH
 6y1NF8fX/xRBm6Bgprgbv/em+fKsO89i6NwunPtYaxHsOGA5U0FVPFSa3yghz+X2
 7Sl4U8LRdkU3Z62M+I8DKWuAMMb9nLwEXrSiNy+KJHIJ7FNFb1+U10gPrLq+z2Oi
 hvd25pzDzGUgOglZ7f1IRL5H48putMjCqv+A2wOZnzl7fHt4c02hpTBqMDle5hEP
 DYHyNNC1ZIRQ1zuy8j33LzB10ycP6pX9nawt+S1trEZcDVwaKwq9TbYsBjbJKtmZ
 V7C181fTUaNQ6M4wJY3gx+hD1ocrLv+Y0vhXJnK7sg+j4IhWJ7Yk6ne9hrpXbbUj
 cK8gzvo4OkShYtMlB9ut/RIiuyvFThyJd9rcNkeN4tTM9DVFURvOlIK2W6U70Yty
 OJDkhI6KBmkERPQoa43OsGvxZwF+g0KY8ISsLq3bJ2vLyATPdpJ2ns+zRg6I+B+q
 zHZmI86Fc8gr+O/y9oqi2VJhgio34RcoSId/QcAPC1Gl4TaIcmqFFTSzMhI+kLk9
 Ng7vHy/3KQg=
 =pGnx
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix PELT clock synchronization bug when entering idle

 - Disable the NEXT_BUDDY feature, as during extensive testing
   Mel found that the negatives outweigh the positives

 - Make wakeup preemption less aggressive, which resulted in
   an unreasonable increase in preemption frequency

* tag 'sched-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Revert force wakeup preemption
  sched/fair: Disable scheduler feature NEXT_BUDDY
  sched/fair: Fix pelt clock sync when entering idle
2026-01-24 09:29:41 -08:00
Linus Torvalds ceaeaf66a2 Two perf events fixes:
- Fix mmap_count warning & bug when creating a group member event
    with the PERF_FLAG_FD_OUTPUT flag.
 
  - Disable the sample period == 1 branch events BTS optimization
    on guests, because BTS is not virtualized.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0kF8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1icZxAAhsYBY0uB3OzAgJYRIVlw/vzF7ubHh8+r
 PhQ+azap/uSk/dS8IXZ0FrP/vq9kgqEFpsYwWbB1EFRxB24nd8AAYarASyc22Dhb
 Qe6tKh4hzlDC9Cw0vJ111PgKJLiPDrmkxzS4C2HkPwYwriNql81hQjLW5nF6dHwg
 i8aP7+/uNB2h/xnPIRgkUFSUzBacRz1Snqgi/vSHmwEhk1GFI48rXoYywM9ItAnR
 CGN7tGxJ45YwDYeknZf9Ngsd3q+/eh38ihPfMEKoulf4oFvhIsiG4rJ1Vee0V9fD
 aD1NhukULtjw3lkKnMM75W+Jdb7fHDTOuxzUov9WpyIOIUTMqRkbvaKHgJzSD2SK
 01TbP3kQixhGhiRXx78GDQwYGjX8JQxngbqyJvL708GNvxbcKYrLhqtr5Ho00pWH
 ERxx/ajoDXB7Neo7XPhgYRHk/lrlnZsK1LoidhMzN1UX6C12VnhPcD+zUSqRxz5w
 yFuJp0+7wF+G74FpO7kv8jv5KDB/lery5mdzWu0kqAKMfYWdw5eGoaI6T48DVoBy
 IwNYU8bxDBRPzu64XudDE4xBuTuy4HpJmbvOUkxMUkBGMTZ+nysqHu8D7cJJqYtL
 2xYJOpR+P4Se9fRyR9xo5vtTZy27TQqTp+AjA5RlIoVOJhYK5T9mAFCJaEhdupbM
 2IA4xdYhbb8=
 =lNSW
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events fixes from Ingo Molnar:

 - Fix mmap_count warning & bug when creating a group member event
   with the PERF_FLAG_FD_OUTPUT flag

 - Disable the sample period == 1 branch events BTS optimization
   on guests, because BTS is not virtualized

* tag 'perf-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Do not enable BTS for guests
  perf: Fix refcount warning on event->mmap_count increment
2026-01-24 09:24:17 -08:00
Donglin Peng c9703d17d2 function_graph: Fix args pointer mismatch in print_graph_retval()
When funcgraph-args and funcgraph-retaddr are both enabled, many kernel
functions display invalid parameters in trace logs.

The issue occurs because print_graph_retval() passes a mismatched args
pointer to print_function_args(). Fix this by retrieving the correct
args pointer using the FGRAPH_ENTRY_ARGS() macro.

Link: https://patch.msgid.link/20260112021601.1300479-1-dolinux.peng@gmail.com
Fixes: f83ac7544f ("function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23 13:34:38 -05:00
Ian Rogers 00f13e28a9 tracing: Avoid possible signed 64-bit truncation
64-bit truncation to 32-bit can result in the sign of the truncated
value changing. The cmp_mod_entry is used in bsearch and so the
truncation could result in an invalid search order. This would only
happen were the addresses more than 2GB apart and so unlikely, but
let's fix the potentially broken compare anyway.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260108002625.333331-1-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23 13:34:30 -05:00
Steven Rostedt 90f9f5d64c tracing: Fix crash on synthetic stacktrace field usage
When creating a synthetic event based on an existing synthetic event that
had a stacktrace field and the new synthetic event used that field a
kernel crash occurred:

 ~# cd /sys/kernel/tracing
 ~# echo 's:stack unsigned long stack[];' > dynamic_events
 ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger
 ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger

The above creates a synthetic event that takes a stacktrace when a task
schedules out in a non-running state and passes that stacktrace to the
sched_switch event when that task schedules back in. It triggers the
"stack" synthetic event that has a stacktrace as its field (called "stack").

 ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events
 ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger
 ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger

The above makes another synthetic event called "syscall_stack" that
attaches the first synthetic event (stack) to the sys_exit trace event and
records the stacktrace from the stack event with the id of the system call
that is exiting.

When enabling this event (or using it in a historgram):

 ~# echo 1 > events/synthetic/syscall_stack/enable

Produces a kernel crash!

 BUG: unable to handle page fault for address: 0000000000400010
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.16.3-1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:trace_event_raw_event_synth+0x90/0x380
 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f
 RSP: 0018:ffffd2670388f958 EFLAGS: 00010202
 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0
 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50
 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010
 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90
 FS:  00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __tracing_map_insert+0x208/0x3a0
  action_trace+0x67/0x70
  event_hist_trigger+0x633/0x6d0
  event_triggers_call+0x82/0x130
  trace_event_buffer_commit+0x19d/0x250
  trace_event_raw_event_sys_exit+0x62/0xb0
  syscall_exit_work+0x9d/0x140
  do_syscall_64+0x20a/0x2f0
  ? trace_event_raw_event_sched_switch+0x12b/0x170
  ? save_fpregs_to_fpstate+0x3e/0x90
  ? _raw_spin_unlock+0xe/0x30
  ? finish_task_switch.isra.0+0x97/0x2c0
  ? __rseq_handle_notify_resume+0xad/0x4c0
  ? __schedule+0x4b8/0xd00
  ? restore_fpregs_from_fpstate+0x3c/0x90
  ? switch_fpu_return+0x5b/0xe0
  ? do_syscall_64+0x1ef/0x2f0
  ? do_fault+0x2e9/0x540
  ? __handle_mm_fault+0x7d1/0xf70
  ? count_memcg_events+0x167/0x1d0
  ? handle_mm_fault+0x1d7/0x2e0
  ? do_user_addr_fault+0x2c3/0x7f0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The reason is that the stacktrace field is not labeled as such, and is
treated as a normal field and not as a dynamic event that it is.

In trace_event_raw_event_synth() the event is field is still treated as a
dynamic array, but the retrieval of the data is considered a normal field,
and the reference is just the meta data:

// Meta data is retrieved instead of a dynamic array
  str_val = (char *)(long)var_ref_vals[val_idx];

// Then when it tries to process it:
  len = *((unsigned long *)str_val) + 1;

It triggers a kernel page fault.

To fix this, first when defining the fields of the first synthetic event,
set the filter type to FILTER_STACKTRACE. This is used later by the second
synthetic event to know that this field is a stacktrace. When creating
the field of the new synthetic event, have it use this FILTER_STACKTRACE
to know to create a stacktrace field to copy the stacktrace into.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23 13:34:21 -05:00
Vincent Guittot 15257cc2f9 sched/fair: Revert force wakeup preemption
This agressively bypasses run_to_parity and slice protection with the
assumpiton that this is what waker wants but there is no garantee that
the wakee will be the next to run. It is a better choice to use
yield_to_task or WF_SYNC in such case.

This increases the number of resched and preemption because a task becomes
quickly "ineligible" when it runs; We update the task vruntime periodically
and before the task exhausted its slice or at least quantum.

Example:
2 tasks A and B wake up simultaneously with lag = 0. Both are
eligible. Task A runs 1st and wakes up task C. Scheduler updates task
A's vruntime which becomes greater than average runtime as all others
have a lag == 0 and didn't run yet. Now task A is ineligible because
it received more runtime than the other task but it has not yet
exhausted its slice nor a min quantum. We force preemption, disable
protection but Task B will run 1st not task C.

Sidenote, DELAY_ZERO increases this effect by clearing positive lag at
wake up.

Fixes: e837456fdc ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260123102858.52428-1-vincent.guittot@linaro.org
2026-01-23 11:53:20 +01:00
Mel Gorman 4f70f106bc sched/fair: Disable scheduler feature NEXT_BUDDY
NEXT_BUDDY was disabled with the introduction of EEVDF and enabled again
after NEXT_BUDDY was rewritten for EEVDF by commit e837456fdc ("sched/fair:
Reimplement NEXT_BUDDY to align with EEVDF goals"). It was not expected
that this would be a universal win without a crystal ball instruction
but the reported regressions are a concern [1][2] even if gains were
also reported. Specifically;

o mysql with client/server running on different servers regresses
o specjbb reports lower peak metrics
o daytrader regresses

The mysql is realistic and a concern. It needs to be confirmed if
specjbb is simply shifting the point where peak performance is measured
but still a concern. daytrader is considered to be representative of a
real workload.

Access to test machines is currently problematic for verifying any fix to
this problem. Disable NEXT_BUDDY for now by default until the root causes
are addressed.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Link: https://lore.kernel.org/lkml/4b96909a-f1ac-49eb-b814-97b8adda6229@arm.com [1]
Link: https://lore.kernel.org/lkml/ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com [2]
Link: https://patch.msgid.link/fyqsk63pkoxpeaclyqsm5nwtz3dyejplr7rg6p74xwemfzdzuu@7m7xhs5aqpqw
2026-01-23 11:53:19 +01:00
Vincent Guittot 98c88dc8a1 sched/fair: Fix pelt clock sync when entering idle
Samuel and Alex reported regressions of the util_avg of RT rq with
commit 17e3e88ed0 ("sched/fair: Fix pelt lost idle time detection").
It happens that fair is updating and syncing the pelt clock with task one
when pick_next_task_fair() fails to pick a task but before the prev
scheduling class got a chance to update its pelt signals.

Move update_idle_rq_clock_pelt() in set_next_task_idle() which is called
after prev class has been called.

Fixes: 17e3e88ed0 ("sched/fair: Fix pelt lost idle time detection")
Closes: https://lore.kernel.org/all/CAG2KctpO6VKS6GN4QWDji0t92_gNBJ7HjjXrE+6H+RwRXt=iLg@mail.gmail.com/
Closes: https://lore.kernel.org/all/8cf19bf0e0054dcfed70e9935029201694f1bb5a.camel@mediatek.com/
Reported-by: Samuel Wu <wusamuel@google.com>
Reported-by: Alex Hoh <Alex.Hoh@mediatek.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Samuel Wu <wusamuel@google.com>
Tested-by: Alex Hoh <Alex.Hoh@mediatek.com>
Link: https://patch.msgid.link/20260121163317.505635-1-vincent.guittot@linaro.org
2026-01-21 17:46:08 +01:00
Will Rosenberg d06bf78e55 perf: Fix refcount warning on event->mmap_count increment
When calling refcount_inc(&event->mmap_count) inside perf_mmap_rb(), the
following warning is triggered:

        refcount_t: addition on 0; use-after-free.
        WARNING: lib/refcount.c:25

PoC:

    struct perf_event_attr attr = {0};
    int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    mmap(NULL, 0x3000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int victim = syscall(__NR_perf_event_open, &attr, 0, -1, fd,
                         PERF_FLAG_FD_OUTPUT);
    mmap(NULL, 0x3000, PROT_READ | PROT_WRITE, MAP_SHARED, victim, 0);

This occurs when creating a group member event with the flag
PERF_FLAG_FD_OUTPUT. The group leader should be mmap-ed and then mmap-ing
the event triggers the warning.

Since the event has copied the output_event in perf_event_set_output(),
event->rb is set. As a result, perf_mmap_rb() calls
refcount_inc(&event->mmap_count) when event->mmap_count = 0.

Disallow the case when event->mmap_count = 0. This also prevents two
events from updating the same user_page.

Fixes: 448f97fba9 ("perf: Convert mmap() refcounts to refcount_t")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Rosenberg <whrosenb@asu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260119184956.801238-1-whrosenb@asu.edu
2026-01-21 16:28:58 +01:00
Thomas Gleixner c06343be0b clocksource: Reduce watchdog readout delay limit to prevent false positives
The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have a
uncertainty margin of 250us (default). The watchdog readout does:

  1) wdnow = read(HPET);
  2) csnow = read(TSC);
  3) wdend = read(HPET);

The valid window for the delta between #1 and #3 is calculated by the
uncertainty margins of the watchdog and the clocksource:

   m = 2 * watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

   m = watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 500us for the TSC/HPET case.

That means the following scenario will trigger the watchdog:

 Watchdog cycle N:

 1)       wdnow[N] = read(HPET);
 2)       csnow[N] = read(TSC);
 3)       wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and

 Watchdog cycle N + 1:

 4)       wdnow[N + 1] = read(HPET);
 5)       csnow[N + 1] = read(TSC);
 6)       wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
 -> cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.

Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.

Fixes: 4ac1dd3245 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx
2026-01-21 11:33:11 +01:00
Linus Torvalds c25f2fb1f4 17 hotfixes. 12 are cc:stable, 16 are for MM.
- A 4 patch series from David Hildenbrand which fixes a few things
   realted to hugetlb PMD sharing
 
 - The remainder are singletons, please see their changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaW/vEQAKCRDdBJ7gKXxA
 jtMcAQCK1tKnINRmVjY3UJCqZMAaXvOdOoUIgHDaTXD/DWKm9AD9HRwWzYB4+TNr
 k/Te8F33d418LcMBTW9CLhrplQpaIAI=
 =+d1A
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:

 - A patch series from David Hildenbrand which fixes a few things
   related to hugetlb PMD sharing

 - The remainder are singletons, please see their changelogs for details

* tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
  mm/kfence: fix potential deadlock in reboot notifier
  Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
  mm: do not copy page tables unnecessarily for VM_UFFD_WP
  mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
  mm/rmap: fix two comments related to huge_pmd_unshare()
  mm/hugetlb: fix two comments related to huge_pmd_unshare()
  mm/hugetlb: fix hugetlb_pmd_shared()
  mm: remove unnecessary and incorrect mmap lock assert
  x86/kfence: avoid writing L1TF-vulnerable PTEs
  mm/vma: do not leak memory when .mmap_prepare swaps the file
  migrate: correct lock ordering for hugetlb file folios
  panic: only warn about deprecated panic_print on write access
  fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
  mm: take into account mm_cid size for mm_struct static definitions
  mm: rename cpu_bitmap field to flexible_array
  mm: add missing static initializer for init_mm::mm_cid.lock
2026-01-20 13:32:16 -08:00
Linus Torvalds c03e9c42ae dma-mapping fixes for Linux 6.19
- minor fixes for the corner cases of the SWIOTLB pool management (Robin Murphy)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaW+NbgAKCRCJp1EFxbsS
 RKU3AQCIpNwIYN6evmnTXO/Y+AFRjsrb1pE1cUyHn5QTY4/o4wEA6BiSgMCrZEbv
 HTEs1ZSHgLOwBwZre1Z11icGg3BkRgY=
 =6fBC
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:

 - minor fixes for the corner cases of the SWIOTLB pool management
   (Robin Murphy)

* tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma/pool: Avoid allocating redundant pools
  mm_zone: Generalise has_managed_dma()
  dma/pool: Improve pool lookup
2026-01-20 10:16:18 -08:00
Thomas Weißschuh e806f7dde8 timekeeping: Adjust the leap state for the correct auxiliary timekeeper
When __do_ajdtimex() was introduced to handle adjtimex for any
timekeeper, this reference to tk_core was not updated. When called on an
auxiliary timekeeper, the core timekeeper would be updated incorrectly.

This gets caught by the lock debugging diagnostics because the
timekeepers sequence lock gets written to without holding its
associated spinlock:

WARNING: include/linux/seqlock.h:226 at __do_adjtimex+0x394/0x3b0, CPU#2: test/125
aux_clock_adj (kernel/time/timekeeping.c:2979)
__do_sys_clock_adjtime (kernel/time/posix-timers.c:1161 kernel/time/posix-timers.c:1173)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)

Update the correct auxiliary timekeeper.

Fixes: 775f71ebed ("timekeeping: Make do_adjtimex() reusable")
Fixes: ecf3e70304 ("timekeeping: Provide adjtimex() for auxiliary clocks")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260120-timekeeper-auxclock-leapstate-v1-1-5b358c6b3cfd@linutronix.de
2026-01-20 10:18:53 +01:00
Gal Pressman 90f3c12324 panic: only warn about deprecated panic_print on write access
The panic_print_deprecated() warning is being triggered on both read and
write operations to the panic_print parameter.

This causes spurious warnings when users run 'sysctl -a' to list all
sysctl values, since that command reads /proc/sys/kernel/panic_print and
triggers the deprecation notice.

Modify the handlers to only emit the deprecation warning when the
parameter is actually being set:

 - sysctl_panic_print_handler(): check 'write' flag before warning.
 - panic_print_get(): remove the deprecation call entirely.

This way, users are only warned when they actively try to use the
deprecated parameter, not when passively querying system state.

Link: https://lkml.kernel.org/r/20260106163321.83586-1-gal@nvidia.com
Fixes: ee13240cd7 ("panic: add note that panic_print sysctl interface is deprecated")
Fixes: 2683df6539 ("panic: add note that 'panic_print' parameter is deprecated")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Cc: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-19 12:30:01 -08:00
Linus Torvalds 6f32aa9161 cgroup: Another fix for v6.19-rc5
- Add Chen Ridong as cpuset reviewer.
 
 - Add SPDX license identifiers to cgroup files that were missing them.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaW0ToA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGel7AQD3eTbiR/OU0i5yJj6YZsIdVYcteqqsWBkDFnZ3
 9pZZugEA+KCwi5eQz+cryKZ7y5fU5g5O6f4OP/lfLEcSmFDpfQU=
 =QHwy
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Add Chen Ridong as cpuset reviewer

 - Add SPDX license identifiers to cgroup files that were missing them

* tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  kernel: cgroup: Add LGPL-2.1 SPDX license ID to legacy_freezer.c
  kernel: cgroup: Add SPDX-License-Identifier lines
  MAINTAINERS: Add Chen Ridong as cpuset reviewer
2026-01-18 14:30:27 -08:00
Linus Torvalds b671c1dad2 Fix the update_needs_ipi() check in the hrtimer code that
may result in incorrect skipping of hrtimer IPIs.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmlsr+oRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iLPw/+OuA8hPlFY8nxrpVKASoiP7u+mGQ82Qmf
 xdf1Kimpmc5ydpGsxpRLKY19l3fLuGSJuOa2Url0Ro52JHxU+yPPOkN2ONjsPxLI
 mFMdhyXiS1PIgYt307OmNmKFadxdg5cGJTKY0wIS9JCcPFxI5Y3NFhWZ4ybPlGHK
 /bxtsjUctekUTDVahqP65lRIYMvjhO1I2LQfoPQyg/C+2n6Owy7TAX8xZovDZLHz
 a9RkkX6FPc8aR8Kj1VYm1yABNlUhGvFvTE6wfcfOMjiHHeDv2qYdwNHF2tK2eoKC
 QX609zBTCT/n0ioZlikLrKDZitj6uRgS2blBc+hodHEs4cY5gE2umWSU4IirT7HH
 GDjvG/vns/BLRr97gPdMaJ+N/sjdCSQ5tsDRrvHwCGC27VXPu3O3DYU128eH4eXW
 +KNJGQHJvR7slBKElQ4wxJaLhbLBB5rw/AqqdXa4PU8LqvMBBNcLZjHnrYlbD5Ei
 ovdX5VtRYOjrVl7pCNrXXdzFfBpXK+sOPHQMoJxnLHKXO+cdsZz8zIB7AHZEVe20
 CIOMRcfgw2nvpS+0MqaHzre0ixyq+U2XSIu7MRIlq8GZ+zdjiVMqx3pPa19QRdyr
 hA9JupeRC9eZdymlms2MiSzqIVmWSVvgcyhIFSYw9yLe55vXcGyArF27RKGtBHRA
 uF+K5aT8gqA=
 =CAdu
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix the update_needs_ipi() check in the hrtimer code that may result
  in incorrect skipping of hrtimer IPIs"

* tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Fix softirq base check in update_needs_ipi()
2026-01-18 10:56:32 -08:00
Linus Torvalds 837c8180e3 Misc DL scheduler fixes, mainly for a new category of bugs that were
discovered and fixed recently:
 
  - Fix a race condition in the DL server
 
  - Fix a DL server bug which can result in incorrectly going
    idle when there's work available
 
  - Fix DL server bug which triggers a WARN() due to broken
    get_prio_dl() logic and subsequent misbehavior
 
  - Fix double update_rq_clock() calls
 
  - Fix setscheduler() assumption about static priorities
 
  - Make sure balancing callbacks are always called
 
  - Plus a handful of preparatory commits for the fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmlsrfIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hgUA/9HQi51sxF/TlhKeoj9Z0zSO/L5Fbd7d5O
 ucsr+exoU7B+X95AnYC64pV3YlTNibD5y6HR4zpPzhlmEdFw8K9hTmDeSOF2cB0T
 dPdEY2KtvNoQwVwytPYKMsXtBTcJtZZqOec9BgGBzSIpjPm6Zhxr1J2/lKFy/udA
 00aqNuJwfQW94dxvyPf+jPXbm0JW9HHZGNuuOXkkmZpkH+HIzBDrTWcAjHUaZt9H
 doPyEfO08dg3Uu4OxNBtxx94X7bQsn5RU1plSEc+xF7zynqwEIwDbelFftpHiM+z
 UqfFgkzCV8A8o/72BfZVhQgaspMZ+M4XSsaGBbouOTg3Rgx8gRKC23s8/xhDJSpy
 ayMWhuR5bsI04GD1xZVOwLvUi8oDIgAFIAKRI6gcW5eQELlejii0bAvs9Jw9/OjI
 QhhrX54sJCPPKMZpKO2n+AzQvv9a6+/p7ikiA2U5dR4wBEERFUw8ffRguHwSm+BU
 U1M+t+2tw+UiC7yML0zvIC3HdyfBhj33ZeW7PWyvFU0vjGm78U6NjqliAxbqW21l
 GdtQUpi9aXCtijXqHhLFvBtWGmqowj9UQiHroC0+nq1BsTpBDe575vwJNYF2K9he
 ht0hZqreUIf7xTsJHXlLFbtFxPQw0xAKzWJWUswaaqO8eS2JcWJbmHUAsHQuxmJB
 MbkJAlU7j/w=
 =DgiW
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Misc deadline scheduler fixes, mainly for a new category of bugs that
  were discovered and fixed recently:

   - Fix a race condition in the DL server

   - Fix a DL server bug which can result in incorrectly going idle when
     there's work available

   - Fix DL server bug which triggers a WARN() due to broken
     get_prio_dl() logic and subsequent misbehavior

   - Fix double update_rq_clock() calls

   - Fix setscheduler() assumption about static priorities

   - Make sure balancing callbacks are always called

   - Plus a handful of preparatory commits for the fixes"

* tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Use ENQUEUE_MOVE to allow priority change
  sched: Deadline has dynamic priority
  sched: Audit MOVE vs balance_callbacks
  sched: Fold rq-pin swizzle into __balance_callbacks()
  sched/deadline: Avoid double update_rq_clock()
  sched/deadline: Ensure get_prio_dl() is up-to-date
  sched/deadline: Fix server stopping with runnable tasks
  sched: Provide idle_rq() helper
  sched/deadline: Fix potential race in dl_add_task_root_domain()
  sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()
2026-01-18 10:17:40 -08:00
Linus Torvalds b62ce2547f Power management fixes for 6.19-rc6
- Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)
 
  - Fix stale description of the cost field in struct em_perf_state to
    reflect the current code (Yaxiong Tian)
 
  - Fix and revamp the energy model YNL specification added recently
    along with the energy model netlink interface (Changwoo Min)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmlqWS8SHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1BhAH/iSA/9YP9/xIW+rIZj2irsncGVc/hJ+V
 8PcM2tar966AEQBlmDPc7ug2MXT0Mb/5WRtQo/IvZ1tdAALB+bC70FHjdu3y7SAX
 IzFGyHKJ9OVJHGxYq0TCBbRXgYubUZiqvAnaBTBZ/ZIFlt3B4KEyatfFfxJMb1Pe
 H9zQIyUVBN1tjPfgNVbc22XGfkwfmmla72le6GmHseKZRqpwutIwzxm1PoMNVeLb
 fQv625LsWrDD9NU6j5uGq3WEq3xPltubtHN545FTFi4aGwBYJaG1FuIi+kPiQuCR
 VZmxTWVEiVwjttiZf/xWbOkPfwQqdgeXlTk5j+iBlF1jkrrW75IuLYQ=
 =eJ2p
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix an error path memory leak in the energy model management
  code, fix a kerneldoc comment in it, and fix and revamp the energy
  model YNL specification added recently along with the new energy model
  management netlink interface (that received feedback after being
  added):

   - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)

   - Fix stale description of the cost field in struct em_perf_state to
     reflect the current code (Yaxiong Tian)

   - Fix and revamp the energy model YNL specification added recently
     along with the energy model netlink interface (Changwoo Min)"

* tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: EM: Add dump to get-perf-domains in the EM YNL spec
  PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
  PM: EM: Rename em.yaml to dev-energymodel.yaml
  PM: EM: Fix yamllint warnings in the EM YNL spec
  PM: EM: Fix memory leak in em_create_pd() error path
  PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16 12:08:19 -08:00
Linus Torvalds 7a2c1b27cd printk fixup for 6.19 rc6
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmlqGMYbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyT0UP/3Wn4tm0h0n3XeyujQEA
 IPNisJCczF6aWIAuwccRigR8hQFriPNvkZ5AhMGtotZJrY3uUoe1/8aF+XIdqnl/
 Yb1434AGwVNIpSaap+vtEclHrLDmzZH+Z75/FTWQwldM/hPkPWJI8fsEuRLqBZsn
 v520NBFtrQVcOZKKNy0npBnHsC0DsAmqoZuOvLTx0mx5AyE029CfPbDMZuVnSNix
 KjZ4U5KL0qDs2LIMpdB/mqprydGkHdogdIbrPK3WtzStVgNbi9VmnV19ZwbUlXJM
 rYPbtbQg3htwuspgR+yM6O21qsthRf2qZF5+2/a929IzOBsD/qAXQbbxQWVpF7Qb
 ELYXNV4N5hqm9EW8WeOOpLKUUG7k0fRPf81X/07uGVafPMQKQJ8kFNgLBkBFR4ya
 RAMNxTPHbHQvVaLcRujxZXoC4Wh3ZTunQXpIouy0p9dKOzbsCAj0ZqeqDa09UsaW
 rCEm50p/Pd1csML8a9A/2nNoWjQzuSVmML7F6obGCOWaW6p21GhSKHzqqDIjBab9
 3wxhpllVeYRYS2yhkKjOPJkQKXo3idIdpieLpW8IVbJvp/gQgmeKjWdhnvNvUan9
 hyCzfI8OZXVz0vItBfWsoX44+6UtpLHd4o16aDYjyDItflJOPuQbWzky+icT2R3x
 8B5xPC08tmEGitf3miv2EGq9
 =d/xV
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:

 - Prevent softlockup by restoring IRQs in atomic flush after each
   record

* tag 'printk-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/nbcon: Restore IRQ in atomic flush after each emitted record
2026-01-16 09:46:59 -08:00
Rafael J. Wysocki d51e68b700 Merge branch 'pm-em'
Merge fixes related to the energy model management for 6.19-rc6:

 - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)

 - Fix stale description of the cost field in struct em_perf_state to
   reflect the current code (Yaxiong Tian)

 - Fix and revamp the energy model YNL specification added recently
   along with the energy model netlink interface (Changwoo Min)

* pm-em:
  PM: EM: Add dump to get-perf-domains in the EM YNL spec
  PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
  PM: EM: Rename em.yaml to dev-energymodel.yaml
  PM: EM: Fix yamllint warnings in the EM YNL spec
  PM: EM: Fix memory leak in em_create_pd() error path
  PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16 16:16:24 +01:00
Tim Bird 84697bf553 kernel: cgroup: Add LGPL-2.1 SPDX license ID to legacy_freezer.c
Add an appropriate SPDX-License-Identifier line to the file,
and remove the GNU boilerplate text.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-15 22:03:15 -10:00
Tim Bird a1b3421a02 kernel: cgroup: Add SPDX-License-Identifier lines
Add GPL-2.0 SPDX license id lines to a few old
files, replacing the reference to the COPYING file.

The COPYING file at the time of creation of these files
(2007 and 2005) was GPL-v2.0, with an additional clause
indicating that only v2 applied.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-01-15 22:03:09 -10:00
Tim Bird 983d014aaf kernel: modules: Add SPDX license identifier to kmod.c
Add a GPL-2.0 license identifier line for this file.

kmod.c was originally introduced in the kernel in February
of 1998 by Linus Torvalds - who was familiar with kernel
licensing at the time this was introduced.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-15 16:58:28 -08:00
Linus Torvalds 88e490913f ftrace fixes for v6.19:
- Fix allocation accounting on boot up
 
   The ftrace records for each function that ftrace can attach to is
   done in a group of pages. At boot up, the number of pages are
   calculated and allocated. After that, the pages are filled with data.
   It may allocate more than needed due to some functions not being
   recorded (because they are unused weak functions), this too is
   recorded.
 
   After the data is filled in, a check is made to make sure the right
   number of pages were allocated. But this was off due to the
   assumption that the same number of entries fit per every page.
   Because the size of an entry does not evenly divide into PAGE_SIZE,
   there is a rounding error when a large number of pages is allocated
   to hold the events. This causes the check to fail and triggers a
   warning.
 
   Fix the accounting by finding out how many pages are actually
   allocated from the functions that allocate them and use that to see
   if all the pages allocated were used and the ones not used are
   properly freed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaWlyoxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoZoAQDuPOaly/RM9cS70zngDo/UsZmi2lQL
 RxuMR7sPVMwkrAEAp/gnpJenSRxCzaO4EdQYUYI28/ihkdNGp3KG8Q2MNw0=
 =v899
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace fix from Steven Rostedt:

 - Fix allocation accounting on boot up

   The ftrace records for each function that ftrace can attach to is
   done in a group of pages. At boot up, the number of pages are
   calculated and allocated. After that, the pages are filled with data.
   It may allocate more than needed due to some functions not being
   recorded (because they are unused weak functions), this too is
   recorded.

   After the data is filled in, a check is made to make sure the right
   number of pages were allocated. But this was off due to the
   assumption that the same number of entries fit per every page.
   Because the size of an entry does not evenly divide into PAGE_SIZE,
   there is a rounding error when a large number of pages is allocated
   to hold the events. This causes the check to fail and triggers a
   warning.

   Fix the accounting by finding out how many pages are actually
   allocated from the functions that allocate them and use that to see
   if all the pages allocated were used and the ones not used are
   properly freed.

* tag 'ftrace-v6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Do not over-allocate ftrace memory
2026-01-15 15:13:05 -08:00
Peter Zijlstra 627cc25f84 sched/deadline: Use ENQUEUE_MOVE to allow priority change
Pierre reported hitting balance callback warnings for deadline tasks
after commit 6455ad5346 ("sched: Move sched_class::prio_changed()
into the change pattern").

It turns out that DEQUEUE_SAVE+ENQUEUE_RESTORE does not preserve DL
priority and subsequently trips a balance pass -- where one was not
expected.

From discussion with Juri and Luca, the purpose of this clause was to
deal with tasks new to DL and all those sites will have MOVE set (as
well as CLASS, but MOVE is move conservative at this point).

Per the previous patches MOVE is audited to always run the balance
callbacks, so switch enqueue_dl_entity() to use MOVE for this case.

Fixes: 6455ad5346 ("sched: Move sched_class::prio_changed() into the change pattern")
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260114130528.GB831285@noisy.programming.kicks-ass.net
2026-01-15 21:57:53 +01:00
Peter Zijlstra e008ec6c79 sched: Deadline has dynamic priority
While FIFO/RR have static priority, DEADLINE is a dynamic priority
scheme. Notably it has static priority -1. Do not assume the priority
doesn't change for deadline tasks just because the static priority
doesn't change.

This ensures DL always sees {DE,EN}QUEUE_MOVE where appropriate.

Fixes: ff77e46853 ("sched/rt: Fix PI handling vs. sched_setscheduler()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260114130528.GB831285@noisy.programming.kicks-ass.net
2026-01-15 21:57:53 +01:00
Peter Zijlstra 53439363c0 sched: Audit MOVE vs balance_callbacks
The {DE,EN}QUEUE_MOVE flag indicates a task is allowed to change
priority, which means there could be balance callbacks queued.

Therefore audit all MOVE users and make sure they do run balance
callbacks before dropping rq-lock.

Fixes: 6455ad5346 ("sched: Move sched_class::prio_changed() into the change pattern")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260114130528.GB831285@noisy.programming.kicks-ass.net
2026-01-15 21:57:53 +01:00
Peter Zijlstra 49041e87f9 sched: Fold rq-pin swizzle into __balance_callbacks()
Prepare for more users needing the rq-pin swizzle.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260114130528.GB831285@noisy.programming.kicks-ass.net
2026-01-15 21:57:52 +01:00
Peter Zijlstra 4de9ff7606 sched/deadline: Avoid double update_rq_clock()
When setup_new_dl_entity() is called from enqueue_task_dl() ->
enqueue_dl_entity(), the rq-clock should already be updated, and
calling update_rq_clock() again is not right.

Move the update_rq_clock() to the one other caller of
setup_new_dl_entity(): sched_init_dl_server().

Fixes: 9f239df555 ("sched/deadline: Initialize dl_servers after SMP")
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Link: https://patch.msgid.link/20260113115622.GA831285@noisy.programming.kicks-ass.net
2026-01-15 21:57:52 +01:00
Peter Zijlstra 375410bb9a sched/deadline: Ensure get_prio_dl() is up-to-date
Pratheek tripped a WARN and noted the following issue:

> Inspecting the set of events that led to the warning being triggered
> showed the following:
>
>     systemd-1  [008] dN.31 ...: do_set_cpus_allowed: set_cpus_allowed begin!
>
>     systemd-1  [008] dN.31 ...: sched_change_begin: Begin!
>     systemd-1  [008] dN.31 ...: sched_change_begin: Before dequeue_task()!
>     systemd-1  [008] dN.31 ...: update_curr_dl_se: update_curr_dl_se: ENQUEUE_REPLENISH
>     systemd-1  [008] dN.31 ...: enqueue_dl_entity: enqueue_dl_entity: ENQUEUE_REPLENISH
>     systemd-1  [008] dN.31 ...: replenish_dl_entity: Replenish before: 14815760217
>     systemd-1  [008] dN.31 ...: replenish_dl_entity: Replenish after: 14816960047
>     systemd-1  [008] dN.31 ...: sched_change_begin: Before put_prev_task()!
>
>     systemd-1  [008] dN.31 ...: sched_change_end: Before enqueue_task()!
>     systemd-1  [008] dN.31 ...: sched_change_end: Before put_prev_task()!
>     systemd-1  [008] dN.31 ...: prio_changed_dl: Queuing pull task on prio change: 14815760217 -> 14816960047
>     systemd-1  [008] dN.31 ...: prio_changed_dl: Queuing balance callback!
>     systemd-1  [008] dN.31 ...: sched_change_end: End!
>
>     systemd-1  [008] dN.31 ...: do_set_cpus_allowed: set_cpus_allowed end!
>     systemd-1  [008] dN.21 ...: __schedule: Woops! Balance callback found!
>
> 1. sched_change_begin() from guard(sched_change) in
>    do_set_cpus_allowed() stashes the priority, which for the deadline
>    task, is "p->dl.deadline".
> 2. The dequeue of the deadline task replenishes the deadline.
> 3. The task is enqueued back after guard's scope ends and since there is
>    no *_CLASS flags set, sched_change_end() calls
>    dl_sched_class->prio_changed() which compares the deadline.
> 4. Since deadline was moved on dequeue, prio_changed_dl() sees the value
>    differ from the stashed value and queues a balance pull callback.
> 5. do_set_cpus_allowed() finishes and drops the rq_lock without doing a
>    do_balance_callbacks().
> 6. Grabbing the rq_lock() at subsequent __schedule() triggers the
>    warning since the balance pull callback was never executed before
>    dropping the lock.

Meaning get_prio_dl() ought to update current and return an up-to-date
value.

Fixes: 6455ad5346 ("sched: Move sched_class::prio_changed() into the change pattern")
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260106104113.GX3707891@noisy.programming.kicks-ass.net
2026-01-15 21:57:52 +01:00
Linus Torvalds 13b2d15d99 32 hotfixes. 16 are cc:stable, 24 are for MM.
- four kerneldoc fixes from Bagas Sanjaya
 
 - four DAMON fixes from SeongJae
 
 - four mremap VMA-related fixes from Lorenzo
 
 - various singletons - please see the changelogs for details
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaWkP3gAKCRDdBJ7gKXxA
 jnmTAQCBiCyw7Pvvak5OM8Of0qQ8m/jQmNMBPRrd5M3tAAEOlwD9F7E6RjcmyItU
 k+sboDgNKTvgdRHTmRzj1t96aXJrSQQ=
 =OxTh
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:

 - kerneldoc fixes from Bagas Sanjaya

 - DAMON fixes from SeongJae

 - mremap VMA-related fixes from Lorenzo

 - various singletons - please see the changelogs for details

* tag 'mm-hotfixes-stable-2026-01-15-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (30 commits)
  drivers/dax: add some missing kerneldoc comment fields for struct dev_dax
  mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
  mailmap: add entry for Daniel Thompson
  tools/testing/selftests: fix gup_longterm for unknown fs
  mm/page_alloc: prevent pcp corruption with SMP=n
  iommu/sva: include mmu_notifier.h header
  mm: kmsan: fix poisoning of high-order non-compound pages
  tools/testing/selftests: add forked (un)/faulted VMA merge tests
  mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
  tools/testing/selftests: add tests for !tgt, src mremap() merges
  mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
  mm/zswap: fix error pointer free in zswap_cpu_comp_prepare()
  mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
  mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
  mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
  mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
  mm/damon/core: remove call_control in inactive contexts
  powerpc/watchdog: add support for hardlockup_sys_info sysctl
  mips: fix HIGHMEM initialization
  mm/hugetlb: ignore hugepage kernel args if hugepages are unsupported
  ...
2026-01-15 10:47:14 -08:00
Guenter Roeck be55257fab ftrace: Do not over-allocate ftrace memory
The pg_remaining calculation in ftrace_process_locs() assumes that
ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the
allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE
(integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g.
4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages)
have significantly more capacity than 256 * 170. This leads to pg_remaining
being underestimated, which in turn makes skip (derived from skipped -
pg_remaining) larger than expected, causing the WARN(skip != remaining)
to trigger.

Extra allocated pages for ftrace: 2 with 654 skipped
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0

A similar problem in ftrace_allocate_records() can result in allocating
too many pages. This can trigger the second warning in
ftrace_process_locs().

Extra allocated pages for ftrace
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580

Use the actual capacity of a page group to determine the number of pages
to allocate. Have ftrace_allocate_pages() return the number of allocated
pages to avoid having to calculate it. Use the actual page group capacity
when validating the number of unused pages due to skipped entries.
Drop the definition of ENTRIES_PER_PAGE since it is no longer used.

Cc: stable@vger.kernel.org
Fixes: 4a3efc6baf ("ftrace: Update the mcount_loc check of skipped entries")
Link: https://patch.msgid.link/20260113152243.3557219-1-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-15 10:17:53 -05:00
Feng Tang e561383a39 powerpc/watchdog: add support for hardlockup_sys_info sysctl
Commit a9af76a787 ("watchdog: add sys_info sysctls to dump sys info on
system lockup") adds 'hardlock_sys_info' systcl knob for general kernel
watchdog to control what kinds of system debug info to be dumped on
hardlockup.

Add similar support in powerpc watchdog code to make the sysctl knob more
general, which also fixes a compiling warning in general watchdog code
reported by 0day bot.

Link: https://lkml.kernel.org/r/20251231080309.39642-1-feng.tang@linux.alibaba.com
Fixes: a9af76a787 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512030920.NFKtekA7-lkp@intel.com/
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:22 -08:00
Pasha Tatashin 582f0f3864 kho: validate preserved memory map during population
If the previous kernel enabled KHO but did not call kho_finalize() (e.g.,
CONFIG_LIVEUPDATE=n or userspace skipped the finalization step), the
'preserved-memory-map' property in the FDT remains empty/zero.

Previously, kho_populate() would succeed regardless of the memory map's
state, reserving the incoming scratch regions in memblock.  However,
kho_memory_init() would later fail to deserialize the empty map.  By that
time, the scratch regions were already registered, leading to partial
initialization and subsequent list corruption (freeing scratch area twice)
during kho_init().

Move the validation of the preserved memory map earlier into
kho_populate(). If the memory map is empty/NULL:
1. Abort kho_populate() immediately with -ENOENT.
2. Do not register or reserve the incoming scratch memory, allowing the new
   kernel to reclaim those pages as standard free memory.
3. Leave the global 'kho_in' state uninitialized.

Consequently, kho_memory_init() sees no active KHO context
(kho_in.mem_chunks_phys is 0) and falls back to kho_reserve_scratch(),
allocating fresh scratch memory as if it were a standard cold boot.

Link: https://lkml.kernel.org/r/20251223140140.2090337-1-pasha.tatashin@soleen.com
Fixes: de51999e68 ("kho: allow memory preservation state updates after finalization")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reported-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Closes: https://lore.kernel.org/all/20251218215613.GA17304@ranerica-svr.sc.intel.com
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:21 -08:00
Robin Murphy c6ccd09880 dma/pool: Avoid allocating redundant pools
On smaller systems, e.g. embedded arm64, it is common for all memory
to end up in ZONE_DMA32 or even ZONE_DMA. In such cases it is redundant
to allocate a nominal pool for an empty higher zone that just ends up
coming from a lower zone that should already have its own pool anyway.
We already have logic to skip allocating a ZONE_DMA pool when that is
empty, so generalise that to save memory in the case of other zones too.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/8ab8d8a620dee0109f33f5cb63d6bfeed35aac37.1768230104.git.robin.murphy@arm.com
2026-01-14 11:00:00 +01:00
Robin Murphy b31ac41b59 dma/pool: Improve pool lookup
If CONFIG_ZONE_DMA32 is enabled, but we have not allocated the
corresponding atomic_pool_dma32, dma_guess_pool() may return the NULL
value of that and fail a GFP_DMA32 allocation without trying to fall
back to other pools which may exist. Furthermore, if no GFP_DMA pool
exists, it is preferable to try GFP_DMA32 rather than immediately fall
back to GFP_KERNEL with even less chance of success. Improve matters
by encoding an explicit order of pool preference for each flag.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/c846b1a2f43295cac926c7af2ce907f62baec518.1768230104.git.robin.murphy@arm.com
2026-01-14 11:00:00 +01:00
Linus Torvalds c537e12dae bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmlm+x8ACgkQ6rmadz2v
 bTrC5Q/+NSCjBpHAinVL/09pahh0YJKczxWnpHG0rs6hSSZ88o+1uY+z3PQ3lRX9
 cHSwjIicRvwpX+DosGcXESqspbvo5MLx5DcB0WBSCnLodJ5MDap0iWltBCrGzqme
 mpeuZiQMSpsAGQLjICiywxcCndoaKEMSRZFmjlzNVJPbYkbxbZCQvwUQqg+Z4rQX
 t5zBJjb0X3Oaz3TcSiiW+yulZPoejFyF4P+4eTmFy+mAqq5D7CpMXrrjVTf78mSf
 IqMPbY/PssTExzna2d/ZKtxmKQ+HF6lN7+JaTuqbQsqQtxwh4eXsswezY/UKX7j+
 FdAuwz6W7ybO8UC3i4fvA1jFR60Hi5bBBoSXu+0hb/UGQQYXc+gRtB8DxqeUEgeb
 oVByGwrBWLRSLuBbQVPl3PRTNgwpBn41DkyIpXS6DK6thzdAqZ8zM15vFTYbWanF
 W/skdubkgIdhREunXMpLfZD5h0b/pXIYxzA59x//hZ+W8qvTuaabICja18x3SnqY
 hytoIQMo5d1UytyngfcHvXe79DCi5xFAqqVj0AqYR+CDdt5n9vrXgavX76hrYeHe
 /yYUHZ0h5A564zMXITMmUWDWvgmvTs1Q2hiV+rmtWPJ1bpOpPFxUosIWEBl4edA3
 vKlGoh/CtU9NJp/E/7nYdUV1vXlLJaareGQ/UXpFBl4NYsnNe4w=
 =Vt4X
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
   Dong)

 - Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)

 - Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)

 - Check that BPF insn array is not allowed as a map for const strings
   (Deepanshu Kartikey)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix reference count leak in bpf_prog_test_run_xdp()
  bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
  selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
  bpf, test_run: Subtract size of xdp_frame from allowed metadata size
  riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
2026-01-13 21:21:13 -08:00
Gabriele Monaco ca1e8eede4 sched/deadline: Fix server stopping with runnable tasks
The deadline server can currently stop due to idle although fair tasks
are runnable. This happens essentially when:

 * the server is set to idle, a task wakes up, the server stops
 * a task wakes up, the server sets itself to idle and stops right away

Address both cases by clearing the server idle flag whenever a fair task
wakes up and accounting also for pending tasks in the definition of idle.

Fixes: f5a538c07d ("sched/deadline: Fix dl_server stop condition")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260113085159.114226-3-gmonaco@redhat.com
2026-01-13 11:37:52 +01:00
Peter Zijlstra 1e0a2ba7af sched: Provide idle_rq() helper
A fix for the dl_server 'requires' idle_cpu() usage, which made me
note that it and available_idle_cpu() are extern function calls.

And while idle_cpu() is used outside of kernel/sched/,
available_idle_cpu() is not.

This makes it hard to make idle_cpu() an inline helper, so provide
idle_rq() and implement idle_cpu() and available_idle_cpu() using
that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2026-01-13 11:37:52 +01:00
Pingfan Liu 64e6fa7661 sched/deadline: Fix potential race in dl_add_task_root_domain()
The access rule for local_cpu_mask_dl requires it to be called on the
local CPU with preemption disabled. However, dl_add_task_root_domain()
currently violates this rule.

Without preemption disabled, the following race can occur:

1. ThreadA calls dl_add_task_root_domain() on CPU 0
2. Gets pointer to CPU 0's local_cpu_mask_dl
3. ThreadA is preempted and migrated to CPU 1
4. ThreadA continues using CPU 0's local_cpu_mask_dl
5. Meanwhile, the scheduler on CPU 0 calls find_later_rq() which also
   uses local_cpu_mask_dl (with preemption properly disabled)
6. Both contexts now corrupt the same per-CPU buffer concurrently

Fix this by moving the local_cpu_mask_dl access to the preemption
disabled section.

Closes: https://lore.kernel.org/lkml/aSBjm3mN_uIy64nz@jlelli-thinkpadt14gen4.remote.csb
Fixes: 318e18ed22 ("sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug")
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251125032630.8746-3-piliu@redhat.com
2026-01-13 11:37:51 +01:00
Pingfan Liu 479972efc2 sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()
The comments above dl_get_task_effective_cpus() and
dl_add_task_root_domain() already explain how to fetch a valid
root domain and protect against races. There's no need to repeat
this inside dl_add_task_root_domain(). Remove the redundant comment
to keep the code clean.

No functional change.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251125032630.8746-2-piliu@redhat.com
2026-01-13 11:37:51 +01:00
Thomas Weißschuh 05dc4a9fc8 hrtimer: Fix softirq base check in update_needs_ipi()
The 'clockid' field is not the correct way to check for a softirq base.

Fix the check to correctly compare the base type instead of the clockid.

Fixes: 1e7f7fbcd4 ("hrtimer: Avoid more SMP function calls in clock_was_set()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260107-hrtimer-clock-base-check-v1-1-afb5dbce94a1@linutronix.de
2026-01-13 11:04:41 +01:00
Linus Torvalds b71e635fee cgroup: Fixes for v6.19-rc5
- Fix -Wflex-array-member-not-at-end warnings in cgroup_root.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaWVQhw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGasOAQDmAej2XKpvd7zHxExKJ8xLYUc5dkqg00FLq86r
 iyikqAD9HYZNcTYE0pvqkgzpYbypQnMWs5F65tSjqJTyJNnL/Qw=
 =vI2H
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:

 - Fix -Wflex-array-member-not-at-end warnings in cgroup_root

* tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Eliminate cgrp_ancestor_storage in cgroup_root
2026-01-12 09:56:17 -10:00
Linus Torvalds fac4bdbaca Fix a crash in sched_mm_cid_after_execve().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmljevkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iKTw//Y9jfjB5VrESjnQumkJeV9A6v/ueVMEhF
 lA6ktDOKJwUjIBqqTxDuMmbuTUVg1AEmfUX8GCv2DqS2ngXsbcJmR++3E/S9vWz9
 QvChizDNrw03fq7WDryPK/0Pk04VJsqFcpR89IlMPAOkygVZ06T0+JTSo9BVqMYb
 xs4B1TlP6JUz0AtIRKqIGVVt57Zx8kNoGHHhQIJ6ziZA30Srh1/n1Y+B3iJxRklO
 EalGmNQECVobjyYaGqNB1Z73fMmJIgOkZeUQF6TB+KWYPc9DelRTUS2JH9bBhh4O
 h6Pau9y45YHu5j5qlQxhnw8CmGVjPHYhMAUsuxje1AK4mvgg+ki4lqqWSjhtiuFr
 0TX4+Z1ZSF1AB0dD0CtunqZ1K++202LY0uzQ2sCHhS3hNzWOP6vYR+qvZ0fLBfxI
 trWsz9KUrM07POww6S3nNrGKqyEUwZbmaywX0WOTwQzPhCsU9gqwK/SwSoYEkzhs
 TEG2xjHeVmqRmJLK7WBKXxBwz6e/hd1YiCi+hSY7PYnCvly6vZOKZA8CuKueF495
 Th+IUX5obCNxAy0ZKrOclpGrfE0vl7JBpye6pnYDCvtCKLmF29rOnNDBo1SZvwsh
 xlu61B7ezdrNp9JneErUZbJRrQL+PQlCem3MF5UmMgRUfqwUP1s1tOjooMOfPE6o
 Jv9UIQO8MGs=
 =jEMw
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix a crash in sched_mm_cid_after_execve()"

* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
2026-01-11 07:11:53 -10:00
Linus Torvalds fe948326e9 Fix perf swevent hrtimer deinit regression.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmljeiYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9XhAAthSFL6Tl+fBLFYjwAFCeGUyfB5lqewT0
 z6Sf7s+TOrOCn/3zP4JoDbuV+golsazcKk/9e3VnzccS9F9VdryML1z01O8mz4ub
 yHTX53mWRu98d+o+B+IwVOlxW+yeyydyHXQu3Be5mStfdla3i6K95Xu314OQKH4F
 zSDO1X67zSDdUvT8wSDlhXh9tM3Pd+kFDAEzGAY7n/n/6PJMflSVid/qCOVEInaV
 WRZoHhZnQUNBuorPLKwhVcduHMS0fh1E5lFfN3WdiQCRBl8P1YXutN4KigS95M7x
 ymlNhzuQwh7doCursDL/Rbc9hIRviZsMSTtmHUMWf/rlthvtbh4knct+noNSF+bZ
 VTy6SAHNq0mgvQM1vBmWKqUIgy1ekpfainUeUXJ8cKNoKnUIo9+Y7cH1PwBCvfIx
 R1vgK9qb2SpfqisNBxMzCiovfXQ657ZNNAdRzWOUErCxDKalo+rZ+e5a5etetzVY
 YMAJfsjnwmiPDwXhWjBUqYS5gGbh9cHE5KGB6qI6UJQ3odxd3H6/nRLEcyweMMs2
 kxFqHaKNnIuwsGl+VAlgwTuUbaR7feVzcvV7yGFnTMclP7cXjVgJRSpXcbMd4FrQ
 sB03TuqBIpMAVrZ/kbBWJh43BPn+nH9dXlTGaWv+AAe5uq4a3two7u4XIibnoICg
 MG6uXO1xWtk=
 =XNWD
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:
 "Fix perf swevent hrtimer deinit regression"

* tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Ensure swevent hrtimer is properly destroyed
2026-01-11 06:55:27 -10:00
Thomas Gleixner 2e4b28c48f treewide: Update email address
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-11 06:09:11 -10:00
Changwoo Min 380ff27af2 PM: EM: Add dump to get-perf-domains in the EM YNL spec
Add dump to get-perf-domains, so that a user can fetch either information
about a specific performance domain with do or information about all
performance domains with dump. Share the reply format of do and dump using
perf-domain-attrs, so remove perf-domains. The YNL spec, autogenerated
files, and the do implementation are updated, and the dump implementation
is added.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-5-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00
Changwoo Min d29b900cf4 PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
Previously, the cpus attribute was a string format which was a "%*pb"
stringification of a bitmap. That is not very consumable for a UAPI,
so let’s change it to an u64 array of CPU ids.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://patch.msgid.link/20260108053212.642478-4-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:44:46 +01:00