mirror-linux/kernel
Mayank Rungta 077ba03600 watchdog/hardlockup: improve buddy system detection timeliness
Currently, the buddy system only performs checks every 3rd sample.  With a
4-second interval.  If a check window is missed, the next check occurs 12
seconds later, potentially delaying hard lockup detection for up to 24
seconds.

Modify the buddy system to perform checks at every interval (4s). 
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.

Best and worst case detection scenarios:

Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
  interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
  Detected in ~24s (missed check + 12s till next check + 12s logic).

After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27 21:19:47 -07:00
..
bpf bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation 2026-03-21 13:19:40 -07:00
cgroup cgroup: Don't expose dead tasks in cgroup 2026-03-06 12:43:25 -10:00
configs Remove WARN_ALL_UNSEEDED_RANDOM kernel config option 2026-02-23 11:18:48 -08:00
debug treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
dma dma-mapping: avoid random addr value print out on error path 2026-02-23 08:26:54 +01:00
entry Merge branch 'core/entry' into sched/core 2026-01-30 15:40:05 +01:00
events perf: Make sure to use pmu_ctx->pmu for groups 2026-03-12 11:29:16 +01:00
futex Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
gcov Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
irq Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kcsan kcsan: test: Adjust "expect" allocation type for kmalloc_obj 2026-02-26 09:54:08 -08:00
livepatch Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
liveupdate liveupdate: luo_file: remember retrieve() status 2026-02-24 11:13:26 -08:00
locking Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
module module: Fix kernel panic when a symbol st_shndx is out of bounds 2026-02-23 19:37:28 +00:00
power Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
printk Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
rcu Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
sched sched: idle: Consolidate the handling of two special cases 2026-03-16 20:29:47 +01:00
time time/jiffies: Mark jiffies_64_to_clock_t() notrace 2026-03-11 10:33:12 +01:00
trace ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod 2026-03-21 16:51:04 -04:00
unwind Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.kexec liveupdate: kho: move to kernel/liveupdate 2025-11-27 14:24:33 -08:00
Kconfig.locks
Kconfig.preempt sched: Further restrict the preemption modes 2026-01-08 12:43:57 +01:00
Makefile kcov: Enable context analysis 2026-01-05 16:43:34 +01:00
acct.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
async.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit.h
audit_fsnotify.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_tree.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
audit_watch.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
auditfilter.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
auditsc.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
backtracetest.c
bounds.c x86/asm: Remove ANNOTATE_DATA_SPECIAL usage 2025-12-03 16:53:19 +01:00
capability.c
cfi.c
compat.c
configs.c
context_tracking.c context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}() 2026-01-01 16:39:46 +08:00
cpu.c SPDX updates for 7.0-rc1 2026-02-17 09:46:03 -08:00
cpu_pm.c
crash_core.c kernel/crash: remove inclusion of crypto/sha1.h 2026-03-27 21:19:46 -07:00
crash_core_test.c
crash_dump_dm_crypt.c crash_dump: use sysfs_emit in sysfs show functions 2026-03-27 21:19:38 -07:00
crash_reserve.c kernel/crash: remove inclusion of crypto/sha1.h 2026-03-27 21:19:46 -07:00
cred.c cred: remove unused set_security_override_from_ctx() 2026-01-06 20:52:57 -05:00
delayacct.c delayacct: fix uapi timespec64 definition 2026-02-08 00:13:32 -08:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c exit: kill unnecessary thread_group_leader() checks in exit_notify() and do_notify_parent() 2026-03-27 21:19:34 -07:00
exit.h
extable.c
fail_function.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
fork.c pid: document the PIDNS_ADDING checks in alloc_pid() and copy_process() 2026-03-27 21:19:36 -07:00
freezer.c
gen_kheaders.sh
groups.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
hung_task.c hung_task: explicitly report I/O wait state in log output 2026-03-27 21:19:40 -07:00
iomem.c
irq_work.c
jump_label.c
kallsyms.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
kallsyms_internal.h kallsyms: Get rid of kallsyms relative base 2026-01-22 15:58:22 -07:00
kallsyms_selftest.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kallsyms_selftest.h
kcmp.c
kcov.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kexec.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kexec_core.c kernel/kexec: remove inclusion of crypto/hash.h 2026-03-27 21:19:46 -07:00
kexec_elf.c
kexec_file.c kexec: derive purgatory entry from symbol 2026-01-31 16:16:07 -08:00
kexec_internal.h
kheaders.c
kprobes.c kprobes: Remove unneeded warnings from __arm_kprobe_ftrace() 2026-03-13 23:15:26 +09:00
kstack_erase.c
ksyms_common.c
ksysfs.c kexec: move sysfs entries to /sys/kernel/kexec 2025-11-27 14:24:42 -08:00
kthread.c kthread: consolidate kthread exit paths to prevent use-after-free 2026-02-26 10:45:49 +01:00
latencytop.c
module_signature.c
notifier.c
nscommon.c nsfs: tighten permission checks for ns iteration ioctls 2026-02-27 22:00:08 +01:00
nsproxy.c
nstree.c nstree: tighten permission checks for listing 2026-02-27 22:00:11 +01:00
padata.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
panic.c kernel/panic: mark init_taint_buf as __initdata and panic instead of warning in alloc_taint_buf() 2026-03-27 21:19:33 -07:00
params.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
pid.c pid: document the PIDNS_ADDING checks in alloc_pid() and copy_process() 2026-03-27 21:19:36 -07:00
pid_namespace.c
pid_sysctl.h
profile.c
ptrace.c
range.c
reboot.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
regset.c
relay.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
resource_kunit.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
rseq.c rseq: slice ext: Ensure rseq feature size differs from original rseq size 2026-02-23 11:19:19 +01:00
scftorture.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
scs.c
seccomp.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
signal.c complete_signal: kill always-true "core_state || !SIGNAL_GROUP_EXIT" check 2026-03-27 21:19:35 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c
stacktrace.c
static_call.c
static_call_inline.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
stop_machine.c
sys.c RISC-V updates for v7.0 2026-02-12 19:17:44 -08:00
sys_ni.c rseq: Implement sys_rseq_slice_yield() 2026-01-22 11:11:17 +01:00
sysctl-test.c
sysctl.c sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions 2026-01-06 11:27:10 +01:00
task_work.c
taskstats.c
torture.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
tracepoint.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
tsacct.c tsacct: skip all kernel threads 2026-01-26 19:07:13 -08:00
ucount.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
uid16.c
uid16.h
umh.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
up.c
user-return-notifier.c
user.c
user_namespace.c Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses 2026-02-22 08:26:33 -08:00
utsname.c
utsname_sysctl.c
vhost_task.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
vmcore_info.c kernel/crash: remove inclusion of crypto/sha1.h 2026-03-27 21:19:46 -07:00
watch_queue.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
watchdog.c watchdog/hardlockup: improve buddy system detection timeliness 2026-03-27 21:19:47 -07:00
watchdog_buddy.c watchdog/hardlockup: improve buddy system detection timeliness 2026-03-27 21:19:47 -07:00
watchdog_perf.c watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency 2026-02-08 00:13:35 -08:00
workqueue.c workqueue: Rename show_cpu_pool{s,}_hog{s,}() to reflect broadened scope 2026-03-06 06:38:16 -10:00
workqueue_internal.h workqueue: Show in-flight work item duration in stall diagnostics 2026-03-05 07:27:48 -10:00