mirror-linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	0923fd0419	Locking updates for v6.20: Lock debugging: - Implement compiler-driven static analysis locking context checking, using the upcoming Clang 22 compiler's context analysis features. (Marco Elver) We removed Sparse context analysis support, because prior to removal even a defconfig kernel produced 1,700+ context tracking Sparse warnings, the overwhelming majority of which are false positives. On an allmodconfig kernel the number of false positive context tracking Sparse warnings grows to over 5,200... On the plus side of the balance actual locking bugs found by Sparse context analysis is also rather ... sparse: I found only 3 such commits in the last 3 years. So the rate of false positives and the maintenance overhead is rather high and there appears to be no active policy in place to achieve a zero-warnings baseline to move the annotations & fixers to developers who introduce new code. Clang context analysis is more complete and more aggressive in trying to find bugs, at least in principle. Plus it has a different model to enabling it: it's enabled subsystem by subsystem, which results in zero warnings on all relevant kernel builds (as far as our testing managed to cover it). Which allowed us to enable it by default, similar to other compiler warnings, with the expectation that there are no warnings going forward. This enforces a zero-warnings baseline on clang-22+ builds. (Which are still limited in distribution, admittedly.) Hopefully the Clang approach can lead to a more maintainable zero-warnings status quo and policy, with more and more subsystems and drivers enabling the feature. Context tracking can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default disabled), but this will generate a lot of false positives. ( Having said that, Sparse support could still be added back, if anyone is interested - the removal patch is still relatively straightforward to revert at this stage. ) Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng) - Add support for Atomic<i8/i16/bool> and replace most Rust native AtomicBool usages with Atomic<bool> - Clean up LockClassKey and improve its documentation - Add missing Send and Sync trait implementation for SetOnce - Make ARef Unpin as it is supposed to be - Add __rust_helper to a few Rust helpers as a preparation for helper LTO - Inline various lock related functions to avoid additional function calls. WW mutexes: - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz) Misc fixes and cleanups: - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd Bergmann) - locking/local_lock: Include more missing headers (Peter Zijlstra) - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap) - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir Duberstein) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmIXiURHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gH+A/9GX5UmU6+HuDfDrCtXm9GDve6wkwahvcW jLDxOYjs764I2BhyjZnjKjyF5zw60hbykem7Wcf5EV2YH30nM4XRgEWVJfkr1UAI Pra415X4DdOzZ6qYQIpO8Udt1LtR7BMSaXITVLJaLicxEoOVtq3SKxjqyhCFs7UW MfJdqleB+RMLqq3LlzgB4l43eKk1xyeHh+oQwI0RSxuIpVZme3p4TObnCKjIWnK7 Ihd+dkgC852WBjANgNL7F/sd5UsF5QX3wjtOrLhMKvkIgTPdXln0g398pivjN/G/ Kpnw18SFeb159JfJu8eMotsYvVnQ0D5aOcTBfL4qvOHCImhpcu2s6ik9BcXqt2yT 8IiuWk9xEM3Ok+I/I4ClT5cf5GYpyigV2QsXxn+IjDX5Na8v4zlHh0r8SElP8fOt 7dpQx7iw8UghAib3AzA3suN78Oh39m8l5BNobj7LAjnqOQcVvoPo4o7/48ntuH7A 38EucFrXfxQBMfGbMwvxEmgYuX7MyVfQLaPE06MHy1BkZkffT8Um38TB0iNtZmtf WUx01yLKWYspehlwFi319uVI4/Zp7FnTfqa5uKv1oSXVdL9vZojSXUzrgDV7FVqT Z4xAAw/kwNHpUG7y0zNOqd6PukovG1t+CjbLvK+eHPwc5c0vEGG2oTRAfEvvP1z/ kesYDmCyJnk= =N1gA -----END PGP SIGNATURE----- Merge tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Lock debugging: - Implement compiler-driven static analysis locking context checking, using the upcoming Clang 22 compiler's context analysis features (Marco Elver) We removed Sparse context analysis support, because prior to removal even a defconfig kernel produced 1,700+ context tracking Sparse warnings, the overwhelming majority of which are false positives. On an allmodconfig kernel the number of false positive context tracking Sparse warnings grows to over 5,200... On the plus side of the balance actual locking bugs found by Sparse context analysis is also rather ... sparse: I found only 3 such commits in the last 3 years. So the rate of false positives and the maintenance overhead is rather high and there appears to be no active policy in place to achieve a zero-warnings baseline to move the annotations & fixers to developers who introduce new code. Clang context analysis is more complete and more aggressive in trying to find bugs, at least in principle. Plus it has a different model to enabling it: it's enabled subsystem by subsystem, which results in zero warnings on all relevant kernel builds (as far as our testing managed to cover it). Which allowed us to enable it by default, similar to other compiler warnings, with the expectation that there are no warnings going forward. This enforces a zero-warnings baseline on clang-22+ builds (Which are still limited in distribution, admittedly) Hopefully the Clang approach can lead to a more maintainable zero-warnings status quo and policy, with more and more subsystems and drivers enabling the feature. Context tracking can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default disabled), but this will generate a lot of false positives. ( Having said that, Sparse support could still be added back, if anyone is interested - the removal patch is still relatively straightforward to revert at this stage. ) Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng) - Add support for Atomic<i8/i16/bool> and replace most Rust native AtomicBool usages with Atomic<bool> - Clean up LockClassKey and improve its documentation - Add missing Send and Sync trait implementation for SetOnce - Make ARef Unpin as it is supposed to be - Add __rust_helper to a few Rust helpers as a preparation for helper LTO - Inline various lock related functions to avoid additional function calls WW mutexes: - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz) Misc fixes and cleanups: - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd Bergmann) - locking/local_lock: Include more missing headers (Peter Zijlstra) - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap) - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir Duberstein)" * tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits) locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK rcu: Mark lockdep_assert_rcu_helper() __always_inline compiler-context-analysis: Remove __assume_ctx_lock from initializers tomoyo: Use scoped init guard crypto: Use scoped init guard kcov: Use scoped init guard compiler-context-analysis: Introduce scoped init guards cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers seqlock: fix scoped_seqlock_read kernel-doc tools: Update context analysis macros in compiler_types.h rust: sync: Replace `kernel::c_str!` with C-Strings rust: sync: Inline various lock related methods rust: helpers: Move #define __rust_helper out of atomic.c rust: wait: Add __rust_helper to helpers rust: time: Add __rust_helper to helpers rust: task: Add __rust_helper to helpers rust: sync: Add __rust_helper to helpers rust: refcount: Add __rust_helper to helpers rust: rcu: Add __rust_helper to helpers rust: processor: Add __rust_helper to helpers ...	2026-02-10 12:28:44 -08:00
Linus Torvalds	7a2c1b27cd	printk fixup for 6.19 rc6 -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmlqGMYbFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyT0UP/3Wn4tm0h0n3XeyujQEA IPNisJCczF6aWIAuwccRigR8hQFriPNvkZ5AhMGtotZJrY3uUoe1/8aF+XIdqnl/ Yb1434AGwVNIpSaap+vtEclHrLDmzZH+Z75/FTWQwldM/hPkPWJI8fsEuRLqBZsn v520NBFtrQVcOZKKNy0npBnHsC0DsAmqoZuOvLTx0mx5AyE029CfPbDMZuVnSNix KjZ4U5KL0qDs2LIMpdB/mqprydGkHdogdIbrPK3WtzStVgNbi9VmnV19ZwbUlXJM rYPbtbQg3htwuspgR+yM6O21qsthRf2qZF5+2/a929IzOBsD/qAXQbbxQWVpF7Qb ELYXNV4N5hqm9EW8WeOOpLKUUG7k0fRPf81X/07uGVafPMQKQJ8kFNgLBkBFR4ya RAMNxTPHbHQvVaLcRujxZXoC4Wh3ZTunQXpIouy0p9dKOzbsCAj0ZqeqDa09UsaW rCEm50p/Pd1csML8a9A/2nNoWjQzuSVmML7F6obGCOWaW6p21GhSKHzqqDIjBab9 3wxhpllVeYRYS2yhkKjOPJkQKXo3idIdpieLpW8IVbJvp/gQgmeKjWdhnvNvUan9 hyCzfI8OZXVz0vItBfWsoX44+6UtpLHd4o16aDYjyDItflJOPuQbWzky+icT2R3x 8B5xPC08tmEGitf3miv2EGq9 =d/xV -----END PGP SIGNATURE----- Merge tag 'printk-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Prevent softlockup by restoring IRQs in atomic flush after each record * tag 'printk-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/nbcon: Restore IRQ in atomic flush after each emitted record	2026-01-16 09:46:59 -08:00
Marco Elver	8ec56d9aab	printk: Move locking annotation to printk.c With Sparse support gone, Clang is a bit more strict and warns: ./include/linux/console.h:492:50: error: use of undeclared identifier 'console_mutex' 492 \| extern void console_list_unlock(void) __releases(console_mutex); Since it does not make sense to make console_mutex itself global, move the annotation to printk.c. Context analysis remains disabled for printk.c. This is needed to enable context analysis for modules that include <linux/console.h>. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251219154418.3592607-34-elver@google.com	2026-01-05 16:43:36 +01:00
Petr Mladek	9bd18e1262	printk/nbcon: Restore IRQ in atomic flush after each emitted record The commit `d5d399efff` ("printk/nbcon: Release nbcon consoles ownership in atomic flush after each emitted record") prevented stall of a CPU which lost nbcon console ownership because another CPU entered an emergency flush. But there is still the problem that the CPU doing the emergency flush might cause a stall on its own. Let's go even further and restore IRQ in the atomic flush after each emitted record. It is not a complete solution. The interrupts and/or scheduling might still be blocked when the emergency atomic flush was called with IRQs and/or scheduling disabled. But it should remove the following lockup: mlx5_core 0000:03:00.0: Shutdown was called kvm: exiting hardware virtualization arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102] smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057) smp: csd: CSD lock (#1) unresponsive. [...] Call trace: pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P) nbcon_emit_next_record (kernel/printk/nbcon.c:1049) __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517) __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612) nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629) printk_kthreads_shutdown (kernel/printk/printk.c:?) syscore_shutdown (drivers/base/syscore.c:120) kernel_kexec (kernel/kexec_core.c:1045) __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722) invoke_syscall (arch/arm64/kernel/syscall.c:50) el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?) do_el0_svc (arch/arm64/kernel/syscall.c:152) el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749) el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820) el0t_64_sync (arch/arm64/kernel/entry.S:600) In this case, nbcon_atomic_flush_pending() is called from printk_kthreads_shutdown() with IRQs and scheduling enabled. Note that __nbcon_atomic_flush_pending_con() is directly called also from nbcon_device_release() where the disabled IRQs might break PREEMPT_RT guarantees. But the atomic flush is called only in emergency or panic situations where the latencies are irrelevant anyway. An ultimate solution would be a touching of watchdogs. But it would hide all problems. Let's do it later when anyone reports a stall which does not have a better solution. Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu Tested-by: Breno Leitao <leitao@debian.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251212124520.244483-1-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-12-15 16:18:41 +01:00
Linus Torvalds	208eed95fc	soc: driver updates for 6.19 This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes. - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs. - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access. - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmky/8gACgkQmmx57+YA GNlqohAApPTLM6Q4gf1cIcsTVaP0uxx9CBgupCGuT5ORrOMKBghVWjTOTSxeEAab UQF465QwYUUu602GH34UmRaY9CKW2bMIsfmkgmxNB4Y4Qd7yCgQNJ/h/TnN0rBH+ qTeEsRH/hax4miSNsh0oOZfVkZkg+23VF02d1VL0CcaX7y4oT45RPBQugrNx/gNS fHfVwgIq8vJ8WyrmM1h2nv1i1vgSzEy50B3kY674BBw83FcJTafNLvD7N5DSgD1H /I/2xeyEpb+oL1VfeHcXZaX/jf04O+cmvSzBi+MOH1tI3MpdxJib1vEYBdggoOWN K/FFGgsOY+DNmJPpSnPTTu8UpzksS8SxGBP7M9Q8roKZwA2c9wLotxySvjki5yv8 2zvabRdzbrSaoYwsH9QnZdQ2hVkJ9W8MESu8PevD3yMNuFUzledPDWW0N1SbGm78 0ZdB6NPdaBZYHMNMRdFhN8P275/Mx5e0XWN9oYMQqjPooH7YkyT7hJWz6ao2PCJP 8mDmnW1RzL+LWf7mJ25ZEtS+YjmKA/PVmogRrGurKCadvdxXqCF09KNljICHhmmu t0KB4dqw02OXLPvBk21qCi0zL56w1JDgqtS8suFvDYo9sCceeAbAcmpyoUOFj2N+ Upn976tb4iqFrr9mFswpmCJWPpqJkU+A+KnKsIRPU7N4kSrP35I= =HvlN -----END PGP SIGNATURE----- Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...	2025-12-05 17:29:04 -08:00
Petr Mladek	5cae92e622	Merge branch 'rework/write_atomic-unsafe' into for-linus	2025-12-01 14:17:04 +01:00
Petr Mladek	4f132d81f9	Merge branch 'rework/threaded-printk' into for-linus	2025-12-01 14:16:45 +01:00
Petr Mladek	3a9a3f5fb2	Merge branch 'rework/suspend-fixes' into for-linus	2025-12-01 14:16:28 +01:00
Petr Mladek	b1e6c41ef9	Merge branch 'rework/preempt-legacy-kthread' into for-linus	2025-12-01 14:16:08 +01:00
Petr Mladek	2d786a5b80	Merge branch 'rework/nbcon-in-kdb' into for-linus	2025-12-01 14:15:43 +01:00
Petr Mladek	475bb520c3	Merge branch 'rework/atomic-flush-hardlockup' into for-linus	2025-12-01 14:15:26 +01:00
Marcos Paulo de Souza	466348abb0	printk: Use console_is_usable on console_unblank The macro for_each_console_srcu iterates over all registered consoles. It's implied that all registered consoles have CON_ENABLED flag set, making the check for the flag unnecessary. Call console_is_usable function to fully verify if the given console is usable before calling the ->unblank callback. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-3-57b8b78647f4@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-27 15:54:50 +01:00
John Ogness	66e7c1e0ee	printk: Avoid irq_work for printk_deferred() on suspend With commit ("printk: Avoid scheduling irq_work on suspend") the implementation of printk_get_console_flush_type() was modified to avoid offloading when irq_work should be blocked during suspend. Since printk uses the returned flush type to determine what flushing methods are used, this was thought to be sufficient for avoiding irq_work usage during the suspend phase. However, vprintk_emit() implements a hack to support printk_deferred(). In this hack, the returned flush type is adjusted to make sure no legacy direct printing occurs when printk_deferred() was used. Because of this hack, the legacy offloading flushing method can still be used, causing irq_work to be queued when it should not be. Adjust the vprintk_emit() hack to also consider @console_irqwork_blocked so that legacy offloading will not be chosen when irq_work should be blocked. Link: https://lore.kernel.org/lkml/87fra90xv4.fsf@jogness.linutronix.de Signed-off-by: John Ogness <john.ogness@linutronix.de> Fixes: `26873e3e7f` ("printk: Avoid scheduling irq_work on suspend") Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-24 15:44:48 +01:00
John Ogness	26873e3e7f	printk: Avoid scheduling irq_work on suspend Allowing irq_work to be scheduled while trying to suspend has shown to cause problems as some architectures interpret the pending interrupts as a reason to not suspend. This became a problem for printk() with the introduction of NBCON consoles. With every printk() call, NBCON console printing kthreads are woken by queueing irq_work. This means that irq_work continues to be queued due to printk() calls late in the suspend procedure. Avoid this problem by preventing printk() from queueing irq_work once console suspending has begun. This applies to triggering NBCON and legacy deferred printing as well as klogd waiters. Since triggering of NBCON threaded printing relies on irq_work, the pr_flush() within console_suspend_all() is used to perform the final flushing before suspending consoles and blocking irq_work queueing. NBCON consoles that are not suspended (due to the usage of the "no_console_suspend" boot argument) transition to atomic flushing. Introduce a new global variable @console_irqwork_blocked to flag when irq_work queueing is to be avoided. The flag is used by printk_get_console_flush_type() to avoid allowing deferred printing and switch NBCON consoles to atomic flushing. It is also used by vprintk_emit() to avoid klogd waking. Add WARN_ON_ONCE(console_irqwork_blocked) to the irq_work queuing functions to catch any code that attempts to queue printk irq_work during the suspending/resuming procedure. Cc: stable@vger.kernel.org # 6.13.x because no drivers in 6.12.x Fixes: `6b93bb41f6` ("printk: Add non-BKL (nbcon) console basic infrastructure") Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Sherry Sun <sherry.sun@nxp.com> Link: https://patch.msgid.link/20251113160351.113031-3-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-19 16:01:31 +01:00
John Ogness	d01ff281bd	printk: Allow printk_trigger_flush() to flush all types Currently printk_trigger_flush() only triggers legacy offloaded flushing, even if that may not be the appropriate method to flush for currently registered consoles. (The function predates the NBCON consoles.) Since commit `6690d6b527` ("printk: Add helper for flush type logic") there is printk_get_console_flush_type(), which also considers NBCON consoles and reports all the methods of flushing appropriate based on the system state and consoles available. Update printk_trigger_flush() to use printk_get_console_flush_type() to appropriately flush registered consoles. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/stable/20251113160351.113031-2-john.ogness%40linutronix.de Tested-by: Sherry Sun <sherry.sun@nxp.com> Link: https://patch.msgid.link/20251113160351.113031-2-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-19 16:01:31 +01:00
Thierry Reding	a97fbc3ee3	syscore: Pass context data to callbacks Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>	2025-11-14 10:01:52 +01:00
Petr Mladek	394aa576c0	printk_ringbuffer: Create a helper function to decide whether more space is needed The decision whether some more space is needed is tricky in the printk ring buffer code: 1. The given lpos values might overflow. A subtraction must be used instead of a simple "lower than" check. 2. Another CPU might reuse the space in the mean time. It can be detected when the subtraction is bigger than DATA_SIZE(data_ring). 3. There is exactly enough space when the result of the subtraction is zero. But more space is needed when the result is exactly DATA_SIZE(data_ring). Add a helper function to make sure that the check is done correctly in all situations. Also it helps to make the code consistent and better documented. Suggested-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/87tsz7iea2.fsf@jogness.linutronix.de Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251107194720.1231457-3-pmladek@suse.com [pmladek@suse.com: Updated wording as suggested by John] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-10 13:09:43 +01:00
Petr Mladek	cc3bad11de	printk_ringbuffer: Fix check of valid data size when blk_lpos overflows The commit `67e1b0052f` ("printk_ringbuffer: don't needlessly wrap data blocks around") allows to use the last 4 bytes of the ring buffer. But the check for the @data_size was not properly updated in get_data(). It fails when "blk_lpos->next" overflows to "0". In this case: + is_blk_wrapped(data_ring, blk_lpos->begin, blk_lpos->next) returns "false" because it checks "blk_lpos->next - 1". + "blk_lpos->begin < blk_lpos->next" fails because "blk_lpos->next" is already 0. + is_blk_wrapped(data_ring, blk_lpos->begin + DATA_SIZE(data_ring), blk_lpos->next) returns "false" because "begin_lpos" is from the next wrap but "next_lpos - 1" is from the previous one. As a result, get_data() triggers the WARN_ON_ONCE() for "Illegal block description", for example: [ 216.317316][ T7652] loop0: detected capacity change from 0 to 16 1 printk messages dropped [ 216.327750][ T7652] ------------[ cut here ]------------ [ 216.327789][ T7652] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0x48a/0x840, CPU#1: syz.0.585/7652 [ 216.327848][ T7652] Modules linked in: [ 216.327907][ T7652] CPU: 1 UID: 0 PID: 7652 Comm: syz.0.585 Not tainted syzkaller #0 PREEMPT(full) [ 216.327933][ T7652] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 [ 216.327953][ T7652] RIP: 0010:get_data+0x48a/0x840 [ 216.327986][ T7652] Code: 83 c4 f8 48 b8 00 00 00 00 00 fc ff df 41 0f b6 04 07 84 c0 0f 85 ee 01 00 00 44 89 65 00 49 83 c5 08 eb 13 e8 a7 19 1f 00 90 <0f> 0b 90 eb 05 e8 9c 19 1f 00 45 31 ed 4c 89 e8 48 83 c4 28 5b 41 [ 216.328007][ T7652] RSP: 0018:ffffc900035170e0 EFLAGS: 00010293 [ 216.328029][ T7652] RAX: ffffffff81a1eee9 RBX: 00003fffffffffff RCX: ffff888033255b80 [ 216.328048][ T7652] RDX: 0000000000000000 RSI: 00003fffffffffff RDI: 0000000000000000 [ 216.328063][ T7652] RBP: 0000000000000012 R08: 0000000000000e55 R09: 000000325e213cc7 [ 216.328079][ T7652] R10: 000000325e213cc7 R11: 00001de4c2000037 R12: 0000000000000012 [ 216.328095][ T7652] R13: 0000000000000000 R14: ffffc90003517228 R15: 1ffffffff1bca646 [ 216.328111][ T7652] FS: 00007f44eb8da6c0(0000) GS:ffff888125fda000(0000) knlGS:0000000000000000 [ 216.328131][ T7652] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 216.328147][ T7652] CR2: 00007f44ea9722e0 CR3: 0000000066344000 CR4: 00000000003526f0 [ 216.328168][ T7652] Call Trace: [ 216.328178][ T7652] <TASK> [ 216.328199][ T7652] _prb_read_valid+0x672/0xa90 [ 216.328328][ T7652] ? desc_read+0x1b8/0x3f0 [ 216.328381][ T7652] ? __pfx__prb_read_valid+0x10/0x10 [ 216.328422][ T7652] ? panic_on_this_cpu+0x32/0x40 [ 216.328450][ T7652] prb_read_valid+0x3c/0x60 [ 216.328482][ T7652] printk_get_next_message+0x15c/0x7b0 [ 216.328526][ T7652] ? __pfx_printk_get_next_message+0x10/0x10 [ 216.328561][ T7652] ? __lock_acquire+0xab9/0xd20 [ 216.328595][ T7652] ? console_flush_all+0x131/0xb10 [ 216.328621][ T7652] ? console_flush_all+0x478/0xb10 [ 216.328648][ T7652] console_flush_all+0x4cc/0xb10 [ 216.328673][ T7652] ? console_flush_all+0x131/0xb10 [ 216.328704][ T7652] ? __pfx_console_flush_all+0x10/0x10 [ 216.328748][ T7652] ? is_printk_cpu_sync_owner+0x32/0x40 [ 216.328781][ T7652] console_unlock+0xbb/0x190 [ 216.328815][ T7652] ? __pfx___down_trylock_console_sem+0x10/0x10 [ 216.328853][ T7652] ? __pfx_console_unlock+0x10/0x10 [ 216.328899][ T7652] vprintk_emit+0x4c5/0x590 [ 216.328935][ T7652] ? __pfx_vprintk_emit+0x10/0x10 [ 216.328993][ T7652] _printk+0xcf/0x120 [ 216.329028][ T7652] ? __pfx__printk+0x10/0x10 [ 216.329051][ T7652] ? kernfs_get+0x5a/0x90 [ 216.329090][ T7652] _erofs_printk+0x349/0x410 [ 216.329130][ T7652] ? __pfx__erofs_printk+0x10/0x10 [ 216.329161][ T7652] ? __raw_spin_lock_init+0x45/0x100 [ 216.329186][ T7652] ? __init_swait_queue_head+0xa9/0x150 [ 216.329231][ T7652] erofs_fc_fill_super+0x1591/0x1b20 [ 216.329285][ T7652] ? __pfx_erofs_fc_fill_super+0x10/0x10 [ 216.329324][ T7652] ? sb_set_blocksize+0x104/0x180 [ 216.329356][ T7652] ? setup_bdev_super+0x4c1/0x5b0 [ 216.329385][ T7652] get_tree_bdev_flags+0x40e/0x4d0 [ 216.329410][ T7652] ? __pfx_erofs_fc_fill_super+0x10/0x10 [ 216.329444][ T7652] ? __pfx_get_tree_bdev_flags+0x10/0x10 [ 216.329483][ T7652] vfs_get_tree+0x92/0x2b0 [ 216.329512][ T7652] do_new_mount+0x302/0xa10 [ 216.329537][ T7652] ? apparmor_capable+0x137/0x1b0 [ 216.329576][ T7652] ? __pfx_do_new_mount+0x10/0x10 [ 216.329605][ T7652] ? ns_capable+0x8a/0xf0 [ 216.329637][ T7652] ? kmem_cache_free+0x19b/0x690 [ 216.329682][ T7652] __se_sys_mount+0x313/0x410 [ 216.329717][ T7652] ? __pfx___se_sys_mount+0x10/0x10 [ 216.329836][ T7652] ? do_syscall_64+0xbe/0xfa0 [ 216.329869][ T7652] ? __x64_sys_mount+0x20/0xc0 [ 216.329901][ T7652] do_syscall_64+0xfa/0xfa0 [ 216.329932][ T7652] ? lockdep_hardirqs_on+0x9c/0x150 [ 216.329964][ T7652] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 216.329988][ T7652] ? clear_bhb_loop+0x60/0xb0 [ 216.330017][ T7652] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 216.330040][ T7652] RIP: 0033:0x7f44ea99076a [ 216.330080][ T7652] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 [ 216.330100][ T7652] RSP: 002b:00007f44eb8d9e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 216.330128][ T7652] RAX: ffffffffffffffda RBX: 00007f44eb8d9ef0 RCX: 00007f44ea99076a [ 216.330146][ T7652] RDX: 0000200000000180 RSI: 00002000000001c0 RDI: 00007f44eb8d9eb0 [ 216.330164][ T7652] RBP: 0000200000000180 R08: 00007f44eb8d9ef0 R09: 0000000000000000 [ 216.330181][ T7652] R10: 0000000000000000 R11: 0000000000000246 R12: 00002000000001c0 [ 216.330196][ T7652] R13: 00007f44eb8d9eb0 R14: 00000000000001a1 R15: 0000200000000080 [ 216.330233][ T7652] </TASK> Solve the problem by moving and fixing the sanity check. The problematic if-else-if-else code will just distinguish three basic scenarios: "regular" vs. "wrapped" vs. "too many times wrapped" block. The new sanity check is more precise. A valid "data_size" must be lower than half of the data buffer size. Also it must not be zero at this stage. It allows to catch problematic "data_size" even for wrapped blocks. Closes: https://lore.kernel.org/all/69096836.a70a0220.88fb8.0006.GAE@google.com/ Closes: https://lore.kernel.org/all/69078fb6.050a0220.29fc44.0029.GAE@google.com/ Fixes: `67e1b0052f` ("printk_ringbuffer: don't needlessly wrap data blocks around") Reviewed-by: John Ogness <john.ogness@linutronix.de> Tested-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251107194720.1231457-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-10 12:58:28 +01:00
John Ogness	187de7c212	printk: nbcon: Allow unsafe write_atomic() for panic There may be console drivers that have not yet figured out a way to implement safe atomic printing (->write_atomic() callback). These drivers could choose to only implement threaded printing (->write_thread() callback), but then it is guaranteed that _no_ output will be printed during panic. Not even attempted. As a result, developers may be tempted to implement unsafe ->write_atomic() callbacks and/or implement some sort of custom deferred printing trickery to try to make it work. This goes against the principle intention of the nbcon API as well as endangers other nbcon drivers that are doing things correctly (safely). As a compromise, allow nbcon drivers to implement unsafe ->write_atomic() callbacks by providing a new console flag CON_NBCON_ATOMIC_UNSAFE. When specified, the ->write_atomic() callback for that console will _only_ be called during the final "hope and pray" flush attempt at the end of a panic: nbcon_atomic_flush_unsafe(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/all/swdpckuwwlv3uiessmtnf2jwlx3jusw6u7fpk5iggqo4t2vdws@7rpjso4gr7qp/ [1] Link: https://lore.kernel.org/all/20251103-fix_netpoll_aa-v4-1-4cfecdf6da7c@debian.org/ [2] Link: https://patch.msgid.link/20251027161212.334219-2-john.ogness@linutronix.de [pmladek@suse.com: Fix build with rework/nbcon-in-kdb branch.] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-11-07 14:30:52 +01:00
Petr Mladek	d5d399efff	printk/nbcon: Release nbcon consoles ownership in atomic flush after each emitted record printk() tries to flush messages with NBCON_PRIO_EMERGENCY on nbcon consoles immediately. It might take seconds to flush all pending lines on slow serial consoles. Note that there might be hundreds of messages, for example: [ 3.771531][ T1] pci 0000:3e:08.1: [8086:324 replaying previous printk message [ 3.771531][ T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint [ ... more than 2000 lines, about 200kB messages ... ] [ 3.837752][ T1] pci 0000:20:01.0: Adding to iommu group 18 [ 3.837851][ T replaying previous printk message [ 3.837851][ T1] pci 0000:20:03.0: Adding to iommu group 19 [ 3.837946][ T1] pci 0000:20:05.0: Adding to iommu group 20 [ ... more than 500 messages for iommu groups 21-590 ...] [ 3.912932][ T1] pci 0000:f6:00.1: Adding to iommu group 591 [ 3.913070][ T1] pci 0000:f6:00.2: Adding to iommu group 592 [ 3.913243][ T1] DMAR: Intel(R) Virtualization Technology for Directed I/O [ 3.913245][ T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 3.913245][ T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB) [ 3.913324][ T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer [ 3.913325][ T1] RAPL PMU: hw unit of domain package 2^-14 Joules [ 3.913326][ T1] RAPL PMU: hw unit of domain dram 2^-14 Joules [ 3.913327][ T1] RAPL PMU: hw unit of domain psys 2^-0 Joules [ 3.933486][ T1] ------------[ cut here ]------------ [ 3.933488][ T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180 [ 3.930291][ C0] watchdog: Watchdog detected hard LOCKUP on cpu 0 [ 3.930291][ C0] Kernel panic - not syncing: Hard LOCKUP [...] [ 3.930291][ C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted... [...] [ 3.930291][ C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50 [ 3.930291][ C0] Call Trace: [...] [ 3.930291][ C0] <TASK> [ 3.930291][ C0] serial8250_console_write+0x16d/0x5c0 [ 3.930291][ C0] nbcon_emit_next_record+0x22c/0x250 [ 3.930291][ C0] nbcon_emit_one+0x93/0xe0 [ 3.930291][ C0] nbcon_kthread_func+0x13c/0x1c0 The are visible two takeovers of the console ownership: - The 1st one is triggered by the "WARNING: CPU: 2 PID: 1 at arch/x86/..." line printed with NBCON_PRIO_EMERGENCY. - The 2nd one is triggered by the "Kernel panic - not syncing: Hard LOCKUP" line printed with NBCON_PRIO_PANIC. There are more than 2500 lines, at about 240kB, emitted between the takeover and the 1st "WARNING" line in the emergency context. This amount of pending messages had to be flushed by nbcon_atomic_flush_pending() when WARN() printed its first line. The atomic flush was holding the nbcon console context for too long so that it triggered hard lockup on the CPU running the printk kthread "pr/ttyS0". The kthread needed to reacquire the console ownership for restoring the original serial port state in serial8250_console_write(). Prevent the hardlockup by releasing the nbcon console ownership after each emitted record. Note that __nbcon_atomic_flush_pending_con() used to hold the console ownership all the time because it blocked the printk kthread. Otherwise the kthread tried to flush the messages in parallel which caused repeated takeovers and more replayed messages. It is not longer a problem because the repeated takeovers are blocked by the counter of emergency contexts, see nbcon_cpu_emergency_cnt. Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20250926124912.243464-4-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-30 12:11:33 +01:00
Petr Mladek	4c3ba0d592	printk/nbcon/panic: Allow printk kthread to sleep when the system is in panic The printk kthread might be running when there is a panic in progress. But it is not able to acquire the console ownership any longer. Prevent the desperate attempts to acquire the ownership and allow sleeping in panic. It would make it behave the same as when there is any CPU in an emergency context. Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20250926124912.243464-3-pmladek@suse.com [pmladek@suse.com: Rebased on top of 6.18-rc1 (panic_in_progress() moved to panic.c)] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-30 12:11:33 +01:00
Petr Mladek	c41c0ebfa1	printk/nbcon: Block printk kthreads when any CPU is in an emergency context In emergency contexts, printk() tries to flush messages directly even on nbcon consoles. And it is allowed to takeover the console ownership and interrupt the printk kthread in the middle of a message. Only one takeover and one repeated message should be enough in most situations. The first emergency message flushes the backlog and printk kthreads get to sleep. Next emergency messages are flushed directly and printk() does not wake up the kthreads. However, the one takeover is not guaranteed. Any printk() in normal context on another CPU could wake up the kthreads. Or a new emergency message might be added before the kthreads get to sleep. Note that the interrupted .write_thread() callbacks usually have to call nbcon_reacquire_nobuf() and restore the original device setting before checking for pending messages. The risk of the repeated takeovers will be even bigger because __nbcon_atomic_flush_pending_con is going to release the console ownership after each emitted record. It will be needed to prevent hardlockup reports on other CPUs which are busy waiting for the context ownership, for example, by nbcon_reacquire_nobuf() or __uart_port_nbcon_acquire(). The repeated takeovers break the output, for example: [ 5042.650211][ T2220] Call Trace: [ 5042.6511 replaying previous printk message [ 5042.651192][ T2220] <TASK> [ 5042.652160][ T2220] kunit_run_ replaying previous printk message [ 5042.652160][ T2220] kunit_run_tests+0x72/0x90 [ 5042.653340][ T22 replaying previous printk message [ 5042.653340][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5042.654628][ T2220] ? stack_trace_save+0x4d/0x70 [ 5042.6553 replaying previous printk message [ 5042.655394][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5042.656713][ T2220] ? save_trace+0x5b/0x180 A more robust solution is to block the printk kthread entirely whenever any CPU enters an emergency context. This ensures that critical messages can be flushed without contention from the normal, non-atomic printing path. Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20250926124912.243464-2-pmladek@suse.com [pmladek@suse.com: Added changes proposed by John Ogness] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-30 12:10:16 +01:00
Oleg Nesterov	2079395583	printk_legacy_map: use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP printk_legacy_map is used to hide lock nesting violations caused by legacy drivers and is using the wrong override type. LD_WAIT_SLEEP is for always sleeping lock types such as mutex_t. LD_WAIT_CONFIG is for lock type which are sleeping while spinning on PREEMPT_RT such as spinlock_t. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251026150726.GA23223@redhat.com [pmladek@suse.com: Fixed indentation.] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-29 14:20:56 +01:00
Marcos Paulo de Souza	4349cf0df3	printk: nbcon: Export nbcon_write_context_set_buf This function will be used in the next patch to allow a driver to set both the message and message length of a nbcon_write_context. This is necessary because the function also initializes the ->unsafe_takeover struct member. By using this helper we ensure that the struct is initialized correctly. Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-4-866aac60a80e@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-24 12:56:19 +02:00
Marcos Paulo de Souza	286b113d70	printk: nbcon: Allow KDB to acquire the NBCON context KDB can interrupt any console to execute the "mirrored printing" at any time, so add an exception to nbcon_context_try_acquire_direct to allow to get the context if the current CPU is the same as kdb_printf_cpu. This change will be necessary for the next patch, which fixes kdb_msg_write to work with NBCON consoles by calling ->write_atomic on such consoles. But to print it first needs to acquire the ownership of the console, so nbcon_context_try_acquire_direct is fixed here. Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-3-866aac60a80e@suse.com [pmladek@suse.com: Fix compilation with !CONFIG_KGDB_KDB.] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-24 12:55:10 +02:00
Marcos Paulo de Souza	49f7d3054e	printk: nbcon: Introduce KDB helpers These helpers will be used when calling console->write_atomic on KDB code in the next patch. It's basically the same implementation as nbcon_device_try_acquire, but using NBCON_PRIO_EMERGENCY when acquiring the context. If the acquire succeeds, the message and message length are assigned to nbcon_write_context so ->write_atomic can print the message. After release try to flush the console since there may be a backlog of messages in the ringbuffer. The kthread console printers do not get a chance to run while kdb is active. Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-2-866aac60a80e@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-24 12:19:23 +02:00
Marcos Paulo de Souza	4da42aaa82	printk: nbcon: Export console_is_usable The helper will be used on KDB code in the next commits. Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Link: https://patch.msgid.link/20251016-nbcon-kgdboc-v6-1-866aac60a80e@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-24 12:19:23 +02:00
Andrew Murray	1bc9a28f07	printk: Use console_flush_one_record for legacy printer kthread The legacy printer kthread uses console_lock and __console_flush_and_unlock to flush records to the console. This approach results in the console_lock being held for the entire duration of a flush. This can result in large waiting times for those waiting for console_lock especially where there is a large volume of records or where the console is slow (e.g. serial). This contention is observed during boot, as the call to filp_open in console_on_rootfs will delay progression to userspace until any in-flight flush is completed. Let's instead use console_flush_one_record and release/reacquire the console_lock between records. On a PocketBeagle 2, with the following boot args: "console=ttyS2,9600 initcall_debug=1 loglevel=10" Without this patch: [ 5.613166] +console_on_rootfs/filp_open [ 5.643473] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit [ 5.643823] probe of fa00000.mmc returned 0 after 258244 usecs [ 5.710520] mmc1: new UHS-I speed SDR104 SDHC card at address 5048 [ 5.721976] mmcblk1: mmc1:5048 SD32G 29.7 GiB [ 5.747258] mmcblk1: p1 p2 [ 5.753324] probe of mmc1:5048 returned 0 after 40002 usecs [ 15.595240] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss [ 15.595282] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog [ 15.595297] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog [ 15.595437] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc [ 146.275961] -console_on_rootfs/filp_open ... and with: [ 5.477122] +console_on_rootfs/filp_open [ 5.595814] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit [ 5.596181] probe of fa00000.mmc returned 0 after 312757 usecs [ 5.662813] mmc1: new UHS-I speed SDR104 SDHC card at address 5048 [ 5.674367] mmcblk1: mmc1:5048 SD32G 29.7 GiB [ 5.699320] mmcblk1: p1 p2 [ 5.705494] probe of mmc1:5048 returned 0 after 39987 usecs [ 6.418682] -console_on_rootfs/filp_open ... ... [ 15.593509] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss [ 15.593551] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog [ 15.593566] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog [ 15.593704] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc Where I've added a printk surrounding the call in console_on_rootfs to filp_open. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-3-00f1f0ac055a@thegoodpenguin.co.uk [pmladek@suse.com: Fixed ordering of variable definition suggested by John.] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-23 17:15:44 +02:00
Petr Mladek	ba00f7c4d0	printk: console_flush_one_record() code cleanup console_flush_one_record() and console_flush_all() duplicate several checks. They both want to tell the caller that consoles are not longer usable in this context because it has lost the lock or the lock has to be reserved for the panic CPU. Remove the duplication by changing the semantic of the function console_flush_one_record() return value and parameters. The function will return true when it is able to do the job. It means that there is at least one usable console. And the flushing was not interrupted by a takeover or panic_on_other_cpu(). Also replace the @any_usable parameter with @try_again. The @try_again parameter will be set to true when the function could do the job and at least one console made a progress. Motivation: The callers need to know when + they should continue flushing => @try_again + when the console is flushed => can_do_the_job(return) && !@try_again + when @next_seq is valid => same as flushed + when lost console_lock => @takeover The proposed change makes it clear when the function can do the job. It simplifies the answer for the other questions. Also the return value from console_flush_one_record() can be used as return value from console_flush_all(). Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-2-00f1f0ac055a@thegoodpenguin.co.uk [pmladek@suse.com: Fixed type of any_usable variable reported by John] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-23 17:11:13 +02:00
Andrew Murray	741ea7aa95	printk: Introduce console_flush_one_record console_flush_all prints all remaining records to all usable consoles whilst its caller holds console_lock. This can result in large waiting times for those waiting for console_lock especially where there is a large volume of records or where the console is slow (e.g. serial). Let's extract the parts of this function which print a single record into a new function named console_flush_one_record. This can later be used for functions that will release and reacquire console_lock between records. This commit should not change existing functionality. Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251020-printk_legacy_thread_console_lock-v3-1-00f1f0ac055a@thegoodpenguin.co.uk Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-23 17:08:41 +02:00
Daniil Tatianin	67e1b0052f	printk_ringbuffer: don't needlessly wrap data blocks around Previously, data blocks that perfectly fit the data ring buffer would get wrapped around to the beginning for no reason since the calculated offset of the next data block would belong to the next wrap. Since this offset is not actually part of the data block, but rather the offset of where the next data block is going to start, there is no reason to include it when deciding whether the current block fits the buffer. Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Tested-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20250905144152.9137-2-d-tatianin@yandex-team.ru [pmladek@suse.com: Updated indentation.] Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-10-22 14:03:36 +02:00
Linus Torvalds	48e3694ae7	printk changes for 6.18 -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ NUgzh2cOwSWWU4UHt/AiD9cd =uFn0 -----END PGP SIGNATURE----- Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add KUnit test for the printk ring buffer - Fix the check of the maximal record size which is allowed to be stored into the printk ring buffer. It prevents corruptions of the ring buffer. Note that printk() is on the safe side. The messages are limited by 1kB buffer and are always small enough for the minimal log buffer size 4kB, see CONFIG_LOG_BUF_SHIFT definition. * tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: ringbuffer: Fix data block max size check printk: kunit: support offstack cpumask printk: kunit: Fix __counted_by() in struct prbtest_rbdata printk: ringbuffer: Explain why the KUnit test ignores failed writes printk: ringbuffer: Add KUnit test	2025-10-04 11:13:11 -07:00
Petr Mladek	7a75a5da79	Merge branch 'rework/ringbuffer-kunit-test' into for-linus	2025-10-02 10:33:08 +02:00
John Ogness	4d164e08cd	printk: ringbuffer: Fix data block max size check Currently data_check_size() limits data blocks to a maximum size of the full buffer minus an ID (long integer): max_size <= DATA_SIZE(data_ring) - sizeof(long) However, this is not an appropriate limit due to the nature of wrapping data blocks. For example, if a data block is larger than half the buffer: size = (DATA_SIZE(data_ring) / 2) + 8 and begins exactly in the middle of the buffer, then: - the data block will wrap - the ID will be stored at exactly half of the buffer - the record data begins at the beginning of the buffer - the record data ends 8 bytes _past_ exactly half of the buffer The record overwrites itself, i.e. needs more space than the full buffer! Luckily printk() is not vulnerable to this problem because truncate_msg() limits printk-messages to 1/4 of the ringbuffer. Indeed, by adjusting the printk_ringbuffer KUnit test, which does not use printk() and its truncate_msg() check, it is easy to see that the ringbuffer becomes corrupted for records larger than half the buffer size. The corruption occurs because data_push_tail() expects it will never be requested to push the tail beyond the head. Avoid this problem by adjusting data_check_size() to limit record sizes to half the buffer size. Also add WARN_ON_ONCE() before relevant data_push_tail() calls to validate that there are no such illegal requests. WARN_ON_ONCE() is used, rather than just adding extra checks to data_push_tail() because it is considered a bug to attempt such illegal actions. Link: https://lore.kernel.org/lkml/aMLrGCQSyC8odlFZ@pathway.suse.cz Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-09-26 16:17:27 +02:00
Jinchao Wang	652ab7c8fa	panic: use angle-bracket include for panic.h Replace quoted includes of panic.h with `#include <linux/panic.h>` for consistency across the kernel. Link: https://lkml.kernel.org/r/20250829051312.33773-1-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:54 -07:00
Jinchao Wang	d4a36db563	panic/printk: replace other_cpu_in_panic() with panic_on_other_cpu() The helper other_cpu_in_panic() duplicated logic already provided by panic_on_other_cpu(). Remove other_cpu_in_panic() and update all users to call panic_on_other_cpu() instead. This removes redundant code and makes panic handling consistent. Link: https://lkml.kernel.org/r/20250825022947.1596226-9-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Doug Anderson <dianders@chromium.org> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kees Cook <kees@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Nam Cao <namcao@linutronix.de> Cc: oushixiong <oushixiong@kylinos.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:52 -07:00
Jinchao Wang	c6be36e299	panic/printk: replace this_cpu_in_panic() with panic_on_this_cpu() The helper this_cpu_in_panic() duplicated logic already provided by panic_on_this_cpu(). Remove this_cpu_in_panic() and switch all users to panic_on_this_cpu(). This simplifies the code and avoids having two helpers for the same check. Link: https://lkml.kernel.org/r/20250825022947.1596226-8-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Doug Anderson <dianders@chromium.org> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kees Cook <kees@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Nam Cao <namcao@linutronix.de> Cc: oushixiong <oushixiong@kylinos.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:52 -07:00
Jinchao Wang	2325e8eadf	printk/nbcon: use panic_on_this_cpu() helper nbcon_context_try_acquire() compared panic_cpu directly with smp_processor_id(). This open-coded check is now provided by panic_on_this_cpu(). Switch to panic_on_this_cpu() to simplify the code and improve readability. Link: https://lkml.kernel.org/r/20250825022947.1596226-7-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Doug Anderson <dianders@chromium.org> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kees Cook <kees@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Nam Cao <namcao@linutronix.de> Cc: oushixiong <oushixiong@kylinos.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:52 -07:00
Jinchao Wang	d0d9c72355	panic: introduce helper functions for panic state Patch series "panic: introduce panic status function family", v2. This series introduces a family of helper functions to manage panic state and updates existing code to use them. Before this series, panic state helpers were scattered and inconsistent. For example, panic_in_progress() was defined in printk/printk.c, not in panic.c or panic.h. As a result, developers had to look in unexpected places to understand or re-use panic state logic. Other checks were open- coded, duplicating logic across panic, crash, and watchdog paths. The new helpers centralize the functionality in panic.c/panic.h: - panic_try_start() - panic_reset() - panic_in_progress() - panic_on_this_cpu() - panic_on_other_cpu() Patches 1–8 add the helpers and convert panic/crash and printk/nbcon code to use them. Patch 9 fixes a bug in the watchdog subsystem by skipping checks when a panic is in progress, avoiding interference with the panic CPU. Together, this makes panic state handling simpler, more discoverable, and more robust. This patch (of 9): This patch introduces four new helper functions to abstract the management of the panic_cpu variable. These functions will be used in subsequent patches to refactor existing code. The direct use of panic_cpu can be error-prone and ambiguous, as it requires manual checks to determine which CPU is handling the panic. The new helpers clarify intent: panic_try_start(): Atomically sets the current CPU as the panicking CPU. panic_reset(): Reset panic_cpu to PANIC_CPU_INVALID. panic_in_progress(): Checks if a panic has been triggered. panic_on_this_cpu(): Returns true if the current CPU is the panic originator. panic_on_other_cpu(): Returns true if a panic is on another CPU. This change lays the groundwork for improved code readability and robustness in the panic handling subsystem. Link: https://lkml.kernel.org/r/20250825022947.1596226-1-wangjinchao600@gmail.com Link: https://lkml.kernel.org/r/20250825022947.1596226-2-wangjinchao600@gmail.com Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Doug Anderson <dianders@chromium.org> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kees Cook <kees@kernel.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Nam Cao <namcao@linutronix.de> Cc: oushixiong <oushixiong@kylinos.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: Qianqiang Liu <qianqiang.liu@163.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>b Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 17:32:51 -07:00
Arnd Bergmann	bf42df09b6	printk: kunit: support offstack cpumask For large values of CONFIG_NR_CPUS, the newly added kunit test fails to build: kernel/printk/printk_ringbuffer_kunit_test.c: In function 'test_readerwriter': kernel/printk/printk_ringbuffer_kunit_test.c:279:1: error: the frame size of 1432 bytes is larger than 1280 bytes [-Werror=frame-larger-than=] Change this to use cpumask_var_t and allocate it dynamically when CONFIG_CPUMASK_OFFSTACK is set. The variable has to be released via a KUnit action wrapper so that it is freed when the test fails and gets aborted. The parameter type is hardcoded to "struct cpumask *" because the macro KUNIT_DEFINE_ACTION_WRAPPER() does not accept an array. But the function does nothing when CONFIG_CPUMASK_OFFSTACK is not set anyway. Fixes: `5ea2bcdfbf` ("printk: ringbuffer: Add KUnit test") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/all/20250620192554.2234184-1-arnd@kernel.org # v1 [pmladek@suse.com: Correctly handle allocation failures and freeing using KUnit test API.] Link: https://lore.kernel.org/all/20250702095157.110916-3-pmladek@suse.com # v2 Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-09-10 17:18:37 +02:00
Linus Torvalds	35a813e010	printk changes for 6.17 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmiQpykACgkQUqAMR0iA lPJcrg/9Hez6+zO7LECCn5VkuK5oJWR5CyCfwx14ki8UF38djQGU2frckI5837rE MnVoEBexZunK5SXy4MAy7bTCitzR+lMqNtP5uq9J2ovlSPtNlfuJRDr7uGQLDtSS M5KZ1qsZnhgwLYeNhfVVToHgp+OwIQb2GcgYmYc8k03fUI1NQpdxIM46DzoTj+06 x6qgrNsmmJbm8E73VWBByJAEFoq9ugjny8Rt+tYMi/CmhgZpp0ZyF1r5dYfYX/KS VS8UQY//aZOFhNsQUAXwP7Ym00CYRgTg7Na+MHivYLXmYGH2gF6tWQhX/eEgHKcJ RTmUbLFx70fdBbjJMxv2k8vyMk2sy6sTfJHPqM/NS/Fb0tSPBXQJG/EexzfoqiBc wcjgOPkeALIosVdFdTqXxjoIGOP8rqsU4t6Y6WFjJlWK04SBVjxBUofytRdQSxkG 5Sb0rFVGKrKIkXaVkt4byPa1/BDpfNhfKMYPtQ56pv2VNUgzfye4prUAZHE5pLnK 8nixeeMtKDFFCBpn6rG5wZW7k2mK5FrWGZUfdfxdK1gWQ1y0kqGy5wa3lNZLcxlH l3AtOYoDeWM2DjDVO6WCj8ambEWkbjbGg7tC9TI3F0NvRJSYytTb6npMqb3Gwhcb U4NgT+Ho0GJ/5BLUye8HMfhvrGoCfRCeptHtEFXAK7pzKyjc0+c= =Mocd -----END PGP SIGNATURE----- Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add new "hash_pointers=[auto\|always\|never]" boot parameter to force the hashing even with "slab_debug" enabled - Allow to stop CPU, after losing nbcon console ownership during panic(), even without proper NMI - Allow to use the printk kthread immediately even for the 1st registered nbcon - Compiler warning removal * tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: nbcon: Allow reacquire during panic printk: Allow to use the printk kthread immediately even for 1st nbcon slab: Decouple slab_debug and no_hash_pointers vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format' compiler-gcc.h: Introduce __diag_GCC_all	2025-08-04 10:54:36 -07:00
Petr Mladek	3db488c8ed	Merge branch 'rework/fixes' into for-linus	2025-08-04 14:18:01 +02:00
Petr Mladek	d18d7989e3	printk: kunit: Fix __counted_by() in struct prbtest_rbdata __counted_by() has to point to a variable which defines the size of the related array. The code must never access the array beyond this limit. struct prbtest_rbdata currently stores the length of the string. And the code access the array beyond the limit when writing or reading the trailing '\0'. Store the size of the string, including the trailing '\0' if we wanted to keep __counted_by(). Consistently use "_size" suffix when the trailing '\0' is counted. Note that MAX_RBDATA_TEXT_SIZE was originally used to limit the text length. When touching the code, make sure that @text_size produced by get_random_u32_inclusive() stays within the limits. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/eaea66b9-266a-46e7-980d-33f40ad4b215@sabinyo.mountain Suggested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250702095157.110916-4-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-07-10 17:12:21 +02:00
Petr Mladek	254e8fb5e6	printk: ringbuffer: Explain why the KUnit test ignores failed writes The KUnit test ignores prb_reserve() failures on purpose. It tries to push the ringbuffer beyond limits. Note that it is a know problem that writes might fail in this situation. printk() tries to prevent this problem by: + allocating big enough data buffer, see log_buf_add_cpu(). + allocating enough descriptors by using small enough average record, see PRB_AVGBITS. + storing the record with disabled interrupts, see vprintk_store(). Also the amount of printk() messages is always somehow bound in practice. And they are serialized when they are printed from many CPUs on purpose, for example, when printing backtraces. Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250702095157.110916-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-07-10 17:11:57 +02:00
Nam Cao	0af3ecdde5	printk: Make vprintk_deferred() public vprintk_deferred() is useful for implementing runtime verification reactors. Make it public. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-09 15:27:00 -04:00
Thomas Weißschuh	5ea2bcdfbf	printk: ringbuffer: Add KUnit test The KUnit test validates the correct operation of the ringbuffer. A separate dedicated ringbuffer is used so that the global printk ringbuffer is not touched. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20250612-printk-ringbuffer-test-v3-1-550c088ee368@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-06-18 16:42:42 +02:00
John Ogness	571c1ea91a	printk: nbcon: Allow reacquire during panic If a console printer is interrupted during panic, it will never be able to reacquire ownership in order to perform and cleanup. That in itself is not a problem, since the non-panic CPU will simply quiesce in an endless loop within nbcon_reacquire_nobuf(). However, in this state, platforms that do not support a true NMI to interrupt the quiesced CPU will not be able to shutdown that CPU from within panic(). This then causes problems for such as being unable to load and run a kdump kernel. Fix this by allowing non-panic CPUs to reacquire ownership using a direct acquire. Then the non-panic CPUs can successfullyl exit the nbcon_reacquire_nobuf() loop and the console driver can perform any necessary cleanup. But more importantly, the CPU is no longer quiesced and is free to process any interrupts necessary for panic() to shutdown the CPU. All other forms of acquire are still not allowed for non-panic CPUs since it is safer to have them avoid gaining console ownership that is not strictly necessary. Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/r/SN6PR02MB4157A4C5E8CB219A75263A17D46DA@SN6PR02MB4157.namprd02.prod.outlook.com Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250606185549.900611-1-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-06-17 12:10:29 +02:00
Petr Mladek	cf55438701	printk: Allow to use the printk kthread immediately even for 1st nbcon The kthreads for nbcon consoles are created by nbcon_alloc() at the beginning of the console registration. But it currently works only for the 2nd or later nbcon console because the code checks @printk_kthreads_running. The kthread for the 1st registered nbcon console is created at the very end of register_console() by printk_kthreads_check_locked(). As a result, the entire log is replayed synchronously when the "enabled" message gets printed. It might block the boot for a long time with a slow serial console. Prevent the synchronous flush by creating the kthread even for the 1st nbcon console when it is safe (kthreads ready and no boot consoles). Also inform printk() to use the kthread by setting @printk_kthreads_running. Note that the kthreads already must be running when it is safe and this is not the 1st nbcon console. Symmetrically, clear @printk_kthreads_running when the last nbcon console was unregistered by nbcon_free(). This requires updating @have_nbcon_console before nbcon_free() gets called. Note that there is _no_ problem when the 1st nbcon console replaces boot consoles. In this case, the kthread will be started at the end of registration after the boot consoles are removed. But the console does not reply the entire log buffer in this case. Note that the flag CON_PRINTBUFFER is always cleared when the boot consoles are removed and vice versa. Closes: https://lore.kernel.org/r/20250514173514.2117832-1-mcobb@thegoodpenguin.co.uk Tested-by: Michael Cobb <mcobb@thegoodpenguin.co.uk> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-06-09 16:44:01 +02:00
Linus Torvalds	96050814a3	printk changes for 6.15 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflJC4ACgkQUqAMR0iA lPIEMBAAsIJmQPHdZUQ49tkMGMoQPU9gnrrDZbRzN9IWwMQQymwEobSEkJkrx7O3 MFQHc5Z29cAxDqrWDCG5hrH0P3rd994pLPgC/XZC9+UbksYIkIrudCqaPaekLVaw BQmBZnZRe7PltOiDpx/TxQ8WQgNlAmZpLpIRTB3ePHZpikq/mf4CGcoYXkrNij1j E7uzurLZssJwXasRkWOVNvaOFyVcxrjn78H0JoT7mVeh2IUkWyMALX4IyDeD6Uvh SCe19/6W3Uyeh/jpJ6+gjuirVy/Vam/2GXeer4yKpgSQBtsqIaGyki4Cc6Xy12gU 71nRPTggysZXNnqZ40qw68Irb9JGAo63qFDwWf3OQI+yQQWHyGZIinxr+2xupwWF ZgmDJtvUsWKHT4+WmB9+MVTGW7Jy8wKq2u/Haf8xCJxs2yhI0J8iUvQi3CPAMdRH Q8G9UJP29Lfj0hdKVT2/wVNGxZKIZ8c6v1vR6Rd1hJ0Zqp4Oj+XGr0btE8Ah5LYv GI/M00LCdTNT2kBbbw8lekovUIs2stiRl6hfKoQFR11oiZwFDUePLDdJH2GiJAZm g9fPs6YuBERfPtzUeo9a9UbSd6oPTLgLNBEklpHUTWilzO8E/Jh+ZXyvQwj7eXyq Hi4OnShn1uxL68o6DJfKsF2qW8S2U5g61JrU3D1CaQ98Derrp0o= =lNTZ -----END PGP SIGNATURE----- Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - New option "printk.debug_non_panic_cpus" allows to store printk messages from non-panic CPUs during panic. It might be useful when panic() fails. It is disabled by default because it increases the chance to see the messages printed before panic() and on the panic-CPU. - New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build kernel without the virtual terminal support which prefers ttynull over serial console. - Do not unblank suspended consoles. - Some code clean up. * tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/panic: Add option to allow non-panic CPUs to write to the ring buffer. printk: Add an option to allow ttynull to be a default console device printk: Check CON_SUSPEND when unblanking a console printk: Rename console_start to console_resume printk: Rename console_stop to console_suspend printk: Rename resume_console to console_resume_all printk: Rename suspend_console to console_suspend_all	2025-03-27 19:22:24 -07:00
Petr Mladek	f49040c7aa	Merge branch 'for-6.15-console-suspend-api-cleanup' into for-linus	2025-03-27 11:09:34 +01:00

1 2 3 4 5 ...

651 Commits (master)