Fix undefined behavior caused by shifting a 32-bit integer by 32 bits
during decompression. This prevents potential kernel decompression
failures or corruption when parsing malicious or malformed bzip2 archives.
Link: https://lkml.kernel.org/r/20260308165012.2872633-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2_unlink takes orphan dir inode_lock first and then ip_alloc_sem,
while in ocfs2_dio_end_io_write, it acquires these locks in reverse order.
This creates an ABBA lock ordering violation on lock classes
ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE] and
ocfs2_file_ip_alloc_sem_key.
Lock Chain #0 (orphan dir inode_lock -> ip_alloc_sem):
ocfs2_unlink
ocfs2_prepare_orphan_dir
ocfs2_lookup_lock_orphan_dir
inode_lock(orphan_dir_inode) <- lock A
__ocfs2_prepare_orphan_dir
ocfs2_prepare_dir_for_insert
ocfs2_extend_dir
ocfs2_expand_inline_dir
down_write(&oi->ip_alloc_sem) <- Lock B
Lock Chain #1 (ip_alloc_sem -> orphan dir inode_lock):
ocfs2_dio_end_io_write
down_write(&oi->ip_alloc_sem) <- Lock B
ocfs2_del_inode_from_orphan()
inode_lock(orphan_dir_inode) <- Lock A
Deadlock Scenario:
CPU0 (unlink) CPU1 (dio_end_io_write)
------ ------
inode_lock(orphan_dir_inode)
down_write(ip_alloc_sem)
down_write(ip_alloc_sem)
inode_lock(orphan_dir_inode)
Since ip_alloc_sem is to protect allocation changes, which is unrelated
with operations in ocfs2_del_inode_from_orphan. So move
ocfs2_del_inode_from_orphan out of ip_alloc_sem to fix the deadlock.
Link: https://lkml.kernel.org/r/20260306032211.1016452-1-joseph.qi@linux.alibaba.com
Reported-by: syzbot+67b90111784a3eac8c04@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67b90111784a3eac8c04
Fixes: a86a72a4a4 ("ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the unnecessary initialization of 'rcu' to false in
report_bug_entry() and report_bug(), as it is assigned by warn_rcu_enter()
before its first use.
Link: https://lkml.kernel.org/r/20260306162418.2815979-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use lowercase "kernel BUG" consistently in pr_crit() messages. The
verbose path already uses "kernel BUG at %s:%u!" but the non-verbose
fallback uses "Kernel BUG" with an uppercase 'K'.
Link: https://lkml.kernel.org/r/20260306162327.2815553-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix "This results of this trade" to "The results of this trade" in the
comment describing the lbits and dbits tuning parameters.
Link: https://lkml.kernel.org/r/20260306161732.2812132-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix "all variable" to "all variables" in the file header comment.
Link: https://lkml.kernel.org/r/20260306161707.2812005-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When inflate_codes() fails in inflate_dynamic(), the code jumps to the
'out' label which only frees 'll', leaking the Huffman tables 'tl' and
'td'. Restructure the code so that the decoding tables are always freed
before reaching the 'out' label.
Link: https://lkml.kernel.org/r/20260306161647.2811874-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When inflate_codes() fails in inflate_fixed(), only the length list 'l' is
freed, but the Huffman tables 'tl' and 'td' are leaked. Add the missing
huft_free() calls on the error path.
Link: https://lkml.kernel.org/r/20260306161612.2811703-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a typo in __uuid_gen_common() where "reversion" (meaning to revert)
was used instead of "revision" when describing the UUID variant field.
Link: https://lkml.kernel.org/r/20260306161250.2811500-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 581ee79a25 ("scripts/gdb/symbols: make BPF debug info available
to GDB") added support to make BPF debug information available to GDB.
However, the argument handling loop was slightly broken, causing it to
fail if further modules were passed. Fix it to append these passed
modules to the instance variable after expansion.
Link: https://lkml.kernel.org/r/20260304110642.2020614-2-benjamin@sipsolutions.net
Fixes: 581ee79a25 ("scripts/gdb/symbols: make BPF debug info available to GDB")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, the hung task reporting mechanism indiscriminately labels all
TASK_UNINTERRUPTIBLE (D) tasks as "blocked", irrespective of whether they
are awaiting I/O completion or kernel locking primitives. This ambiguity
compels system administrators to manually inspect stack traces to discern
whether the delay stems from an I/O wait (typically indicative of hardware
or filesystem anomalies) or software contention. Such detailed analysis
is not always immediately accessible to system administrators or support
engineers.
To address this, this patch utilises the existing in_iowait field within
struct task_struct to augment the failure report. If the task is blocked
due to I/O (e.g., via io_schedule_prepare()), the log message is updated
to explicitly state "blocked in I/O wait".
Examples:
- Standard Block: "INFO: task bash:123 blocked for more than 120
seconds".
- I/O Block: "INFO: task dd:456 blocked in I/O wait for more than
120 seconds".
Theoretically, concurrent executions of io_schedule_finish() could result
in a race condition where the read flag does not precisely correlate with
the subsequently printed backtrace. However, this limitation is deemed
acceptable in practice. The entire reporting mechanism is inherently racy
by design; nevertheless, it remains highly reliable in the vast majority
of cases, particularly because it primarily captures protracted stalls.
Consequently, introducing additional synchronisation to mitigate this
minor inaccuracy would be entirely disproportionate to the situation.
Link: https://lkml.kernel.org/r/20260303221324.4106917-1-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A recent change allowed to reset the global counter of hung tasks using
the sysctl interface. A potential race with the regular check has been
solved by updating the global counter only once at the end of the check.
However, the hung task check can take a significant amount of time,
particularly when task information is being dumped to slow serial
consoles. Some users monitor this global counter to trigger immediate
migration of critical containers. Delaying the increment until the full
check completes postpones these high-priority rescue operations.
Update the global counter as soon as a hung task is detected. Since the
value is read asynchronously, a relaxed atomic operation is sufficient.
Link: https://lkml.kernel.org/r/20260303203031.4097316-4-atomlin@atomlin.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reported-by: Lance Yang <lance.yang@linux.dev>
Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset once
an incident has been resolved. Furthermore, the previous implementation
relied upon implicit ordering, which could not strictly guarantee that
diagnostic metadata published by one CPU was visible to the panic logic on
another.
This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the counter.
The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:
1. Prevention of Load-Store Reordering via Acquire Semantics By
utilising atomic_long_read_acquire() to snapshot the counter
before initiating the task traversal, we establish a strict
memory barrier. This prevents the compiler or hardware from
reordering the initial load to a point later in the scan. Without
this "acquire" barrier, a delayed load could potentially read a
"0" value resulting from a userspace reset that occurred
mid-scan. This would lead to the subsequent cmpxchg succeeding
erroneously, thereby overwriting the user's reset with stale
increment data.
2. Atomicity of the "Commit" Phase via Release Semantics The
atomic_long_cmpxchg_release() serves as the transaction's commit
point. The "release" barrier ensures that all diagnostic
recordings and task-state observations made during the scan are
globally visible before the counter is incremented.
3. Race Condition Resolution This pairing effectively detects any
"out-of-band" reset of the counter. If
sysctl_hung_task_detect_count is modified via the procfs
interface during the scan, the final cmpxchg will detect the
discrepancy between the current value and the "acquire" snapshot.
Consequently, the update will fail, ensuring that a reset command
from the administrator is prioritised over a scan that may have
been invalidated by that very reset.
Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Joel Granados <joel.granados@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "hung_task: Provide runtime reset interface for hung task
detector", v9.
This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.
Writing a "0" value to this file atomically resets the counter of detected
hung tasks. This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without
necessitating a system restart.
This patch (of 3):
The check_hung_task() function currently conflates two distinct
responsibilities: validating whether a task is hung and handling the
subsequent reporting (printing warnings, triggering panics, or
tracepoints).
This patch refactors the logic by introducing hung_task_info(), a function
dedicated solely to reporting. The actual detection check,
task_is_hung(), is hoisted into the primary loop within
check_hung_uninterruptible_tasks(). This separation clearly decouples the
mechanism of detection from the policy of reporting.
Furthermore, to facilitate future support for concurrent hung task
detection, the global sysctl_hung_task_detect_count variable is converted
from unsigned long to atomic_long_t. Consequently, the counting logic is
updated to accumulate the number of hung tasks locally (this_round_count)
during the iteration. The global counter is then updated atomically via
atomic_long_cmpxchg_relaxed() once the loop concludes, rather than
incrementally during the scan.
These changes are strictly preparatory and introduce no functional change
to the system's runtime behaviour.
Link: https://lkml.kernel.org/r/20260303203031.4097316-1-atomlin@atomlin.com
Link: https://lkml.kernel.org/r/20260303203031.4097316-2-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The Baikal SoC and platform support was dropped from the kernel, remove
the reference to non-exist file. While at it, fix spelling.
Link: https://lkml.kernel.org/r/20260302092831.2267785-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Update header inclusions to follow IWYU (Include What You Use) principle.
Link: https://lkml.kernel.org/r/20260302092831.2267785-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "lib: polynomial: Move to math/ and clean up", v2.
While removing Baikal SoC and platform code pieces I found that this code
belongs to lib/math/ rather than generic lib/. Hence the move and
followed up cleanups.
This patch (of 3):
The algorithm behind polynomial belongs to our collection of math
equations and expressions handling. Move it to math/ subfolder where
others of the kind are located.
Link: https://lkml.kernel.org/r/20260302092831.2267785-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syzbot detected a circular locking dependency. the scenarios:
CPU0 CPU1
---- ----
lock(&ocfs2_quota_ip_alloc_sem_key);
lock(&ocfs2_sysfile_lock_key[USER_QUOTA_SYSTEM_INODE]);
lock(&ocfs2_quota_ip_alloc_sem_key);
lock(&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);
or:
CPU0 CPU1
---- ----
lock(&ocfs2_quota_ip_alloc_sem_key);
lock(&dquot->dq_lock);
lock(&ocfs2_quota_ip_alloc_sem_key);
lock(&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);
Following are the code paths for above scenarios:
path_openat
ocfs2_create
ocfs2_mknod
+ ocfs2_reserve_new_inode
| ocfs2_reserve_suballoc_bits
| inode_lock(alloc_inode) //C0: hold INODE_ALLOC_SYSTEM_INODE
| //ocfs2_free_alloc_context(inode_ac) is called at the end of
| //caller ocfs2_mknod to handle the release
|
+ ocfs2_get_init_inode
__dquot_initialize
dqget
ocfs2_acquire_dquot
+ ocfs2_lock_global_qf
| down_write(&OCFS2_I(oinfo->dqi_gqinode)->ip_alloc_sem)//A2:grabbing
+ ocfs2_create_local_dquot
down_write(&OCFS2_I(lqinode)->ip_alloc_sem)//A3:grabbing
evict
ocfs2_evict_inode
ocfs2_delete_inode
ocfs2_wipe_inode
+ inode_lock(orphan_dir_inode) //B0:hold
+ ...
+ ocfs2_remove_inode
inode_lock(inode_alloc_inode) //INODE_ALLOC_SYSTEM_INODE
down_write(&inode->i_rwsem) //C1:grabbing
generic_file_direct_write
ocfs2_direct_IO
__blockdev_direct_IO
dio_complete
ocfs2_dio_end_io
ocfs2_dio_end_io_write
+ down_write(&oi->ip_alloc_sem) //A0:hold
+ ocfs2_del_inode_from_orphan
inode_lock(orphan_dir_inode) //B1:grabbing
Root cause for the circular locking:
DIO completion path:
holds oi->ip_alloc_sem and is trying to acquire the orphan_dir_inode lock.
evict path:
holds the orphan_dir_inode lock and is trying to acquire the
inode_alloc_inode lock.
ocfs2_mknod path:
Holds the inode_alloc_inode lock (to allocate a new quota file) and is
blocked waiting for oi->ip_alloc_sem in ocfs2_acquire_dquot().
How to fix:
Replace down_write() with down_write_trylock() in ocfs2_acquire_dquot().
If acquiring oi->ip_alloc_sem fails, return -EBUSY to abort the file
creation routine and break the deadlock.
Link: https://lkml.kernel.org/r/20260302061707.7092-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: syzbot+78359d5fbb04318c35e9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=78359d5fbb04318c35e9
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support for the ** glob operator in MAINTAINERS F: and X: patterns,
matching any number of path components (like Python's ** glob).
The existing * to .* conversion with slash-count check is preserved. **
is converted to (?:.*), a non-capturing group used as a marker to bypass
the slash-count check in file_match_pattern(), allowing the pattern to
cross directory boundaries.
This enables patterns like F: **/*[_-]kunit*.c to match files at any depth
in the tree.
Link: https://lkml.kernel.org/r/20260302103822.77343-1-teknoraver@meta.com
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace sprintf() with sysfs_emit() in sysfs show functions. sysfs_emit()
is preferred for formatting sysfs output because it provides safer bounds
checking. No functional changes.
Link: https://lkml.kernel.org/r/20260301125106.911980-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace the implicit 'bool' to 'int' conversion with an explicit ternary
operator. This makes the pointer arithmetic clearer and avoids relying on
boolean memory representation for logic flow.
Link: https://lkml.kernel.org/r/20260301203845.2617217-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Using bitwise OR (|=) on a boolean variable is valid C, but replacing it
with a direct logical assignment makes the intent clearer and appeases
strict static analysis tools.
Link: https://lkml.kernel.org/r/20260301152143.2572137-2-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Include <linux/export.h> explicitly instead of relying on it being
implicitly included by <linux/module.h> for the EXPORT_SYMBOL macro.
Link: https://lkml.kernel.org/r/20260301152143.2572137-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a missing article ('a') in the comment describing the glob
implementation, and replace 'blacklists' with 'denylists' to align with
the kernel's inclusive terminology guidelines.
Link: https://lkml.kernel.org/r/20260301154553.2592681-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The fchmodat2 test program open codes a version of ksft_finished(), use
the standard version.
Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-2-a6419435f2e8@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/fchmodat2: Error handling and general", v4.
I looked at the fchmodat2() tests since I've been experiencing some random
intermittent segfaults with them in my test systems, while doing so I
noticed these two issues. Unfortunately I didn't figure out the original
yet, unless I managed to fix it unwittingly.
This patch (of 2):
The fchmodat2() test program creates a temporary directory with a file and
a symlink for every test it runs but never cleans these up, resulting in
${TMPDIR} getting left with stale files after every run. Restructure the
program a bit to ensure that we clean these up, this is more invasive than
it might otherwise be due to the extensive use of ksft_exit_fail_msg() in
the program.
As a side effect this also ensures that we report a consistent test name
for the tests and always try both tests even if they are skipped.
Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-0-a6419435f2e8@kernel.org
Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-1-a6419435f2e8@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add the missing dual MIT/GPL license identifier to glob.c.
Link: https://lkml.kernel.org/r/20260228195300.2468310-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Both copy_process() and alloc_pid() do the same PIDNS_ADDING check. The
reasons for these checks, and the fact that both are necessary, are not
immediately obvious. Add the comments.
Link: https://lkml.kernel.org/r/aaGIRElc78U4Er42@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "pid: make sub-init creation retryable".
This patch (of 2):
Currently we allow only one attempt to create init in a new namespace. If
the first fork() fails after alloc_pid() succeeds, free_pid() clears
PIDNS_ADDING and thus disables further PID allocations.
Nowadays this looks like an unnecessary limitation. The original reason
to handle "case PIDNS_ADDING" in free_pid() is gone, most probably after
commit 69879c01a0 ("proc: Remove the now unnecessary internal mount of
proc").
Change free_pid() to keep ns->pid_allocated == PIDNS_ADDING, and change
alloc_pid() to reset the cursor early, right after taking pidmap_lock.
Test-case:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <assert.h>
#include <sched.h>
#include <errno.h>
int main(void)
{
struct clone_args args = {
.exit_signal = SIGCHLD,
.flags = CLONE_PIDFD,
.pidfd = 0,
};
unsigned long pidfd;
int pid;
assert(unshare(CLONE_NEWPID) == 0);
pid = syscall(__NR_clone3, &args, sizeof(args));
assert(pid == -1 && errno == EFAULT);
args.pidfd = (unsigned long)&pidfd;
pid = syscall(__NR_clone3, &args, sizeof(args));
if (pid)
assert(pid > 0 && wait(NULL) == pid);
else
assert(getpid() == 1);
return 0;
}
Link: https://lkml.kernel.org/r/aaGHu3ixbw9Y7kFj@redhat.com
Link: https://lkml.kernel.org/r/aaGIHa7vGdwhEc_D@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The function read_key_from_user_keying() is missing an 'r' in its name.
Fix the typo by renaming it to read_key_from_user_keyring().
Link: https://lkml.kernel.org/r/20260227230422.859423-1-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
'key_count' is an 'unsigned int' and cannot be less than zero. Remove
the redundant condition.
Link: https://lkml.kernel.org/r/20260228085136.861971-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'coredump_filter=' boot parameter.
Check the return value of kstrtoul() and reject invalid values. This adds
error handling while preserving behavior for existing values, and removes
use of the deprecated simple_strtoul() helper. The current code silently
sets 'default_dump_filter = 0' if parsing fails, instead of leaving the
default value (MMF_DUMP_FILTER_DEFAULT) unchanged.
Rename the static variable 'default_dump_filter' to 'coredump_filter'
since it does not necessarily contain the default value and the current
name can be misleading.
Link: https://lkml.kernel.org/r/20251215142152.4082-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The "(signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT))" check
in complete_signal() is not obvious at all, and in fact it only adds
unnecessary confusion: this condition is always true.
prepare_signal() does:
if (signal->flags & SIGNAL_GROUP_EXIT) {
if (signal->core_state)
return sig == SIGKILL;
/*
* The process is in the middle of dying, drop the signal.
*/
return false;
}
This means that "!signal->core_state && (signal->flags &
SIGNAL_GROUP_EXIT)" in complete_signal() is never possible.
If SIGNAL_GROUP_EXIT is set, prepare_signal() can only return true if
signal->core_state is not NULL.
Link: https://lkml.kernel.org/r/aZsfkDhnqJ4s1oTs@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc; Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
thread_group_empty(tsk) is only possible if tsk is a group leader, and
thread_group_empty() already does the thread_group_leader() check.
So it makes no sense to check "thread_group_leader() &&
thread_group_empty()"; thread_group_empty() alone is enough.
Link: https://lkml.kernel.org/r/aZsfeegKZPZZszJh@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc; Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
msgque kselftest uses msgrcv(..., MSG_COPY) to copy messages. When the
kernel is built without CONFIG_CHECKPOINT_RESTORE, prepare_copy() is
stubbed out and msgrcv() returns -ENOSYS. The test currently reports this
as a failure even though it is simply a missing feature/configuration.
Skip the test when msgrcv() fails with ENOSYS.
Link: https://lkml.kernel.org/r/20260210135359.178636-1-jouyeol8739@gmail.com
Signed-off-by: UYeol Jo <jouyeol8739@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Easier to add new entries. It was sorted when added in 66b47b4a9d, but
later got wrong order for few entries. Sorted with en_US.UTF-8 locale.
Link: https://lkml.kernel.org/r/20260212144005.45052-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Cc: Jonathan Camerom <Jonathan.Cameron@huawei.com>
Cc: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
However there's a convention of assuming that __init-time allocations
cannot fail. Because if a kmalloc() were to fail at this time, the kernel
is hopelessly messed up anyway. So simply panic() if that kmalloc failed,
then make that 350-byte buffer __initdata.
Link: https://lkml.kernel.org/r/20260223035914.4033-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The buffer used to hold the taint string is statically allocated, which
requires updating whenever a new taint flag is added.
Instead, allocate the exact required length at boot once the allocator is
available in an init function. The allocation sums the string lengths in
taint_flags[], along with space for separators and formatting.
print_tainted() is switched to use this dynamically allocated buffer.
If allocation fails, print_tainted() warns about the failure and continues
to use the original static buffer as a fallback.
Link: https://lkml.kernel.org/r/20260222140804.22225-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The verbose 'Tainted: ...' string in print_tainted_seq can total to 327
characters while the buffer defined in _print_tainted is 320 bytes.
Increase its size to 350 characters to hold all flags, along with some
headroom.
[akpm@linux-foundation.org: fix spello, add comment]
Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The output of bloat-o-meter already uses the words 'old' and 'new' for
symbol size in the table header, so reflect that in the corresponding
argument names.
Link: https://lkml.kernel.org/r/20260212213941.3984330-1-vkoskiv@gmail.com
Signed-off-by: Valtteri Koskivuori <vkoskiv@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The correct passive of "to bind" is "bound", not "binded". This is often
used in the context of the BSD socket bind(2) operation.
Link: https://lkml.kernel.org/r/20260214140854.42247-1-gnoack3000@gmail.com
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
task_sig() already wraps the SigQ rlimit read in an explicit RCU read-side
critical section. Drop the stale FIXME comment and keep using
task_ucounts() for the ucounts access.
No functional change.
Link: https://lkml.kernel.org/r/20260215124511.14227-1-jaime.saguillo@gmail.com
Signed-off-by: Jaime Saguillo Revilla <jaime.saguillo@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Clear the pending exception state from a vcpu coming out of
reset, as it could otherwise affect the first instruction
executed in the guest.
- Fix pointer arithmetic in address translation emulation, so that the
Hardware Access bit is set on the correct PTE instead of some other
location.
s390:
- Fix deadlock in new memory management.
- Properly handle kernel faults on donated memory.
- Fix bounds checking for irq routing, with selftest.
- Fix invalid machine checks and log all of them.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnCvYAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO6Kgf9EobmY4mWv8EmGiLQqsdtShzaSDa+
cvzoWT8OEECFlZzWLCn/9/FiF68IkNfoV5ad79+2vJ5um+ZlJtkjtq6z8EbvhBBZ
/QppVas+gmqhctuR41GnDxSKReXNEKIfQ1qwxAEujriui4FEpHAza+yRQ8jHJCCN
LpcwO7dubHWe+HJewF0t7P6MN76Ln6EJWS2tu/zQUBpKKAvLHkm2EHk38X+vwGlN
Lip9tcCYgzZXKdHZgTKKm45Te0ijpi/gxZ0j0kn6FNBkY8PIbtwlB2Hl8H6J5jP1
q+0dLlzFiAK5ww9Wrf5/LAt9vFcZKyOTY1y3ADEvdfLLwVBNdhaZ318Myw==
=Zd43
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Clear the pending exception state from a vcpu coming out of reset,
as it could otherwise affect the first instruction executed in the
guest
- Fix pointer arithmetic in address translation emulation, so that
the Hardware Access bit is set on the correct PTE instead of some
other location
s390:
- Fix deadlock in new memory management
- Properly handle kernel faults on donated memory
- Fix bounds checking for irq routing, with selftest
- Fix invalid machine checks and log all of them"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Fix the descriptor address in __kvm_at_swap_desc()
KVM: s390: vsie: Avoid injecting machine check on signal
KVM: s390: log machine checks more aggressively
KVM: s390: selftests: Add IRQ routing address offset tests
KVM: s390: Limit adapter indicator access to mapped page
s390/mm: Add missing secure storage access fixups for donated memory
KVM: arm64: Discard PC update state on vcpu reset
KVM: s390: Fix a deadlock
cxl: Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi
cxl/mbox: Use proper endpoint validity check upon sanitize
cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled
cxl/acpi: Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
cxl/region: Fix leakage in __construct_region()
cxl/port: Fix use after free of parent_port in cxl_detach_ep()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmnCwV8ACgkQYGjFFmlT
OEo3cQ/8DoL1ibjflJ5cuVbqqiqS1JprjIWHYQ28ve/S1bqawcnQA/eP/b75/FJJ
/9OkVjWPOKFZszWt/oPHoJvVuwlfnPDDcHcOKXZ+jpc5jy5qRYFWVblCerlwk3Ie
PIM1EZA0UwQg8Hwn4+z4g5DZbx6rDQKdhE9Wt6q1jbx46enazPlQU7POpZWalIiA
dhlntUuZxOv/gDF8BXZYI7W19vVI7X+BzFbG1Y+d07wW75cio0U8z2uMtCUDVgIk
e9Kea/Na7WyY7RcBMyfbZj2suXZrfTw6Nqgzw9Gn3hHTdjojc3IBDYi6OyDAu4XS
92sYjt+aeV1cWplS7Wpwo/z4/hbI6Q4s4Z0rQt4A2P4UHGzhG/sbo23pRXSxC7dV
mLM3bx9U4bGAA8pG2P5gqQBS453i1Mb6wUGyTrm7oNRTYeZh2tHQIEWMZtoCHO4A
h8VFqqJ0v3EkZtA4I2C8rcmFw5kOqnD5BkuiXzWdW6WZv4fyCiSB482LQkAxhDkN
zHd/4peHDK4d8DcyhH+OA5zZzrPQrBwXDO0VhNM+cLK224RrpiFaRwrBIHc09mZp
UpzrqFgL1FzXmJmLws7dMnIk1EhNyz2P01L0xjf4LN7pwZofonpbX+4MegP/2lyg
fN/WKCx0/rhNoKtsUWhVofyeiDcWQW1rnG6vnU6rQmOGE9765ss=
=ct8z
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- Adjust the startup priority of cxl_pmem to be higher than that of
cxl_acpi
- Use proper endpoint validity check upon sanitize
- Avoid incorrect DVSEC fallback when HDM decoders are enabled
- Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
- Fix leakage in __construct_region()
- Fix use after free of parent_port in cxl_detach_ep()
* tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi
cxl/mbox: Use proper endpoint validity check upon sanitize
cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled
cxl/acpi: Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
cxl/region: Fix leakage in __construct_region()
cxl/port: Fix use after free of parent_port in cxl_detach_ep()
- Clear the pending exception state from a vcpu coming out of
reset, as it could otherwise affect the first instruction
executed in the guest.
- Fix the address translation emulation icode to set the Hardware
Access bit on the correct PTE instead of some other location.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmm9E6IACgkQI9DQutE9
ekNYaw//SXvrK0FCGS87qoy/32p361f3BiUVEUQtTYJsrPL8Rm+92Gpni6VGShD2
2r7/QBZzk4oKIRUOVz4Yp4mo31DSm8uFelLLlZPEODHHnKNbhtaZ4kqmpVKQ7O11
PSCESpCcmPQgGorshdFOZ+A0+5heLI3lw0MprNwG/EjI+7w/sTBUiA+ooUoGQ/Sj
zLq3ZPZfFxQyeXBeTq9oigu4GRjlz5spzj9zpZ+51ilVa35wE+0nWgPOgxssZ1yM
VhKLksdxUMDy5f2C5DuWWkThyDBGRaCobSQB4/H8EynsKSZ2gdfVvJFapOUMMuld
o5/8rM/JAxN66Y8tA0UcNSv9CbeROwQ3VWf/u4FCF6TuwHLLY3qZvmQd5+tn39gb
gLjagJrS5Cq7iiykBMjeAJ+n3sRpuy47gRj278eyqd+1Sx/YiKAm2bXJw+q2Rnmf
+mEPANuDNL4MKLoHKdZtqXDw7RSCEnfD7ctGpsuKQJr08VZagbr6RsGsMV/KwNUv
K+VcJPSwV8SHnqxcANpHfXh0795miAMPd424ftKjvnwEOdln8EBHrqOgEjdm6zNV
qmqvAsbMbKCGmrvXKL6H8wfhB2cv3TMWTPLuedjrL0ITY/qxT6TUQnbEA75AMoz2
5TBDRf6ciYxxwM962ASNrBCn/xgOizGMWn85+SdOYLHWAyJlTVM=
=bU28
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.0, take #4
- Clear the pending exception state from a vcpu coming out of
reset, as it could otherwise affect the first instruction
executed in the guest.
- Fix the address translation emulation icode to set the Hardware
Access bit on the correct PTE instead of some other location.
All are singletons - please see the changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCacHhYgAKCRDdBJ7gKXxA
jqnKAPwLWHOazW6WB43goV605aA42anjBRm8kHg7E36X53OUJgD+L8KV2IXeDzGE
cFe9TxqtdhYg6/JRiwEE5eDYT/uWaAk=
=G0UF
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"6 hotfixes. 2 are cc:stable. All are for MM.
All are singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/stat: monitor all System RAM resources
mm/zswap: add missing kunmap_local()
mailmap: update email address for Muhammad Usama Anjum
zram: do not slot_free() written-back slots
mm/damon/core: avoid use of half-online-committed context
mm/rmap: clear vma->anon_vma on error