mirror-linux

Commit Graph

Author	SHA1	Message	Date
Huacai Chen	0a91336e28	Merge tag 'bpf-next-6.17' into loongarch-next LoongArch architecture changes for 6.17 have many bpf features such as trampoline, so merge 'bpf-next-6.17' to create a base to make bpf work well.	2025-08-02 11:07:06 +08:00
KaFai Wan	863aab3d4d	bpf: Add log for attaching tracing programs to functions in deny list Show the rejected function name when attaching tracing programs to functions in deny list. With this change, we know why tracing programs can't attach to functions like __rcu_read_lock() from log. $ ./fentry libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG -- Attaching tracing programs to function '__rcu_read_lock' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 19:39:29 -07:00
KaFai Wan	a5a6b29a70	bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions With this change, we know the precise rejected function name when attaching fexit/fmod_ret to __noreturn functions from log. $ ./fexit libbpf: prog 'fexit': BPF program load failed: -EINVAL libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG -- Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 19:39:29 -07:00
Suchit Karunakaran	5b4c54ac49	bpf: Fix various typos in verifier.c comments This patch fixes several minor typos in comments within the BPF verifier. No changes in functionality. Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:57 -07:00
Paul Chaignon	5dbb19b16a	bpf: Add third round of bounds deduction Commit `d7f0087381` ("bpf: try harder to deduce register bounds from different numeric domains") added a second call to __reg_deduce_bounds in reg_bounds_sync because a single call wasn't enough to converge to a fixed point in terms of register bounds. With patch "bpf: Improve bounds when s64 crosses sign boundary" from this series, Eduard noticed that calling __reg_deduce_bounds twice isn't enough anymore to converge. The first selftest added in "selftests/bpf: Test cross-sign 64bits range refinement" highlights the need for a third call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync performs the following bounds deduction: reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) __update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) In particular, notice how: 1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds learns new u32 bounds. 2. __reg64_deduce_bounds is unable to improve bounds at this point. 3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds. 4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds improves the smax and umin bounds thanks to patch "bpf: Improve bounds when s64 crosses sign boundary" from this series. 5. Subsequent functions are unable to improve the ranges further (only tnums). Yet, a better smin32 bound could be learned from the smin bound. __reg32_deduce_bounds is able to improve smin32 from smin, but for that we need a third call to __reg_deduce_bounds. As discussed in [1], there may be a better way to organize the deduction rules to learn the same information with less calls to the same functions. Such an optimization requires further analysis and is orthogonal to the present patchset. Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:13 -07:00
Paul Chaignon	00bf8d0c6c	bpf: Improve bounds when s64 crosses sign boundary __reg64_deduce_bounds currently improves the s64 range using the u64 range and vice versa, but only if it doesn't cross the sign boundary. This patch improves __reg64_deduce_bounds to cover the case where the s64 range crosses the sign boundary but overlaps with the u64 range on only one end. In that case, we can improve both ranges. Consider the following example, with the s64 range crossing the sign boundary: 0 U64_MAX \| [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] \| \|----------------------------\|----------------------------\| \|xxxxx s64 range xxxxxxxxx] [xxxxxxx\| 0 S64_MAX S64_MIN -1 The u64 range overlaps only with positive portion of the s64 range. We can thus derive the following new s64 and u64 ranges. 0 U64_MAX \| [xxxxxx u64 range xxxxx] \| \|----------------------------\|----------------------------\| \| [xxxxxx s64 range xxxxx] \| 0 S64_MAX S64_MIN -1 The same logic can probably apply to the s32/u32 ranges, but this patch doesn't implement that change. In addition to the selftests, the __reg64_deduce_bounds change was also tested with Agni, the formal verification tool for the range analysis [1]. Link: https://github.com/bpfverif/agni [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-28 10:02:12 -07:00
Paul Chaignon	5345e64760	bpf: Simplify bounds refinement from s32 During the bounds refinement, we improve the precision of various ranges by looking at other ranges. Among others, we improve the following in this order (other things happen between 1 and 2): 1. Improve u32 from s32 in __reg32_deduce_bounds. 2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds. 3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds. In particular, if the s32 range forms a valid u32 range, we will use it to improve the u32 range in __reg32_deduce_bounds. In __reg_deduce_mixed_bounds, under the same condition, we will use the s32 range to improve the s/u64 ranges. If at (1) we were able to learn from s32 to improve u32, we'll then be able to use that in (2) to improve s/u64. Hence, as (3) happens under the same precondition as (1), it won't improve s/u64 ranges further than (1)+(2) did. Thus, we can get rid of (3). In addition to the extensive suite of selftests for bounds refinement, this patch was also tested with the Agni formal verification tool [1]. Additionally, Eduard mentioned: The argument appears to be as follows: Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max` __reg32_deduce_bounds produces: reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min); reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max); And then first part of __reg_deduce_mixed_bounds assigns: a. reg->umin umax= (reg->umin & ~0xffffffffULL) \| max_t(u32, reg->s32_min, reg->u32_min); b. reg->umax umin= (reg->umax & ~0xffffffffULL) \| min_t(u32, reg->s32_max, reg->u32_max); And then second part of __reg_deduce_mixed_bounds assigns: c. reg->umin umax= (reg->umin & ~0xffffffffULL) \| (u32)reg->s32_min; d. reg->umax umin= (reg->umax & ~0xffffffffULL) \| (u32)reg->s32_max; But assignment (c) is a noop because: max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min Hence RHS(a) >= RHS(c) and umin= does nothing. Also assignment (d) is a noop because: min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max Hence RHS(b) <= RHS(d) and umin= does nothing. Plus the same reasoning for the part dealing with reg->s{min,max}_value: e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) \| max_t(u32, reg->s32_min_value, reg->u32_min_value); f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) \| min_t(u32, reg->s32_max_value, reg->u32_max_value); vs g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) \| (u32)reg->s32_min_value; h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) \| (u32)reg->s32_max_value; RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing. This appears to be correct. Also, Shung-Hsi: Beside going through the reasoning, I also played with CBMC a bit to double check that as far as a single run of __reg_deduce_bounds() is concerned (and that the register state matches certain handwavy expectations), the change indeed still preserve the original behavior. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://github.com/bpfverif/agni [1] Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com	2025-07-27 19:23:29 +02:00
Linus Torvalds	b711733e89	A single fix for the PTP systemcounter mechanism: The rework of this mechanism added a 'use_nsec' member to struct system_counterval. get_device_system_crosststamp() instantiates that struct on the stack and hands a pointer to the driver callback. Only the drivers which set use_nsec to true, initialize that field, but all others ignore it. As get_device_system_crosststamp() does not initialize the struct, the use_nsec field contains random stack content in those cases. That causes a miscalulation usually resulting in a failing range check in the best case. Initialize the structure before handing it to the drivers to cure that. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGFA4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRcsEACvQI0LmKTOigzSZvBT1CZnGcwpeqYi Ez0v/w+tpyfbwQgf9kxR+ZbjNdwqCYFnR8PZPFKFuvsanWRTIcYaTkIQWvDhcEX/ U4AFI3VkdZUFckCEY/fv7j3/jkp7pbLVHMq001Z9xaMMcE+ox1AlHpEW0Khd3gqL VFLXU5S7Q9H6J6ujjFAXAMuhgjk6WOz8q+ew3hnc3dxwyuEBAz83jOScH/be3dTl 10ydzoxFEa+ZlacAHX+SqZ7nhS7ExxNlwlUuTYj/EkBCQ8UIoS93YLA5bYMcWCao W5rs6vFJmMO6NR6lkqwfKmKyjovx79jHMVNKoxydZGvkqcNMtfc/eUfByxAkyCDP gmTCFwgKVGdjGsYwkGqafejmJt5OFrD1hMyWfBhGWQ/Z8CXuuJNEa/8trSyUK/CS DFD1InOLltbYuw7rY5gRxb+xmgBTxUMj8gF/hXYs7wNzJqNJXXNae/2Sue+Xi+mV iieEF8UonmpMe9k9w3+fFGGDWYa4lYnT5O3VMQ0nEjj6dt5RVQqRvjTa+GtQJzUs h4fUs+BIKyCkh6DgRKyIsDruzryOSnZ+vqMcGMm0gvPttc3cGYksLiQVlWYjQhxs pTFrHNGOSXMT5WBQ7KWKzGypHlf3WYhVWk1+dmPJedrdyr23AfgKAGM6zva1Oqjc 81w9DvppBL0sOA== =j6Po -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for the PTP systemcounter mechanism: The rework of this mechanism added a 'use_nsec' member to struct system_counterval. get_device_system_crosststamp() instantiates that struct on the stack and hands a pointer to the driver callback. Only the drivers which set use_nsec to true, initialize that field, but all others ignore it. As get_device_system_crosststamp() does not initialize the struct, the use_nsec field contains random stack content in those cases. That causes a miscalulation usually resulting in a failing range check in the best case. Initialize the structure before handing it to the drivers to cure that" * tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Zero initialize system_counterval when querying time from phc drivers	2025-07-27 09:31:32 -07:00
Puranjay Mohan	3ba58312e6	bpf: Move bpf_jit_get_prog_name() to core.c bpf_jit_get_prog_name() will be used by all JITs when enabling support for private stack. This function is currently implemented in the x86 JIT. Move the function to core.c so that other JITs can easily use it in their implementation of private stack. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org	2025-07-26 21:26:51 +02:00
Thomas Weißschuh	b7b3500bd4	umd: Remove usermode driver framework The code is unused since `98e20e5e13` ("bpfilter: remove bpfilter"), therefore remove it. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de	2025-07-26 21:03:04 +02:00
Thomas Weißschuh	2b03164eee	bpf/preload: Don't select USERMODE_DRIVER The usermode driver framework is not used anymore by the BPF preload code. Fixes: `cb80ddc671` ("bpf: Convert bpf_preload.ko to use light skeleton.") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-1-0d0083334382@linutronix.de	2025-07-26 21:02:48 +02:00
Linus Torvalds	2942242dde	11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 7 are for MM. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaILYBgAKCRDdBJ7gKXxA jo0uAQDvTlAjH6TcgRW/cbqHRIeiRoZ9Bwh/RUlJXM9neDR2LgEA41B+ohTsxUmZ OhM3Ce94tiGrHnVlW3SsmVaO+1TjGAU= =KUR9 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 7 are for MM" * tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: sprintf.h requires stdarg.h resource: fix false warning in __request_region() mm/damon/core: commit damos_quota_goal->nid kasan: use vmalloc_dump_obj() for vmalloc error reports mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() mm: update MAINTAINERS entry for HMM nilfs2: reject invalid file types when reading inodes selftests/mm: fix split_huge_page_test for folio_split() tests mailmap: add entry for Senozhatsky mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list	2025-07-24 19:13:30 -07:00
Akinobu Mita	91a229bb7b	resource: fix false warning in __request_region() A warning is raised when __request_region() detects a conflict with a resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY. But this warning is only valid for iomem_resources. The hmem device resource uses resource.desc as the numa node id, which can cause spurious warnings. This warning appeared on a machine with multiple cxl memory expanders. One of the NUMA node id is 6, which is the same as the value of IORES_DESC_DEVICE_PRIVATE_MEMORY. In this environment it was just a spurious warning, but when I saw the warning I suspected a real problem so it's better to fix it. This change fixes this by restricting the warning to only iomem_resource. This also adds a missing new line to the warning message. Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com Fixes: `7dab174e2e` ("dax/hmem: Move hmem device registration to dax_hmem.ko") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 17:57:59 -07:00
Markus Blöchl	67c632b4a7	timekeeping: Zero initialize system_counterval when querying time from phc drivers Most drivers only populate the fields cycles and cs_id of system_counterval in their get_time_fn() callback for get_device_system_crosststamp(), unless they explicitly provide nanosecond values. When the use_nsecs field was added to struct system_counterval, most drivers did not care. Clock sources other than CSID_GENERIC could then get converted in convert_base_to_cs() based on an uninitialized use_nsecs field, which usually results in -EINVAL during the following range check. Pass in a fully zero initialized system_counterval_t to cure that. Fixes: `6b2e299775` ("timekeeping: Provide infrastructure for converting to/from a base clock") Signed-off-by: Markus Blöchl <markus@blochl.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250720-timekeeping_uninit_crossts-v2-1-f513c885b7c2@blochl.de	2025-07-22 14:25:21 +02:00
Yonghong Song	95993dc303	bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...)) Intel linux test robot reported a warning that ERR_CAST can be used for error pointer casting instead of more-complicated/rarely-used ERR_PTR(PTR_ERR(...)) style. There is no functionality change, but still let us replace two such instances as it improves consistency and readability. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507201048.bceHy8zX-lkp@intel.com/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250720164754.3999140-1-yonghong.song@linux.dev	2025-07-21 17:27:09 -07:00
Linus Torvalds	2013e8c2e6	Tracing fixes for 6.16: - Fix timerlat with use of FORTIFY_SOURCE FORTIFY_SOURCE was added to the stack tracer where it compares the entry->caller array to having entry->size elements. timerlat has the following: memcpy(&entry->caller, fstack->calls, size); entry->size = size; Which triggers FORTIFY_SOURCE as the caller is populated before the entry->size is initialized. Swap the order to satisfy FORTIFY_SOURCE logic. - Add down_write(trace_event_sem) when adding trace events in modules Trace events being added to the ftrace_events array are protected by the trace_event_sem semaphore. But when loading modules that have trace events, the addition of the events are not protected by the semaphore and loading two modules that have events at the same time can corrupt the list. Also add a lockdep_assert_held(trace_event_sem) to _trace_add_event_dirs() to confirm its held when iterating the list. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaH06gBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qoJsAP0a+/E0f+5g7O/OtYPVEDSCREv1vj9c 3dr0iWopqaOC7gEAw8Vc5iWIHKcB/JuJ+GqALoutL+lihruG26MWkFFsOgU= =zH5J -----END PGP SIGNATURE----- Merge tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix timerlat with use of FORTIFY_SOURCE FORTIFY_SOURCE was added to the stack tracer where it compares the entry->caller array to having entry->size elements. timerlat has the following: memcpy(&entry->caller, fstack->calls, size); entry->size = size; Which triggers FORTIFY_SOURCE as the caller is populated before the entry->size is initialized. Swap the order to satisfy FORTIFY_SOURCE logic. - Add down_write(trace_event_sem) when adding trace events in modules Trace events being added to the ftrace_events array are protected by the trace_event_sem semaphore. But when loading modules that have trace events, the addition of the events are not protected by the semaphore and loading two modules that have events at the same time can corrupt the list. Also add a lockdep_assert_held(trace_event_sem) to _trace_add_event_dirs() to confirm it is held when iterating the list. * tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Add down_write(trace_event_sem) when adding trace event tracing/osnoise: Fix crash in timerlat_dump_stack()	2025-07-20 13:03:31 -07:00
Linus Torvalds	62347e2790	A single fix for the scheduler. A recent commit changed the runqueue counter nr_uninterruptible to an unsigned int. Due to the fact that the counters are not updated on migration of a uninterruptble task to a different CPU, these counters can exceed INT_MAX. The counter is cast to long in the load average calculation, which means that the cast expands into negative space resulting in bogus load average values. Convert it back to unsigned long to fix this. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmh82RYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXNlEACyQxl/OCXSmKpcxrIvMjnalXh3Ibs2 KqbZ3MA/hY1ZWhSKXBDMv8hkxE00PY8WXsPDtLeRz0n6GegbUx4zVsUQpzn24gRe uEl+qIuANaz5uMu2qlmQwSuxVQ0SDqLVFObrNgKQ554jJckjXKcgvrBjwczTw9lJ u6yTVLrknf0IsbQ79yxToaNf6jD3HcSIGX06Hs4EYucPzYd54VScr6LGGwux3rmI CN0RSA9RduUzcf/aKhe9/tmM6oss4tdByNuVSdx/yZ5QvVw5oOrWdI8sU0BlZsrC IjvqLvvLFKRPZre24UeFnIVxhtv+cwPXxrA6tR/SM2eUbHKzU8jHz2TlPu8xwOwb sZMMBjPx1Y+fpVQoGxEC1cEWbCUPSyQ7NGN2uS3Quk4XoQLRz6t//B+63rT65bfV wMAZpsgZ9/c23yDuPG11XxBMQYVbS0sFyLDymBC93Co8vh8PowDIf2xmgPFLxM78 HdOCCsHsBPqZutQGQ7/qw//e7T/9fv0MDP8D4fuyLrhUdvBAvGHZp85T4YmOUcjp bTTE28Rw1sBgdAnTNjGWfwRN7Hwc1DzaLh40Um+nnjMA+yG9uq7gjTYDx4w6W/EA V70tx9tfJOM11g/qfIURkJDpf8PezJ3KHg1ZOiPn9e+I08LoPVMeIqbFEnJ9VAc7 VStWpQd+SG99AQ== =Aqcr -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single fix for the scheduler. A recent commit changed the runqueue counter nr_uninterruptible to an unsigned int. Due to the fact that the counters are not updated on migration of a uninterruptble task to a different CPU, these counters can exceed INT_MAX. The counter is cast to long in the load average calculation, which means that the cast expands into negative space resulting in bogus load average values. Convert it back to unsigned long to fix this. * tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Change nr_uninterruptible type to unsigned long	2025-07-20 11:08:51 -07:00
Steven Rostedt	b5e8acc14d	tracing: Add down_write(trace_event_sem) when adding trace event When a module is loaded, it adds trace events defined by the module. It may also need to modify the modules trace printk formats to replace enum names with their values. If two modules are loaded at the same time, the adding of the event to the ftrace_events list can corrupt the walking of the list in the code that is modifying the printk format strings and crash the kernel. The addition of the event should take the trace_event_sem for write while it adds the new event. Also add a lockdep_assert_held() on that semaphore in __trace_add_event_dirs() as it iterates the list. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/20250718223158.799bfc0c@batman.local.home Reported-by: Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com> Closes: https://lore.kernel.org/all/20250717105007.46ccd18f@batman.local.home/ Fixes: `110bf2b764` ("tracing: add protection around module events unload") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-19 13:54:59 -04:00
Linus Torvalds	bf61759db4	sched_ext: Fixes for v6.16-rc6 - Fix handling of migration disabled tasks in default idle selection. - update_locked_rq() called __this_cpu_write() spuriously with NULL when @rq was not locked. As the writes were spurious, it didn't break anything directly. However, the function could be called in a preemptible leading to a context warning in __this_cpu_write(). Skip the spurious NULL writes. - Selftest fix on UP. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaHvPZw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGabMAP4jSAr4gYWEBOUaD9btwnPxZwlSiAEQtqBDBVRb /UunFAD/WBwUPk/u7BchLHjuH3sYW5gQb40kbtUnmNvB+RNUUgc= =3WAD -----END PGP SIGNATURE----- Merge tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix handling of migration disabled tasks in default idle selection - update_locked_rq() called __this_cpu_write() spuriously with NULL when @rq was not locked. As the writes were spurious, it didn't break anything directly. However, the function could be called in a preemptible leading to a context warning in __this_cpu_write(). Skip the spurious NULL writes. - Selftest fix on UP * tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: idle: Handle migration-disabled tasks in idle selection sched/ext: Prevent update_locked_rq() calls with NULL rq selftests/sched_ext: Fix exit selftest hang on UP	2025-07-19 10:40:30 -07:00
Linus Torvalds	c64004df89	cgroup: Fixes for v6.16-rc6 An earlier commit to suppress a warning introduced a race condition where tasks can escape cgroup1 freezer. Revert the commit and simply remove the warning which was spurious to begin with. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaHvMvw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGadfAP0cT4QXwtw0VXyiNr5PMqxQ74rYsngJ+NevRbod fK6hIwD/T+owQc/ivYp5/N/XUgpT+Ixp7YRj2RIzQbL6SPjzOwE= =IlrN -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "An earlier commit to suppress a warning introduced a race condition where tasks can escape cgroup1 freezer. Revert the commit and simply remove the warning which was spurious to begin with" * tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "cgroup_freezer: cgroup_freezing: Check if not frozen" sched,freezer: Remove unnecessary warning in __thaw_task	2025-07-19 10:00:47 -07:00
Tomas Glozar	85a3bce695	tracing/osnoise: Fix crash in timerlat_dump_stack() We have observed kernel panics when using timerlat with stack saving, with the following dmesg output: memcpy: detected buffer overflow: 88 byte write of buffer size 0 WARNING: CPU: 2 PID: 8153 at lib/string_helpers.c:1032 __fortify_report+0x55/0xa0 CPU: 2 UID: 0 PID: 8153 Comm: timerlatu/2 Kdump: loaded Not tainted 6.15.3-200.fc42.x86_64 #1 PREEMPT(lazy) Call Trace: <TASK> ? trace_buffer_lock_reserve+0x2a/0x60 __fortify_panic+0xd/0xf __timerlat_dump_stack.cold+0xd/0xd timerlat_dump_stack.part.0+0x47/0x80 timerlat_fd_read+0x36d/0x390 vfs_read+0xe2/0x390 ? syscall_exit_to_user_mode+0x1d5/0x210 ksys_read+0x73/0xe0 do_syscall_64+0x7b/0x160 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e __timerlat_dump_stack() constructs the ftrace stack entry like this: struct stack_entry *entry; ... memcpy(&entry->caller, fstack->calls, size); entry->size = fstack->nr_entries; Since commit `e7186af7fb` ("tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure"), struct stack_entry marks its caller field with __counted_by(size). At the time of the memcpy, entry->size contains garbage from the ringbuffer, which under some circumstances is zero, triggering a kernel panic by buffer overflow. Populate the size field before the memcpy so that the out-of-bounds check knows the correct size. This is analogous to __ftrace_trace_stack(). Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Attila Fazekas <afazekas@redhat.com> Link: https://lore.kernel.org/20250716143601.7313-1-tglozar@redhat.com Fixes: `e7186af7fb` ("tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-18 15:51:35 -04:00
Alexei Starovoitov	beb1097ec8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-18 12:15:59 -07:00
Linus Torvalds	d786aba320	bpf-fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmh5r+QACgkQ6rmadz2v bTqrFA/+OW+z+vh8k43C9TVZttqC5poGcFqF5zRlYTArQT3AB+QuhG/CRjVQFbGL 2YfVbq+5pxNo0I/FtCoWVui2OMf1UsRKKvM0pSn50yn3ytRfotZjQ/AWACm/9t5y fyRLBIS3ArjashQ9/S71tAIfG6l/B+FGX81wOVa1uL50ab15+4NrplhZHY421o9a lH2E2wnpy/BnrB9F/FO4iQbelixvBfMwj8epruhCVbipfx6BOKPMzKVtcm61FVT1 hDsQZ0bIpVKgpRNBlTUHjVyzYo8oeXzqVhhY7hsmpHxJSiol7KLWyHEJD5ExS9Qg XVPK34b9IPgAfS8f/DgGAkWsAht7BMLsR0GUWyVIiacHHqTinRPVfWbzqWa5yjdD +8Vp4RVrcUONx69upx+IDrb4uMfQYktdpcvQtSl0SSinsG/INXurT1Vyz8aBPfkv WbiBeXhW/dCD9NuL5D9gnyZWaPXIAmbK7+pXJOSIpfKC24WRXTONDXhGP1b6ef31 zHQu3r98ekYnHr3hbsvdHOWB7LKkJ1bcg2+OsmtYUUmnCiQTM1H8ILTwbSQ4EfXJ 6iRxYeFp+VJOPScRzmNU/A3ibQWfV+foiO4S6hmazJOy3mmHX6hgPZoj2fjV8Ejf xZeOpQbCaZQCzbxxOdtjykwfe+zPWGnyRPnQpVIVdi7Abk1EZaE= =eUL7 -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix handling of BPF arena relocations (Andrii Nakryiko) - Fix race in bpf_arch_text_poke() on s390 (Ilya Leoshkevich) - Fix use of virt_to_phys() on arm64 when mmapping BTF (Lorenz Bauer) - Reject %p% format string in bprintf-like BPF helpers (Paul Chaignon) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: Fix handling of BPF arena relocations btf: Fix virt_to_phys() on arm64 when mmapping BTF selftests/bpf: Stress test attaching a BPF prog to another BPF prog s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL again selftests/bpf: Add negative test cases for snprintf bpf: Reject %p% format string in bprintf-like helpers	2025-07-18 11:46:26 -07:00
Lorenz Bauer	2e2713ae1a	btf: Fix virt_to_phys() on arm64 when mmapping BTF Breno Leitao reports that arm64 emits the following warning with CONFIG_DEBUG_VIRTUAL: [ 58.896157] virt_to_phys used for non-linear address: 000000009fea9737 (__start_BTF+0x0/0x685530) [ 23.988669] WARNING: CPU: 25 PID: 1442 at arch/arm64/mm/physaddr.c:15 __virt_to_phys (arch/arm64/mm/physaddr.c:?) ... [ 24.075371] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST [ 24.080276] Hardware name: Quanta S7GM 20S7GCU0010/S7G MB (CG1), BIOS 3D22 07/03/2024 [ 24.088295] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 24.098440] pc : __virt_to_phys (arch/arm64/mm/physaddr.c:?) [ 24.105398] lr : __virt_to_phys (arch/arm64/mm/physaddr.c:?) ... [ 24.197257] Call trace: [ 24.199761] __virt_to_phys (arch/arm64/mm/physaddr.c:?) (P) [ 24.206883] btf_sysfs_vmlinux_mmap (kernel/bpf/sysfs_btf.c:27) [ 24.214264] sysfs_kf_bin_mmap (fs/sysfs/file.c:179) [ 24.218536] kernfs_fop_mmap (fs/kernfs/file.c:462) [ 24.222461] mmap_region (./include/linux/fs.h:? mm/internal.h:167 mm/vma.c:2405 mm/vma.c:2467 mm/vma.c:2622 mm/vma.c:2692) It seems that the memory layout on arm64 maps the kernel image in vmalloc space which is different than x86. This makes virt_to_phys emit the warning. Fix this by translating the address using __pa_symbol as suggested by Breno instead. Reported-by: Breno Leitao <leitao@debian.org> Closes: https://lore.kernel.org/bpf/g2gqhkunbu43awrofzqb4cs4sxkxg2i4eud6p4qziwrdh67q4g@mtw3d3aqfgmb/ Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Tested-by: Breno Leitao <leitao@debian> Fixes: `a539e2a6d5` ("btf: Allow mmap of vmlinux btf") Link: https://lore.kernel.org/r/20250717-vmlinux-mmap-pa-symbol-v1-1-970be6681158@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-17 11:33:52 -07:00
Andrea Righi	06efc9fe0b	sched_ext: idle: Handle migration-disabled tasks in idle selection When SCX_OPS_ENQ_MIGRATION_DISABLED is enabled, migration-disabled tasks are also routed to ops.enqueue(). A scheduler may attempt to dispatch such tasks directly to an idle CPU using the default idle selection policy via scx_bpf_select_cpu_and() or scx_bpf_select_cpu_dfl(). This scenario must be properly handled by the built-in idle policy to avoid returning an idle CPU where the target task isn't allowed to run. Otherwise, it can lead to errors such as: EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled Chrome_ChildIOT[291646] from CPU 3 to 14) Prevent this by explicitly handling migration-disabled tasks in the built-in idle selection logic, maintaining their CPU affinity. Fixes: `a730e3f7a4` ("sched_ext: idle: Consolidate default idle CPU selection kfuncs") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-07-17 08:19:38 -10:00
Chen Ridong	14a67b42cb	Revert "cgroup_freezer: cgroup_freezing: Check if not frozen" This reverts commit `cff5f49d43`. Commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") modified the cgroup_freezing() logic to verify that the FROZEN flag is not set, affecting the return value of the freezing() function, in order to address a warning in __thaw_task. A race condition exists that may allow tasks to escape being frozen. The following scenario demonstrates this issue: CPU 0 (get_signal path) CPU 1 (freezer.state reader) try_to_freeze read freezer.state __refrigerator freezer_read update_if_frozen WRITE_ONCE(current->__state, TASK_FROZEN); ... /* Task is now marked frozen / / frozen(task) == true / / Assuming other tasks are frozen / freezer->state \|= CGROUP_FROZEN; / freezing(current) returns false / / because cgroup is frozen (not freezing) / break out __set_current_state(TASK_RUNNING); / Bug: Task resumes running when it should remain frozen */ The existing !frozen(p) check in __thaw_task makes the WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables reverting the commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") to resolve the issue. The warning has been removed in the previous patch. This patch revert the commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") to complete the fix. Fixes: `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") Reported-by: Zhong Jiawei<zhongjiawei1@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-07-17 07:57:02 -10:00
Chen Ridong	9beb8c5e77	sched,freezer: Remove unnecessary warning in __thaw_task Commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") modified the cgroup_freezing() logic to verify that the FROZEN flag is not set, affecting the return value of the freezing() function, in order to address a warning in __thaw_task. A race condition exists that may allow tasks to escape being frozen. The following scenario demonstrates this issue: CPU 0 (get_signal path) CPU 1 (freezer.state reader) try_to_freeze read freezer.state __refrigerator freezer_read update_if_frozen WRITE_ONCE(current->__state, TASK_FROZEN); ... /* Task is now marked frozen / / frozen(task) == true / / Assuming other tasks are frozen / freezer->state \|= CGROUP_FROZEN; / freezing(current) returns false / / because cgroup is frozen (not freezing) / break out __set_current_state(TASK_RUNNING); / Bug: Task resumes running when it should remain frozen */ The existing !frozen(p) check in __thaw_task makes the WARN_ON_ONCE(freezing(p)) warning redundant. Removing this warning enables reverting commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") to resolve the issue. This patch removes the warning from __thaw_task. A subsequent patch will revert commit `cff5f49d43` ("cgroup_freezer: cgroup_freezing: Check if not frozen") to complete the fix. Reported-by: Zhong Jiawei<zhongjiawei1@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-07-17 07:56:50 -10:00
Linus Torvalds	e6e82e5bed	Power management fixes for 6.16-rc7 - Fix a deadlock that may occur on asynchronous device suspend failures due to missing completion updates in error paths (Rafael Wysocki). - Drop a misplaced pm_restore_gfp_mask() call, which may cause swap to be accessed too early if system suspend fails, from suspend_devices_and_enter() (Rafael Wysocki). - Remove duplicate filesystems_freeze/thaw() calls, which sometimes cause systems to be unable to resume, from enter_state() (Zihuan Zhang). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmh5IE4SHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO12LYH/3CULHOIoshuWu+G9nIKokqO0oNYmxh1 qgkh+o9sBz9uTyfCSd1qDT9j1LjzUnOJUe67IzHJFuZcHbnWU4k9VYWV+H8TKyNp CcQ+9g5gCqOzxWH7G7C2ekciSnnBlObwJ7ZsDlUOeuJ16GVCjqrFPZbJ6No0A+Hz 8Ed7R4o1MKrURLU9IZWpqV1a54Z9ySv2yrx9T4G0c8WV2VRJZJ76e1hAGcOr4owj kM1+MPnsfU/RvBUUEKjUEm70ZBXGbXT+D9p/L/AuoYyhI94kvoImK1/2An5noHCO czK5nDB867z6hu5jTVPt/RoIK/49H/a2CDNYl3ZiZnVVZIoPN/wt3C8= =wkHb -----END PGP SIGNATURE----- Merge tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These address three issues introduced during the current development cycle and related to system suspend and hibernation, one triggering when asynchronous suspend of devices fails, one possibly affecting memory management in the core suspend code error path, and one due to duplicate filesystems freezing during system suspend: - Fix a deadlock that may occur on asynchronous device suspend failures due to missing completion updates in error paths (Rafael Wysocki) - Drop a misplaced pm_restore_gfp_mask() call, which may cause swap to be accessed too early if system suspend fails, from suspend_devices_and_enter() (Rafael Wysocki) - Remove duplicate filesystems_freeze/thaw() calls, which sometimes cause systems to be unable to resume, from enter_state() (Zihuan Zhang)" * tag 'pm-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: sleep: Update power.completion for all devices on errors PM: suspend: clean up redundant filesystems_freeze/thaw() handling PM: suspend: Drop a misplaced pm_restore_gfp_mask() call	2025-07-17 09:46:37 -07:00
Tao Chen	19d18fdfc7	bpf: Add struct bpf_token_info The 'commit `35f96de041` ("bpf: Introduce BPF token object")' added BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD already used to get BPF object info, so we can also get token info with this cmd. One usage scenario, when program runs failed with token, because of the permission failure, we can report what BPF token is allowing with this API for debugging. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:38:05 -07:00
Feng Yang	62ef449b8d	bpf: Clean up individual BTF_ID code Use BTF_ID_LIST_SINGLE(a, b, c) instead of BTF_ID_LIST(a) BTF_ID(b, c) Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Link: https://lore.kernel.org/r/20250710055419.70544-1-yangfeng59949@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:34:42 -07:00
Ilya Leoshkevich	1f489662fb	bpf: Update iterators.lskel-big-endian.h The last iterators update (commit `515ee52b22` ("bpf: make preloaded map iterators to display map elements count")) missed the big-endian skeleton. Update it by running "make big" with Debian clang version 21.0.0 (++20250706105601+01c97b4953e8-1~exp1~20250706225612.1558). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250710100907.45880-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-16 18:29:18 -07:00
Breno Leitao	e14fd98c6d	sched/ext: Prevent update_locked_rq() calls with NULL rq Avoid invoking update_locked_rq() when the runqueue (rq) pointer is NULL in the SCX_CALL_OP and SCX_CALL_OP_RET macros. Previously, calling update_locked_rq(NULL) with preemption enabled could trigger the following warning: BUG: using __this_cpu_write() in preemptible [00000000] This happens because __this_cpu_write() is unsafe to use in preemptible context. rq is NULL when an ops invoked from an unlocked context. In such cases, we don't need to store any rq, since the value should already be NULL (unlocked). Ensure that update_locked_rq() is only called when rq is non-NULL, preventing calling __this_cpu_write() on preemptible context. Suggested-by: Peter Zijlstra <peterz@infradead.org> Fixes: `18853ba782` ("sched_ext: Track currently locked rq") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v6.15	2025-07-16 15:02:12 -10:00
Nathan Chancellor	1ed171a3af	tracing/probes: Avoid using params uninitialized in parse_btf_arg() After a recent change in clang to strengthen uninitialized warnings [1], it points out that in one of the error paths in parse_btf_arg(), params is used uninitialized: kernel/trace/trace_probe.c:660:19: warning: variable 'params' is uninitialized when used here [-Wuninitialized] 660 \| return PTR_ERR(params); \| ^~~~~~ Match many other NO_BTF_ENTRY error cases and return -ENOENT, clearing up the warning. Link: https://lore.kernel.org/all/20250715-trace_probe-fix-const-uninit-warning-v1-1-98960f91dd04@kernel.org/ Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2110 Fixes: `d157d76944` ("tracing/probes: Support BTF field access from $retval") Link: `2464313eef` [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-07-16 14:01:54 +09:00
Zihuan Zhang	228b9deded	PM: suspend: clean up redundant filesystems_freeze/thaw() handling The recently introduced support for freezing filesystems during system suspend included calls to filesystems_freeze() in both suspend_prepare() and enter_state(), as well as calls to filesystems_thaw() in both suspend_finish() and the Unlock path in enter_state(). These are redundant. Moreover, calling filesystems_freeze() twice, from both suspend_prepare() and enter_state(), leads to a black screen and makes the system unable to resume in some cases. Address this as follows: - filesystems_freeze() is already called in suspend_prepare(), which is the proper and consistent place to handle pre-suspend operations. The second call in enter_state() is unnecessary and so remove it. - filesystems_thaw() is invoked in suspend_finish(), which covers successful suspend/resume paths. In the failure case, add a call to filesystems_thaw() only when needed, avoiding the duplicate call in the general Unlock path. This change simplifies the suspend code and avoids repeated freeze/thaw calls, while preserving correct ordering and behavior. Fixes: `eacfbf7419` ("power: freeze filesystems during suspend/resume") Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn> Link: https://patch.msgid.link/20250712030824.81474-1-zhangzihuan@kylinos.cn [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-15 14:55:00 +02:00
Rafael J. Wysocki	75b63ce2c9	PM: suspend: Drop a misplaced pm_restore_gfp_mask() call The pm_restore_gfp_mask() call added by commit `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") to suspend_devices_and_enter() is done too early because it takes place before calling dpm_resume() in dpm_resume_end() and some swap-backing devices may not be ready at that point. Moreover, dpm_resume_end() called subsequently in the same code path invokes pm_restore_gfp_mask() again and calling it twice in a row is pointless. Drop the misplaced pm_restore_gfp_mask() call from suspend_devices_and_enter() to address this issue. Fixes: `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/2810409.mvXUDI8C0e@rjwysocki.net [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-07-15 14:54:26 +02:00
Aruna Ramakrishna	36569780b0	sched: Change nr_uninterruptible type to unsigned long The commit `e6fe3f422b` ("sched: Make multiple runqueue task counters 32-bit") changed nr_uninterruptible to an unsigned int. But the nr_uninterruptible values for each of the CPU runqueues can grow to large numbers, sometimes exceeding INT_MAX. This is valid, if, over time, a large number of tasks are migrated off of one CPU after going into an uninterruptible state. Only the sum of all nr_interruptible values across all CPUs yields the correct result, as explained in a comment in kernel/sched/loadavg.c. Change the type of nr_uninterruptible back to unsigned long to prevent overflows, and thus the miscalculation of load average. Fixes: `e6fe3f422b` ("sched: Make multiple runqueue task counters 32-bit") Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com	2025-07-14 10:59:31 +02:00
Linus Torvalds	0a197b7576	- Prevent perf_sigtrap() from observing an exiting task and warning about it -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhzhRwACgkQEsHwGGHe VUpz3xAAwLM/anyGvUXMP9Wx3X3kXM0NMw0NfkSucE22p7R//DfVWwLgz26hE8VX haUp8Idnmp2EV6BsA/6SmslSbzYeBNKSIWwQiRI67goIACK0MDqzYkTlTnS170BU bw4Lvd5PzA9DZiHTpqNCmg2m9ouGCQHOsxfKzs6DUMgpca6y8imFYW3gMZwGyA0m Zkanw+BqPAhYSbTjZ1ZgtYmtPuvunXu/K/meEbonW2prRXfhZ9tbU31q/iRtHy3c v9nRiK83pbqIJHUfYraTlfvmkMz975gvuN7mtmj5H2v+gc1S9adp1VrcI3tYFZDB d5FVDsQpqWNBwewHG6VyYww6wCfm0IRg6Ys+rujhYl/ICjmthDUQJg9tZnZ03yxz sv/b9E84Nq5BiJeLB88Ue0vRhH4ct9MTUr3Z9zhSKGN3IArpMPOm2wlmUmPoGerZ g1jR80oG6Ikwl660MBpEX9yqyG8P5b7jnLGMJC0QJX9c0TLSY8yQgnnWKDVUvLVx SrCjACQ2PnxqS9v2RlB3omoBYeEZfv7coy/vKIXZygXg7aNaDMemUktveihjUxLg MLdOTH3Ygq5lASIGhhjqJenZ/844XIhZ9rgUcz3HLzlkXLGt8NmjOT+rW1IHAZHo VpKbYPedr4uTqcxSDp4Qr/53m9xm4OTPe+3XEkXmeN5WQXbEh4A= =vP1/ -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Prevent perf_sigtrap() from observing an exiting task and warning about it * tag 'perf_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix WARN in perf_sigtrap()	2025-07-13 10:34:47 -07:00
Linus Torvalds	3f31a806a6	19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix. -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaHGbTgAKCRDdBJ7gKXxA jowqAPiCWBFfcFaX20BxVaMU1PjC3Lh9llDXqQwBhBNdcadSAP44SGQ8nrfV+piB OcNz2AEwBBfS354G0Etlh4k08YoAAw== =IDDc -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix" * tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "sched/numa: add statistics of numa balance task" mm: fix the inaccurate memory statistics issue for users mm/damon: fix divide by zero in damon_get_intervals_score() samples/damon: fix damon sample mtier for start failure samples/damon: fix damon sample wsse for start failure samples/damon: fix damon sample prcl for start failure kasan: remove kasan_find_vm_area() to prevent possible deadlock scripts: gdb: vfs: support external dentry names mm/migrate: fix do_pages_stat in compat mode mm/damon/core: handle damon_call_control as normal under kdmond deactivation mm/rmap: fix potential out-of-bounds page table access during batched unmap mm/hugetlb: don't crash when allocating a folio if there are no resv scripts/gdb: de-reference per-CPU MCE interrupts scripts/gdb: fix interrupts.py after maple tree conversion maple_tree: fix mt_destroy_walk() on root leaf node mm/vmalloc: leave lazy MMU mode on PTE mapping error scripts/gdb: fix interrupts display after MCP on x86 lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() kallsyms: fix build without execinfo	2025-07-12 10:30:47 -07:00
Tao Chen	0eeeebdcc5	bpf: Remove attach_type in bpf_tracing_link Use attach_type in bpf_link, and remove it in bpf_tracing_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-7-chen.dylane@linux.dev	2025-07-11 11:01:08 -07:00
Tao Chen	2a76a80c7f	bpf: Remove attach_type in bpf_netns_link Use attach_type in bpf_link, and remove it in bpf_netns_link. And move netns_type field to the end to fill the byte hole. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev	2025-07-11 11:01:04 -07:00
Tao Chen	6e816e1c05	bpf: Remove location field in tcx_link Use attach_type in bpf_link to replace the location filed, and remove location field in tcx_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250710032038.888700-5-chen.dylane@linux.dev	2025-07-11 11:00:57 -07:00
Tao Chen	9b8d543dc2	bpf: Remove attach_type in bpf_cgroup_link Use attach_type in bpf_link, and remove it in bpf_cgroup_link. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-3-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Tao Chen	b725441f02	bpf: Add attach_type field to bpf_link Attach_type will be set when a link is created by user. It is better to record attach_type in bpf_link generically and have it available universally for all link types. So add the attach_type field in bpf_link and move the sleepable field to avoid unnecessary gap padding. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev	2025-07-11 10:51:55 -07:00
Paul Chaignon	6279846b9b	bpf: Forget ranges when refining tnum after JSET Syzbot reported a kernel warning due to a range invariant violation on the following BPF program. 0: call bpf_get_netns_cookie 1: if r0 == 0 goto <exit> 2: if r0 & Oxffffffff goto <exit> The issue is on the path where we fall through both jumps. That path is unreachable at runtime: after insn 1, we know r0 != 0, but with the sign extension on the jset, we would only fallthrough insn 2 if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to figure this out, so the verifier walks all branches. The verifier then refines the register bounds using the second condition and we end up with inconsistent bounds on this unreachable path: 1: if r0 == 0 goto <exit> r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff) 2: if r0 & 0xffffffff goto <exit> r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0) r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0) Improving the range refinement for JSET to cover all cases is tricky. We also don't expect many users to rely on JSET given LLVM doesn't generate those instructions. So instead of improving the range refinement for JSETs, Eduard suggested we forget the ranges whenever we're narrowing tnums after a JSET. This patch implements that approach. Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:45:25 -07:00
Emil Tsalapatis	8fc3d2d8b5	bpf/arena: add bpf_arena_reserve_pages kfunc Add a new BPF arena kfunc for reserving a range of arena virtual addresses without backing them with pages. This prevents the range from being populated using bpf_arena_alloc_pages(). Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-07-11 10:43:54 -07:00
Linus Torvalds	a0f8361c3c	dma-mapping fix for Linux 6.16 - small fix relevant to arm64 server and custom CMA configuration (Feng Tang) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaHCzdQAKCRCJp1EFxbsS RMrMAQDghOwKZqYuC27kJt5T7lgG47YCNE5em1v8WsTSvwQAugEA4AlWIpqQ34eI Es6ObfMt8Q9gArubFZ0ZDFtmZq9NpA0= =+z0i -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fix from Marek Szyprowski: - small fix relevant to arm64 server and custom CMA configuration (Feng Tang) * tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-contiguous: hornor the cma address limit setup by user	2025-07-11 08:49:25 -07:00
Chen Yu	db6cc3f4ac	Revert "sched/numa: add statistics of numa balance task" This reverts commit `ad6b26b6a0`. This commit introduces per-memcg/task NUMA balance statistics, but unfortunately it introduced a NULL pointer exception due to the following race condition: After a swap task candidate was chosen, its mm_struct pointer was set to NULL due to task exit. Later, when performing the actual task swapping, the p->mm caused the problem. CPU0 CPU1 : ... task_numa_migrate task_numa_find_cpu task_numa_compare # a normal task p is chosen env->best_task = p # p exit: exit_signals(p); p->flags \|= PF_EXITING exit_mm p->mm = NULL; migrate_swap_stop __migrate_swap_task((arg->src_task, arg->dst_cpu) count_memcg_event_mm(p->mm, NUMA_TASK_SWAP)# p->mm is NULL task_lock() should be held and the PF_EXITING flag needs to be checked to prevent this from happening. After discussion, the conclusion was that adding a lock is not worthwhile for some statistics calculations. Revert the change and rely on the tracepoint for this purpose. Link: https://lkml.kernel.org/r/20250704135620.685752-1-yu.c.chen@intel.com Link: https://lkml.kernel.org/r/20250708064917.BBD13C4CEED@smtp.kernel.org Fixes: `ad6b26b6a0` ("sched/numa: add statistics of numa balance task") Signed-off-by: Chen Yu <yu.c.chen@intel.com> Reported-by: Jirka Hladky <jhladky@redhat.com> Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@mail.gmail.com/ Reported-by: Srikanth Aithal <Srikanth.Aithal@amd.com> Reported-by: Suneeth D <Suneeth.D@amd.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Hladky <jhladky@redhat.com> Cc: Libo Chen <libo.chen@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 21:07:56 -07:00
Tetsuo Handa	3da6bb4197	perf/core: Fix WARN in perf_sigtrap() Since exit_task_work() runs after perf_event_exit_task_context() updated ctx->task to TASK_TOMBSTONE, perf_sigtrap() from perf_pending_task() might observe event->ctx->task == TASK_TOMBSTONE. Swap the early exit tests in order not to hit WARN_ON_ONCE(). Closes: https://syzkaller.appspot.com/bug?extid=2fe61cb2a86066be6985 Reported-by: syzbot <syzbot+2fe61cb2a86066be6985@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/b1c224bd-97f9-462c-a3e3-125d5e19c983@I-love.SAKURA.ne.jp	2025-07-09 13:40:17 +02:00
Sebastian Andrzej Siewior	570db4b39f	module: Make sure relocations are applied to the per-CPU section The per-CPU data section is handled differently than the other sections. The memory allocations requires a special __percpu pointer and then the section is copied into the view of each CPU. Therefore the SHF_ALLOC flag is removed to ensure move_module() skips it. Later, relocations are applied and apply_relocations() skips sections without SHF_ALLOC because they have not been copied. This also skips the per-CPU data section. The missing relocations result in a NULL pointer on x86-64 and very small values on x86-32. This results in a crash because it is not skipped like NULL pointer would and can't be dereferenced. Such an assignment happens during static per-CPU lock initialisation with lockdep enabled. Allow relocation processing for the per-CPU section even if SHF_ALLOC is missing. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com Fixes: 1a6100caae425 ("Don't relocate non-allocated regions in modules.") #v2.6.1-rc3 Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20250610163328.URcsSUC1@linutronix.de Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Message-ID: <20250610163328.URcsSUC1@linutronix.de>	2025-07-08 20:52:30 +02:00
Petr Pavlu	eb0994a954	module: Avoid unnecessary return value initialization in move_module() All error conditions in move_module() set the return value by updating the ret variable. Therefore, it is not necessary to the initialize the variable when declaring it. Remove the unnecessary initialization. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250618122730.51324-3-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Message-ID: <20250618122730.51324-3-petr.pavlu@suse.com>	2025-07-08 20:52:29 +02:00

1 2 3 4 5 ...

48531 Commits (41fee4f0036734bec427659f749e44cfe1821565)