mirror-linux

Commit Graph

Author	SHA1	Message	Date
Pasha Tatashin	16cec0d265	liveupdate: luo_session: add ioctls for file preservation Introducing the userspace interface and internal logic required to manage the lifecycle of file descriptors within a session. Previously, a session was merely a container; this change makes it a functional management unit. The following capabilities are added: A new set of ioctl commands are added, which operate on the file descriptor returned by CREATE_SESSION. This allows userspace to: - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session to be preserved across the live update. - LIVEUPDATE_SESSION_RETRIEVE_FD: Retrieve a preserved file in the new kernel using its unique token. - LIVEUPDATE_SESSION_FINISH: finish session The session's .release handler is enhanced to be state-aware. When a session's file descriptor is closed, it correctly unpreserves the session based on its current state before freeing all associated file resources. Link: https://lkml.kernel.org/r/20251125165850.3389713-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: David Matlack <dmatlack@google.com> Cc: Aleksander Lobakin <aleksander.lobakin@intel.com> Cc: Alexander Graf <graf@amazon.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: anish kumar <yesanishhere@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Wagner <wagi@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guixin Liu <kanie@linux.alibaba.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lennart Poettering <lennart@poettering.net> Cc: Leon Romanovsky <leon@kernel.org> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Matthew Maurer <mmaurer@google.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Myugnjoo Ham <myungjoo.ham@samsung.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Song Liu <song@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: William Tu <witu@nvidia.com> Cc: Yoann Congal <yoann.congal@smile.fr> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:39 -08:00
Pasha Tatashin	81cd25d263	liveupdate: luo_core: add user interface Introduce the user-space interface for the Live Update Orchestrator via ioctl commands, enabling external control over the live update process and management of preserved resources. The idea is that there is going to be a single userspace agent driving the live update, therefore, only a single process can ever hold this device opened at a time. The following ioctl commands are introduced: LIVEUPDATE_IOCTL_CREATE_SESSION Provides a way for userspace to create a named session for grouping file descriptors that need to be preserved. It returns a new file descriptor representing the session. LIVEUPDATE_IOCTL_RETRIEVE_SESSION Allows the userspace agent in the new kernel to reclaim a preserved session by its name, receiving a new file descriptor to manage the restored resources. Link: https://lkml.kernel.org/r/20251125165850.3389713-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Tested-by: David Matlack <dmatlack@google.com> Cc: Aleksander Lobakin <aleksander.lobakin@intel.com> Cc: Alexander Graf <graf@amazon.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: anish kumar <yesanishhere@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Wagner <wagi@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guixin Liu <kanie@linux.alibaba.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lennart Poettering <lennart@poettering.net> Cc: Leon Romanovsky <leon@kernel.org> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Matthew Maurer <mmaurer@google.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Myugnjoo Ham <myungjoo.ham@samsung.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Song Liu <song@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: William Tu <witu@nvidia.com> Cc: Yoann Congal <yoann.congal@smile.fr> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:38 -08:00
Pasha Tatashin	0153094d03	liveupdate: luo_session: add sessions support Introduce concept of "Live Update Sessions" within the LUO framework. LUO sessions provide a mechanism to group and manage `struct file *` instances (representing file descriptors) that need to be preserved across a kexec-based live update. Each session is identified by a unique name and acts as a container for file objects whose state is critical to a userspace workload, such as a virtual machine or a high-performance database, aiming to maintain their functionality across a kernel transition. This groundwork establishes the framework for preserving file-backed state across kernel updates, with the actual file data preservation mechanisms to be implemented in subsequent patches. [dan.carpenter@linaro.org: fix use after free in luo_session_deserialize()] Link: https://lkml.kernel.org/r/c5dd637d7eed3a3be48c5e9fedb881596a3b1f5a.1764163896.git.dan.carpenter@linaro.org Link: https://lkml.kernel.org/r/20251125165850.3389713-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Tested-by: David Matlack <dmatlack@google.com> Cc: Aleksander Lobakin <aleksander.lobakin@intel.com> Cc: Alexander Graf <graf@amazon.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: anish kumar <yesanishhere@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Wagner <wagi@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guixin Liu <kanie@linux.alibaba.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lennart Poettering <lennart@poettering.net> Cc: Leon Romanovsky <leon@kernel.org> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Matthew Maurer <mmaurer@google.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Myugnjoo Ham <myungjoo.ham@samsung.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Song Liu <song@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: William Tu <witu@nvidia.com> Cc: Yoann Congal <yoann.congal@smile.fr> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:38 -08:00
Pasha Tatashin	9e2fd062fa	liveupdate: luo_core: Live Update Orchestrator Patch series "Live Update Orchestrator", v8. This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. The other series that use LUO, are VFIO [1], IOMMU [2], and PCI [3] preservations. Github repo of this series [4]. The core of LUO is a framework for managing the lifecycle of preserved resources through a userspace-driven interface. Key features include: - Session Management Userspace agent (i.e. luod [5]) creates named sessions, each represented by a file descriptor (via centralized agent that controls /dev/liveupdate). The lifecycle of all preserved resources within a session is tied to this FD, ensuring automatic kernel cleanup if the controlling userspace agent crashes or exits unexpectedly. - File Preservation A handler-based framework allows specific file types (demonstrated here with memfd) to be preserved. Handlers manage the serialization, restoration, and lifecycle of their specific file types. - File-Lifecycle-Bound State A new mechanism for managing shared global state whose lifecycle is tied to the preservation of one or more files. This is crucial for subsystems like IOMMU or HugeTLB, where multiple file descriptors may depend on a single, shared underlying resource that must be preserved only once. - KHO Integration LUO drives the Kexec Handover framework programmatically to pass its serialized metadata to the next kernel. The LUO state is finalized and added to the kexec image just before the reboot is triggered. In the future this step will also be removed once stateless KHO is merged [6]. - Userspace Interface Control is provided via ioctl commands on /dev/liveupdate for creating and retrieving sessions, as well as on session file descriptors for managing individual files. - Testing The series includes a set of selftests, including userspace API validation, kexec-based lifecycle tests for various session and file scenarios, and a new in-kernel test module to validate the FLB logic. Introduce LUO, a mechanism intended to facilitate kernel updates while keeping designated devices operational across the transition (e.g., via kexec). The primary use case is updating hypervisors with minimal disruption to running virtual machines. For userspace side of hypervisor update we have copyless migration. LUO is for updating the kernel. This initial patch lays the groundwork for the LUO subsystem. Further functionality, including the implementation of state transition logic, integration with KHO, and hooks for subsystems and file descriptors, will be added in subsequent patches. Create a character device at /dev/liveupdate. A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary structures. The magic number for IOCTL is registered in Documentation/userspace-api/ioctl/ioctl-number.rst. Link: https://lkml.kernel.org/r/20251125165850.3389713-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20251125165850.3389713-2-pasha.tatashin@soleen.com Link: https://lore.kernel.org/all/20251018000713.677779-1-vipinsh@google.com/ [1] Link: https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google.com [2] Link: https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel.org [3] Link: https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v8 [4] Link: https://tinyurl.com/luoddesign [5] Link: https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com [6] Link: https://lore.kernel.org/all/20251115233409.768044-1-pasha.tatashin@soleen.com [7] Link: https://github.com/soleen/linux/blob/luo/v8b03/diff.v7.v8 [8] Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: David Matlack <dmatlack@google.com> Cc: Aleksander Lobakin <aleksander.lobakin@intel.com> Cc: Alexander Graf <graf@amazon.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: anish kumar <yesanishhere@gmail.com> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Daniel Wagner <wagi@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guixin Liu <kanie@linux.alibaba.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lennart Poettering <lennart@poettering.net> Cc: Leon Romanovsky <leon@kernel.org> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Matthew Maurer <mmaurer@google.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Myugnjoo Ham <myungjoo.ham@samsung.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Song Liu <song@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: William Tu <witu@nvidia.com> Cc: Yoann Congal <yoann.congal@smile.fr> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-27 14:24:37 -08:00
Jakub Kicinski	db4029859d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/xdp/xsk.c `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") `8da7bea7db` ("xsk: add indirect call for xsk_destruct_skb") `30ed05adca` ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-27 12:19:08 -08:00
Paolo Abeni	73f784b2c9	linux-can-next-for-6.19-20251126 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmkm2EcTHG1rbEBwZW5n dXRyb25peC5kZQAKCRAMdGXf+ZCRnCT7B/98qQUzIef9I3Z0jsQhPmNHzyf/lO2d dym2qE3axbLfiUziC8FA997iDellCR/TqOHD1iVsfw2ifaiBwKx63ZOv3AY1ob0+ C0lM6qTBA8HNn1M9Ij4x96v+qz3dE5n0eyvKvDdEmiUL6gz9T+QItjTdph4WIyWL bpnPPUs75KLDtlTkztTJ807fStnCJMn6UP+tXcqjLE+1lZ9k+hpLH4jzprn4oVry vLU5TmbXkDZVL9MdpecExHE5mTV1f1Fch7TtliUoBY9CFZ58CSXkuSzu38F3j/xD nytAydXNy6wjDU5GkIca2g0MPWmBI3EF5QHvfnWBjxScQC6MdUxP5wrJ =F3+Y -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-11-26 this is a pull request of 27 patches for net-next/main. The first 17 patches are by Vincent Mailhol and Oliver Hartkopp and add CAN XL support to the CAN netlink interface. Geert Uytterhoeven and Biju Das provide 7 patches for the rcar_canfd driver to add suspend/resume support. The next 2 patches are by Markus Schneider-Pargmann and add them as the m_can maintainer. Conor Dooley's patch updates the mpfs-can DT bindungs. linux-can-next-for-6.19-20251126 * tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits) dt-bindings: can: mpfs: document resets MAINTAINERS: Simplify m_can section MAINTAINERS: Add myself as m_can maintainer can: rcar_canfd: Add suspend/resume support can: rcar_canfd: Convert to DEFINE_SIMPLE_DEV_PM_OPS() can: rcar_canfd: Invert CAN clock and close_candev() order can: rcar_canfd: Extract rcar_canfd_global_{,de}init() can: rcar_canfd: Use devm_clk_get_optional() for RAM clk can: rcar_canfd: Invert global vs. channel teardown can: rcar_canfd: Invert reset assert order can: dev: print bitrate error with two decimal digits can: raw: instantly reject unsupported CAN frames can: add dummy_can driver can: calc_bittiming: add can_calc_sample_point_pwm() can: calc_bittiming: add can_calc_sample_point_nrz() can: calc_bittiming: replace misleading "nominal" by "reference" can: netlink: add PWM netlink interface can: calc_bittiming: add PWM calculation can: bittiming: add PWM validation can: bittiming: add PWM parameters ... ==================== Link: https://patch.msgid.link/20251126120106.154635-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-27 15:45:17 +01:00
Alexander Duyck	39e138173a	net: pcs: xpcs: Fix PMA identifier handling in XPCS The XPCS driver was mangling the PMA identifier as the original code appears to have been focused on just capturing the OUI. Rather than store a mangled ID it is better to work with the actual PMA ID and instead just mask out the values that don't apply rather than shifting them and reordering them as you still don't get the original OUI for the NIC without having to bitswap the values as per the definition of the layout in IEEE 802.3-2022 22.2.4.3.1. By laying it out as it was in the hardware it is also less likely for us to have an unintentional collision as the enum values will occupy the revision number area while the OUI occupies the upper 22 bits. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374320920.959489.17267159479370601070.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-27 10:41:31 +01:00
Alexander Duyck	7622d55276	net: pcs: xpcs: Add support for 25G, 50G, and 100G interfaces With this change we are adding support for 25G, 50G, and 100G interface types to the XPCS driver. This had supposedly been enabled with the addition of XLGMII but I don't see any capability for configuration there so I suspect it may need to be refactored in the future. With this change we can enable the XPCS driver with the selected interface and it should be able to detect link, speed, and report the link status to the phylink interface. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374320248.959489.11649590675011158859.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-27 10:41:30 +01:00
Alexander Duyck	e6c43c9500	net: phy: Add MDIO_PMA_CTRL1_SPEED for 2.5G and 5G to reflect PMA values The 2.5G and 5G values are not consistent between the PCS CTRL1 and PMA CTRL1 values. In order to avoid confusion between the two I am updating the values to include "PMA" in the name similar to values used in similar places. To avoid breaking UAPI I have retained the original macros and just defined them as the new PMA based defines. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374319569.959489.6610469879021800710.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-27 10:41:30 +01:00
Gabriel Krisman Bertazi	5d24321e4c	io_uring: Introduce getsockname io_uring cmd Introduce a socket-specific io_uring_cmd to support getsockname/getpeername via io_uring. I made this an io_uring_cmd instead of a new operation to avoid polluting the command namespace with what is exclusively a socket operation. In addition, since we don't need to conform to existing interfaces, this merges the getsockname/getpeername in a single operation, since the implementation is pretty much the same. This has been frequently requested, for instance at [1] and more recently in the project Discord channel. The main use-case is to support fixed socket file descriptors. [1] https://github.com/axboe/liburing/issues/1356 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-26 13:45:23 -07:00
Jason Gunthorpe	5185c4d8a5	Merge branch 'iommufd_dmabuf' into k.o-iommufd/for-next Jason Gunthorpe says: ==================== This series is the start of adding full DMABUF support to iommufd. Currently it is limited to only work with VFIO's DMABUF exporter. It sits on top of Leon's series to add a DMABUF exporter to VFIO: https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/ The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but otherwise works the same as it does today for a memfd. The user can select a slice of the FD to map into the ioas and if the underliyng alignment requirements are met it will be placed in the iommu_domain. Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR memory from VFIO to an iommu_domain controlled by iommufd. This is used for PCI Peer to Peer support in VMs, and is the last feature that the VFIO type 1 container has that iommufd couldn't do. The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime control and is a use-after-free security problem. Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there should be no access to the MMIO it can shoot down the mapping in iommufd which will unmap it from the iommu_domain. There is no automatic remap, this is a safety protocol so the kernel doesn't get stuck. Userspace is expected to know it is doing something that will revoke the dmabuf and map/unmap it around the activity. Eg when QEMU goes to issue FLR it should do the map/unmap to iommufd. Since DMABUF is missing some key general features for this use case it relies on a "private interconnect" between VFIO and iommufd via the vfio_pci_dma_buf_iommufd_map() call. The call confirms the DMABUF has revoke semantics and delivers a phys_addr for the memory suitable for use with iommu_map(). Medium term there is a desire to expand the supported DMABUFs to include GPU drivers to support DPDK/SPDK type use cases so future series will work to add a general concept of revoke and a general negotiation of interconnect to remove vfio_pci_dma_buf_iommufd_map(). I also plan another series to modify iommufd's vfio_compat to transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI of type1. The latest series for interconnect negotation to exchange a phys_addr is: https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com And the discussion for design of revoke is here: https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/ ==================== Based on a shared branch with vfio. * iommufd_dmabuf: iommufd/selftest: Add some tests for the dmabuf flow iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE iommufd: Have iopt_map_file_pages convert the fd to a file iommufd: Have pfn_reader process DMABUF iopt_pages iommufd: Allow MMIO pages in a batch iommufd: Allow a DMABUF to be revoked iommufd: Do not map/unmap revoked DMABUFs iommufd: Add DMABUF to iopt_pages vfio/pci: Add vfio_pci_dma_buf_iommufd_map() vfio/nvgrace: Support get_dmabuf_phys vfio/pci: Add dma-buf export support for MMIO regions vfio/pci: Enable peer-to-peer DMA transactions by default vfio/pci: Share the core device pointer while invoking feature functions vfio: Export vfio device get and put registration helpers dma-buf: provide phys_vec to scatter-gather mapping routine PCI/P2PDMA: Document DMABUF model PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation PCI/P2PDMA: Simplify bus address mapping API PCI/P2PDMA: Separate the mmap() support from the core logic Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-11-26 14:04:10 -04:00
Nicolin Chen	81c45c62dc	iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases A vDEVICE has been a hard requirement for attaching a nested domain to the device. This makes sense when installing a guest STE, since a vSID must be present and given to the kernel during the vDEVICE allocation. But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program the vSMMU behavior as GBPA will take effect, in which case the vSTE in the nested domain could have carried the bypass or abort configuration in GBPA register. Thus, having such a hard requirement doesn't work well for GBPA. Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort or bypass vSTE. Note that device on this attachment won't report vevents. Update the uAPI doc accordingly. Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com Tested-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-11-26 14:04:04 -04:00
Randy Dunlap	f0fdaa4ad5	virt: acrn: split acrn_mmio_dev_res out of acrn_mmiodev Add struct acrn_mmio_dev_res before struct acrn_mmio_dev. The former is used in the latter and breaking them up provides better kernel-doc documentation for the struct members. Suggested-by: Fei Li <fei1.li@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Fei Li <fei1.li@intel.com> Link: https://patch.msgid.link/20251028040409.868254-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-26 15:09:24 +01:00
Vincent Mailhol	46552323fa	can: netlink: add PWM netlink interface When the TMS is switched on, the node uses PWM (Pulse Width Modulation) during the data phase instead of the classic NRZ (Non Return to Zero) encoding. PWM is configured by three parameters: - PWMS: Pulse Width Modulation Short phase - PWML: Pulse Width Modulation Long phase - PWMO: Pulse Width Modulation Offset time For each of these parameters, define three IFLA symbols: - IFLA_CAN_PWM_PWM_MIN: the minimum allowed value. - IFLA_CAN_PWM_PWM_MAX: the maximum allowed value. - IFLA_CAN_PWM_PWM: the runtime value. This results in a total of nine IFLA symbols which are all nested in a parent IFLA_CAN_XL_PWM symbol. IFLA_CAN_PWM_PWM_MIN and IFLA_CAN_PWM_PWM_MAX define the range of allowed values and will match the value statically configured by the device in struct can_pwm_const. IFLA_CAN_PWM_PWM match the runtime values stored in struct can_pwm. Those parameters may only be configured when the tms mode is on. If the PWMS, PWML and PWMO parameters are provided, check that all the needed parameters are present using can_validate_pwm(), then check their value using can_validate_pwm_bittiming(). PWMO defaults to zero if omitted. Otherwise, if CAN_CTRLMODE_XL_TMS is true but none of the PWM parameters are provided, calculate them using can_calc_pwm(). Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-11-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-11-26 11:20:43 +01:00
Vincent Mailhol	233134af20	can: netlink: add CAN_CTRLMODE_XL_TMS flag The Transceiver Mode Switching (TMS) indicates whether the CAN XL controller shall use the PWM or NRZ encoding during the data phase. The term "transceiver mode switching" is used in both ISO 11898-1 and CiA 612-2 (although only the latter one uses the abbreviation TMS). We adopt the same naming convention here for consistency. Add the CAN_CTRLMODE_XL_TMS flag to the list of the CAN control modes. Add can_validate_xl_flags() to check the coherency of the TMS flag. That function will be reused in upcoming changes to validate the other CAN XL flags. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-6-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-11-26 11:20:43 +01:00
Vincent Mailhol	e632816147	can: netlink: add initial CAN XL support CAN XL uses bittiming parameters different from Classical CAN and CAN FD. Thus, all the data bittiming parameters, including TDC, need to be duplicated for CAN XL. Add the CAN XL netlink interface for all the features which are common with CAN FD. Any new CAN XL specific features are added later on. The first time CAN XL is activated, the MTU is set by default to CANXL_MAX_MTU. The user may then configure a custom MTU within the CANXL_MIN_MTU to CANXL_MAX_MTU range, in which case, the custom MTU value will be kept as long as CAN XL remains active. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-5-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-11-26 11:20:43 +01:00
Vincent Mailhol	60f511f443	can: netlink: add CAN_CTRLMODE_RESTRICTED ISO 11898-1:2024 adds a new restricted operation mode. This mode is added as a mandatory feature for nodes which support CAN XL and is retrofitted as optional for legacy nodes (i.e. the ones which only support Classical CAN and CAN FD). The restricted operation mode is nearly the same as the listen only mode: the node can not send data frames or remote frames and can not send dominant bits if an error occurs. The only exception is that the node shall still send the acknowledgment bit. A second niche exception is that the node may still send a data frame containing a time reference message if the node is a primary time provider, but because the time provider feature is not yet implemented in the kernel, this second exception is not relevant to us at the moment. Add the CAN_CTRLMODE_RESTRICTED control mode flag and update the can_dev_dropped_skb() helper function accordingly. Finally, bail out if both CAN_CTRLMODE_LISTENONLY and CAN_CTRLMODE_RESTRICTED are provided. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-4-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-11-26 11:20:43 +01:00
Paolo Bonzini	236831743c	KVM guest_memfd changes for 6.19: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way. - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references. - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors. - Misc cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmkmSgcACgkQOlYIJqCj N/0MBQ/+KzI/6q6AR2m9jYCbz8APcJpmAEJ4Ma0+Q4XrJfkymAmt9P4M46bRZl5G Aznaqq9vak1weIzlaFsOzYqEWvSN54P/EfZKkqh0kCDKXGzl1HkSCC7FeThyqZz0 +uuWUiRtWI/dyNFEpIXB/G06DwqhMIKAlk421Zt84iBI/wz2oZeAgWoFCjPWca4a /L/ClmpzM6LnP/Hg2DyoZtfwAXIy65pb9h0IhKbvGcgSrS4sesZPiSV20KKvKVSp 4+WVLHuNbjk9vWKkmV8IZH+BXAO2J2+y2JYckbx4DvUKQcauXUJjFjp8+wZ4gMrC SK3SuWTTc3oe7fhaZZ98KY3BO9dq68iCbxpmdkhYrEbNqcDbQA5GMhIUZ2TgqDX7 KQ58s9zyHZvsX4cnL3XY9igsdl6tvqnFqVhGyBaxUrWNJDqAs3NPJr8Td3cnrMKg MRB2AaJN8xX6DK7JAxKQ4LC0Nb7w3hxF0t0XL2XzBw+6nsu67NjrM7EflIhG+mHJ YWhrnNh83Yut95cym5mpUU0kFqfHNB5SxRGGokDcQ1NAXBwwE2Tq+YopR9OKRfiZ 52/hkC195D7Xaeo9raf6iKN+YfiZ0uj3TvxuxIHs/EIqmmjhsc8VSwkfFW5OIeYf Tulmh/ohkq1675By/dapyHvW97PSgd0IqbDzjDrlxmbZbuuS1F8= =j3On -----END PGP SIGNATURE----- Merge tag 'kvm-x86-gmem-6.19' of https://github.com/kvm-x86/linux into HEAD KVM guest_memfd changes for 6.19: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way. - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references. - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors. - Misc cleanups.	2025-11-26 09:32:44 +01:00
Asbjørn Sloth Tønnesen	68e83f3472	tools: ynl-gen: add regeneration comment Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:20:42 -08:00
Randy Dunlap	1c6a92a5a5	wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warning Delete an empty line prevent a kernel-doc warning: Warning: ../include/uapi/linux/nl80211-vnd-intel.h:86 bad line: Fixes: `3d2a2544ea` ("nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251125022834.3171742-1-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-25 10:34:55 +01:00
Qu Wenruo	6b1ac78dd0	btrfs: implement shutdown ioctl The shutdown ioctl should follow the XFS one, which use magic number 'X', and ioctl number 125, with a uint32 as flags. For now btrfs don't distinguish DEFAULT and LOGFLUSH flags (just like f2fs), both will freeze the fs first (implies committing the current transaction), setting the SHUTDOWN flag and finally thaw the fs. For NOLOGFLUSH flag, the freeze/thaw part is skipped thus the current transaction is aborted. The new shutdown ioctl is hidden behind experimental features for more testing. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Tested-by: Anand Jain <asj@kernel.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 21:56:17 +01:00
Dave Penkler	e6ab504633	staging: gpib: Destage gpib Move the gpib drivers out of staging and into the "real" part of the kernel. This entails: - Remove the gpib Kconfig menu and Makefile build rule from staging. - Remove gpib/uapi from the header file search path in subdir-ccflags of the gpib Makefile - move the gpib/uapi files to include/uapi/linux - Move the gpib tree out of staging to drivers. - Remove the word "Linux" from the gpib Kconfig file. - Add the gpib Kconfig menu and Makefile build rule to drivers Signed-off-by: Dave Penkler <dpenkler@gmail.com> Link: https://patch.msgid.link/20251117144021.23569-5-dpenkler@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-24 17:52:11 +01:00
James Clark	cbbfba4847	perf: Add perf_event_attr::config4 Arm FEAT_SPE_FDS adds the ability to filter on the data source of a packet using another 64-bits of event filtering control. As the existing perf_event_attr::configN fields are all used up for SPE PMU, an additional field is needed. Add a new 'config4' field. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-24 15:59:18 +00:00
Linus Torvalds	ebd975458d	Input updates for v6.18-rc6 - INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of devices it is supposed to be set for - a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver - Goodix driver no longer tries to set reset pin as "input" as it causes issues when there is no pull up resistor installed on the board - fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to deal with potential out-of-bounds access and memory corruption issues -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCaSFbUAAKCRBAj56VGEWX nC+zAQCmxIsTcA4GEPgelW0dRaaKCcCnr++0RQVW5hFJ+0VgTgD6AlzkMtdy5O30 91gz5ooMKSz5SMsgb97N2GlayGKezA0= =dMey -----END PGP SIGNATURE----- Merge tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of devices it is supposed to be set for - a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver - Goodix driver no longer tries to set reset pin as "input" as it causes issues when there is no pull up resistor installed on the board - fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to deal with potential out-of-bounds access and memory corruption issues * tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD Input: cros_ec_keyb - fix an invalid memory access Input: imx_sc_key - fix memory corruption on unload Input: pegasus-notetaker - fix potential out-of-bounds access Input: goodix - remove setting of RST pin to input Input: goodix - add support for ACPI ID GDIX1003	2025-11-22 09:58:41 -08:00
Oliver Neukum	a67df6d1b9	uapi: cdc.h: cleanly provide for more interfaces and countries The spec requires at least one interface respectively country. It allows multiple ones. This needs to be clearly said in the UAPI. This is subject to sanity checking in cdc_parse_cdc_header(), thus we can trust the length. Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20251111134641.4118827-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-21 15:12:12 +01:00
Janosch Frank	8e8678e740	KVM: s390: Add capability that forwards operation exceptions Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation exceptions to user space. This also includes the 0x0000 instructions managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants to emulate instructions which do not (yet) have an opcode. While we're at it refine the documentation for KVM_CAP_S390_USER_INSTR0. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>	2025-11-21 10:26:03 +01:00
Leon Romanovsky	5d74781ebc	vfio/pci: Add dma-buf export support for MMIO regions Add support for exporting PCI device MMIO regions through dma-buf, enabling safe sharing of non-struct page memory with controlled lifetime management. This allows RDMA and other subsystems to import dma-buf FDs and build them into memory regions for PCI P2P operations. The implementation provides a revocable attachment mechanism using dma-buf move operations. MMIO regions are normally pinned as BARs don't change physical addresses, but access is revoked when the VFIO device is closed or a PCI reset is issued. This ensures kernel self-defense against potentially hostile userspace. Currently VFIO can take MMIO regions from the device's BAR and map them into a PFNMAP VMA with special PTEs. This mapping type ensures the memory cannot be used with things like pin_user_pages(), hmm, and so on. In practice only the user process CPU and KVM can safely make use of these VMA. When VFIO shuts down these VMAs are cleaned by unmap_mapping_range() to prevent any UAF of the MMIO beyond driver unbind. However, VFIO type 1 has an insecure behavior where it uses follow_pfnmap_*() to fish a MMIO PFN out of a VMA and program it back into the IOMMU. This has a long history of enabling P2P DMA inside VMs, but has serious lifetime problems by allowing a UAF of the MMIO after the VFIO driver has been unbound. Introduce DMABUF as a new safe way to export a FD based handle for the MMIO regions. This can be consumed by existing DMABUF importers like RDMA or DRM without opening an UAF. A following series will add an importer to iommufd to obsolete the type 1 code and allow safe UAF-free MMIO P2P in VM cases. DMABUF has a built in synchronous invalidation mechanism called move_notify. VFIO keeps track of all drivers importing its MMIO and can invoke a synchronous invalidation callback to tell the importing drivers to DMA unmap and forget about the MMIO pfns. This process is being called revoke. This synchronous invalidation fully prevents any lifecycle problems. VFIO will do this before unbinding its driver ensuring there is no UAF of the MMIO beyond the driver lifecycle. Further, VFIO has additional behavior to block access to the MMIO during things like Function Level Reset. This is because some poor platforms may experience a MCE type crash when touching MMIO of a PCI device that is undergoing a reset. Today this is done by using unmap_mapping_range() on the VMAs. Extend that into the DMABUF world and temporarily revoke the MMIO from the DMABUF importers during FLR as well. This will more robustly prevent an errant P2P from possibly upsetting the platform. A DMABUF FD is a preferred handle for MMIO compared to using something like a pgmap because: - VFIO is supported, including its P2P feature, on archs that don't support pgmap - PCI devices have all sorts of BAR sizes, including ones smaller than a section so a pgmap cannot always be created - It is undesirable to waste a lot of memory for struct pages, especially for a case like a GPU with ~100GB of BAR size - We want a synchronous revoke semantic to support FLR with light hardware requirements Use the P2P subsystem to help generate the DMA mapping. This is a significant upgrade over the abuse of dma_map_resource() that has historically been used by DMABUF exporters. Experience with an OOT version of this patch shows that real systems do need this. This approach deals with all the P2P scenarios: - Non-zero PCI bus_offset - ACS flags routing traffic to the IOMMU - ACS flags that bypass the IOMMU - though vfio noiommu is required to hit this. There will be further work to formalize the revoke semantic in DMABUF. For now this acts like a move_notify dynamic exporter where importer fault handling will get a failure when they attempt to map. This means that only fully restartable fault capable importers can import the VFIO DMABUFs. A future revoke semantic should open this up to more HW as the HW only needs to invalidate, not handle restartable faults. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-10-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>	2025-11-20 21:12:19 -07:00
Daniel Zahka	2a367002ed	devlink: support default values for param-get and param-set Support querying and resetting to default param values. Introduce two new devlink netlink attrs: DEVLINK_ATTR_PARAM_VALUE_DEFAULT and DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an optional parameter value inside of the param_value nested attribute. The latter is used in param-set requests from userspace to indicate that the driver should reset the param to its default value. To implement this, two new functions are added to the devlink driver api: devlink_param::get_default() and devlink_param::reset_default(). These callbacks allow drivers to implement default param actions for runtime and permanent cmodes. For driverinit params, the core latches the last value set by a driver via devl_param_driverinit_value_set(), and uses that as the default value for a param. Because default parameter values are optional, it would be impossible to discern whether or not a param of type bool has default value of false or not provided if the default value is encoded using a netlink flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an associated default value, the default value is encoded using a u8 type. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 19:01:22 -08:00
Yael Chemla	491c5dc98b	net: ethtool: Add support for 1600Gbps speed Add support for 1600Gbps link modes based on 200Gbps per lane [1]. This includes the adopted IEEE 802.3dj copper and optical PMDs that use 200G/lane signaling [2]. Add the following PMD types: - KR8 (backplane) - CR8 (copper cable) - DR8 (SMF 500m) - DR8-2 (SMF 2km) These modes are defined in the 802.3dj specifications. References: [1] https://www.ieee802.org/3/dj/public/23_03/opsasnick_3dj_01a_2303.pdf [2] https://www.ieee802.org/3/dj/projdoc/objectives_P802d3dj_240314.pdf Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/1763585297-1243980-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 18:21:29 -08:00
Linus Torvalds	6ba3bb3348	platform-drivers-x86 for v6.18-4 Fixes - acer-wmi: Ignore backlight event - alienware-wmi-wmax: Fix quirk match table order & drop redundant entries - amd/pmc: - Add Xbox Ally to spurious 8042 quirk list - Quirk list Lenovo Legion Go 2 NVMe resume - msi-wmi-platform: - Correct GUID to uppercase - GUID is uncleverly copy-pasted from an example so add a DMI whitelist - intel/speed_select_if: PCIBIOS_* return code conversion - intel-uncore-freq & ISST: Fix kernel doc warnings New HW support - alienware-wmi-wmax: - Alienware 16 Aurora support - Alienware M support - Alienware X support - Dell G support - amd/pmc: - ROG Xbox Ally (non-X) support - huaway-wmi: HONOR MagicBoox X16/X14 PrintScreen & YOYO keys - hp-wmi: - Omen 16-wf1xxx fan support - Omen MAX 16-ah0xx fan + thermal profile support - Victus 16-r0 and 16-s0 fan + thermal profile support - intel/hid: Intel Nova Lake support - intel-uncore-freq: - Intel Panther Lake support - Intel Wildcat Lake support - Intel Nova Lake support The following is an automated shortlog grouped by driver: acer-wmi: - Ignore backlight event alienware-wmi-wmax: - Add AWCC support to Alienware 16 Aurora - Add support for the whole "G" family - Add support for the whole "M" family - Add support for the whole "X" family - Drop redundant DMI entries - Fix "Alienware m16 R1 AMD" quirk order amd: pmc: - Add Lenovo Legion Go 2 to pmc quirk list amd/pmc: - Add spurious_8042 to Xbox Ally - Add support for Van Gogh SoC hp-wmi: - Add Omen 16-wf1xxx fan support - Add Omen MAX 16-ah0xx fan support and thermal profile - mark Victus 16-r0 and 16-s0 for victus_s fan and thermal profile support huawei-wmi: - add keys for HONOR models intel/hid: - Add Nova Lake support intel/speed_select_if: - Convert PCIBIOS_* return codes to errnos intel-uncore-freq: - Add additional client processors - fix all header kernel-doc warnings ISST: isst_if.h: - fix all kernel-doc warnings msi-wmi-platform: - Fix typo in WMI GUID - Only load on MSI devices -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaR9GsgAKCRBZrE9hU+XO MU2qAQD8C75ebO/ltd9LZ4oTzfe0P6kmV+28z0D97tFsFmt0hwEA/fZHU/bV1+3c +gRnoLCl4yPh214OgWnmsciPhW3iyAM= =AtKC -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "This one has lots of new HW entries which adds to the size in diffstat but the individual changes are simple. Fixes - acer-wmi: Ignore backlight event - alienware-wmi-wmax: Fix quirk match table order & drop redundant entries - amd/pmc: - Add Xbox Ally to spurious 8042 quirk list - Quirk list Lenovo Legion Go 2 NVMe resume - msi-wmi-platform: - Correct GUID to uppercase - GUID is uncleverly copy-pasted from an example so add a DMI whitelist - intel/speed_select_if: PCIBIOS_* return code conversion - intel-uncore-freq & ISST: Fix kernel doc warnings New HW support - alienware-wmi-wmax: - Alienware 16 Aurora support - Alienware M support - Alienware X support - Dell G support - amd/pmc: - ROG Xbox Ally (non-X) support - huaway-wmi: HONOR MagicBoox X16/X14 PrintScreen & YOYO keys - hp-wmi: - Omen 16-wf1xxx fan support - Omen MAX 16-ah0xx fan + thermal profile support - Victus 16-r0 and 16-s0 fan + thermal profile support - intel/hid: Intel Nova Lake support - intel-uncore-freq: - Intel Panther Lake support - Intel Wildcat Lake support - Intel Nova Lake support" * tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (21 commits) platform/x86: intel-uncore-freq: fix all header kernel-doc warnings platform/x86: acer-wmi: Ignore backlight event platform/x86/intel/speed_select_if: Convert PCIBIOS_* return codes to errnos platform/x86/intel/hid: Add Nova Lake support platform/x86: alienware-wmi-wmax: Add AWCC support to Alienware 16 Aurora platform/x86: hp-wmi: Add Omen MAX 16-ah0xx fan support and thermal profile platform/x86: msi-wmi-platform: Fix typo in WMI GUID platform/x86: msi-wmi-platform: Only load on MSI devices platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally platform/x86/amd/pmc: Add support for Van Gogh SoC platform/x86: alienware-wmi-wmax: Add support for the whole "G" family platform/x86: alienware-wmi-wmax: Add support for the whole "X" family platform/x86: alienware-wmi-wmax: Add support for the whole "M" family platform/x86: alienware-wmi-wmax: Drop redundant DMI entries platform/x86: alienware-wmi-wmax: Fix "Alienware m16 R1 AMD" quirk order platform/x86: ISST: isst_if.h: fix all kernel-doc warnings platform/x86: intel-uncore-freq: Add additional client processors platform/x86: hp-wmi: Add Omen 16-wf1xxx fan support platform/x86: huawei-wmi: add keys for HONOR models ...	2025-11-20 09:39:34 -08:00
Jakub Kicinski	9e203721ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile `e1bb28bf13` ("selftest: af_unix: Add test for SO_PEEK_OFF.") `45a1cd8346` ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-20 09:13:26 -08:00
Linus Torvalds	23cb64fb76	soc: fixes for 6.18, part 3 These are mainly devicetree fixes for the arm platforms from Rockchips NXP, ASpeed and Broadcom, addressing issues with accidental overclocking, pinctrl, network and dtc warnings. There are additional fixes for regressions with the i.MX reset and memory controller drivers as well as the Tegra memory controller driver Minor updates to the MAINTAINERS file, tee documentation and defconfigs bring those up to date with recent changes elsewhere. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmkd2xgACgkQmmx57+YA GNki7g//aJ2V6J9T4qpg7puhACrmbv4WWBynSiC/hImUcsN7rf93GaYDL9NzKm1q 6OWdOnvJIlinyyCFob0PxBrtNZprzVzCSL2bRlQx+q5FKrzJoW1aHpOcyuSANMan 2UXAjbL2YobMj4uZ9SrnkzxO+Cd4gIuJJSH1HPLuGOwR3VSeiXUMhsrVxMB2g1Ch r8a1WhuRhnPy1qEKGbUp11CO1gnzG+wSmQOYUf+4mIIioax2OIr39eQlkY6NdOfl 4OAWfrcbE0C7DaiURTB0xVgHGQKVw7Z1KemIl+uN9uF2YJUHbdZs0wnX/K+o8afm wxnSsmcFfXDgthXlvpt4LkU3j94CtYbAIDYL/Xp6O9nuYekDtpPjRmQHpbrt6b0V HRNf70ePm2nMX7QwWnC/L/I7AW6OY2m74WPjmlKJ19jKLNHcVa3G9K7uafc2lZbB w7/UdMriXaJ2rcZsg2RG/Qo+SRMuB3+uFMQjvM6JwRKV2gaJi4v2JqD7/te9tbGl FjbBc5aGVJmRiFuDhjO+Z6NTRlKWQpTljLoYzY1rmyXejAbMfrBnvOYO4avJ/ojE xL6UjuTZeyQ5EwkmMUzrI5vzSx0AL29QwSz4Sf7B5pkQYu8YddbMneKWjmAYleWs VQ5OsU2x84OIQgt2ORCZJd3p/FIOGnckT1Tl3xkZZ1ttdEvJOwQ= =5SRf -----END PGP SIGNATURE----- Merge tag 'soc-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "These are mainly devicetree fixes for the arm platforms from Rockchips NXP, ASpeed and Broadcom, addressing issues with accidental overclocking, pinctrl, network and dtc warnings. There are additional fixes for regressions with the i.MX reset and memory controller drivers as well as the Tegra memory controller driver. Minor updates to the MAINTAINERS file, tee documentation and defconfigs bring those up to date with recent changes elsewhere" * tag 'soc-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits) MAINTAINERS: sync omap devicetree maintainers with omap platform MAINTAINERS: Update Krzysztof Kozlowski's email arm64: dts: rockchip: fix PCIe 3.3V regulator voltage on orangepi-5 arm64: dts: rockchip: disable HS400 on RK3588 Tiger arm64: dts: rockchip: drop reset from rk3576 i2c9 node tee: <uapi/linux/tee.h: fix all kernel-doc issues arm64: dts: rockchip: Fix USB power enable pin for BTT CB2 and Pi2 arm64: dts: broadcom: bcm2712: rpi-5: Add ethernet0 alias arm64: dts: broadcom: Assign clock rates in eth node for RPi5 reset: imx8mp-audiomix: Fix bad mask values ARM: dts: BCM53573: Fix address of Luxul XAP-1440's Ethernet PHY arm64: defconfig: Fix V3D deferred probe timeout arm64: dts: rockchip: Fix vccio4-supply on rk3566-pinetab2 arm64: dts: rockchip: include rk3399-base instead of rk3399 in rk3399-op1 arm64: dts: imx8mp-kontron: Fix USB OTG role switching arm64: dts: imx95: Fix MSI mapping for PCIe endpoint nodes arm64: dts: imx8-ss-img: Avoid gpio0_mipi_csi GPIOs being deferred arm: imx_v6_v7_defconfig: enable ext4 directly memory: tegra210: Fix incorrect client ids arm64: dts: rockchip: Fix indentation on rk3399 haikou demo dtso ...	2025-11-19 09:36:04 -08:00
Peter Hutterer	ae8966b7b5	Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD And expand it to encompass all pressure pads. Definition: "pressure pad" as used here as includes all touchpads that use physical pressure to convert to click, without physical hinges. Also called haptic touchpads in general parlance, Synaptics calls them ForcePads. Most (all?) pressure pads are currently advertised as INPUT_PROP_BUTTONPAD. The suggestion to identify them as pressure pads by defining the resolution on ABS_MT_PRESSURE has been in the docs since commit `20ccc8dd38` ("Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams") but few devices provide this information. In userspace it's thus impossible to determine whether a device is a true pressure pad (pressure equals pressure) or a normal clickpad with (pressure equals finger size). Commit `7075ae4ac9` ("Input: add INPUT_PROP_HAPTIC_TOUCHPAD") introduces INPUT_PROP_HAPTIC_TOUCHPAD but restricted it to those touchpads that have support for userspace-controlled effects. Let's expand and rename that definition to include all pressure pad touchpads since those that do support FF effects can be identified by the presence of the FF_HAPTIC bit. This means: - clickpad: INPUT_PROP_BUTTONPAD - pressurepad: INPUT_PROP_BUTTONPAD + INPUT_PROP_PRESSUREPAD - pressurepad with configurable haptics: INPUT_PROP_BUTTONPAD + INPUT_PROP_PRESSUREPAD + FF_HAPTIC Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://patch.msgid.link/20251106114534.GA405512@tassie Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2025-11-17 23:18:32 -08:00
Linus Torvalds	e7c375b181	vfs-6.18-rc7.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaRtBJwAKCRCRxhvAZXjc ou5CAQCJb5y2ULKklblICU1wR7Nr15WvTW7VVOcv44RJ22S3NgEAy4DLDBFBw8zC 8e7Hp8gxbjsq8ZJmU088aobFcqbZOwk= =TAnu -----END PGP SIGNATURE----- Merge tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix unitialized variable in statmount_string() - Fix hostfs mounting when passing host root during boot - Fix dynamic lookup to fail on cell lookup failure - Fix missing file type when reading bfs inodes from disk - Enforce checking of sb_min_blocksize() calls and update all callers accordingly - Restore write access before closing files opened by open_exec() in binfmt_misc - Always freeze efivarfs during suspend/hibernate cycles - Fix statmount()'s and listmount()'s grab_requested_mnt_ns() helper to actually allow mount namespace file descriptor in addition to mount namespace ids - Fix tmpfs remount when noswap is specified - Switch Landlock to iput_not_last() to remove false-positives from might_sleep() annotations in iput() - Remove dead node_to_mnt_ns() code - Ensure that per-queue kobjects are successfully created * tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: landlock: fix splats from iput() after it started calling might_sleep() fs: add iput_not_last() shmem: fix tmpfs reconfiguration (remount) when noswap is set fs/namespace: correctly handle errors returned by grab_requested_mnt_ns power: always freeze efivarfs binfmt_misc: restore write access before closing files opened by open_exec() block: add __must_check attribute to sb_min_blocksize() virtio-fs: fix incorrect check for fsvq->kobj xfs: check the return value of sb_min_blocksize() in xfs_fs_fill_super isofs: check the return value of sb_min_blocksize() in isofs_fill_super exfat: check return value of sb_min_blocksize in exfat_read_boot_sector vfat: fix missing sb_min_blocksize() return value checks mnt: Remove dead code which might prevent from building bfs: Reconstruct file type when loading from disk afs: Fix dynamic lookup to fail on cell lookup failure hostfs: Fix only passing host root in boot stage with new mount fs: Fix uninitialized 'offp' in statmount_string()	2025-11-17 09:11:27 -08:00
Muminul Islam	c91fe5f162	mshv: Extend create partition ioctl to support cpu features The existing mshv create partition ioctl does not provide a way to specify which cpu features are enabled in the guest. Instead, it attempts to enable all features and those that are not supported are silently disabled by the hypervisor. This was done to reduce unnecessary complexity and is sufficient for many cases. However, new scenarios require fine-grained control over these features. Define a new mshv_create_partition_v2 structure which supports passing the disabled processor and xsave feature bits through to the create partition hypercall directly. Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables the new structure. If unset, the original mshv_create_partition struct is used, with the old behavior of enabling all features. Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Magnus Kulke	f91bc8f61a	mshv: Allow mappings that overlap in uaddr Currently the MSHV driver rejects mappings that would overlap in userspace. Some VMMs require the same memory to be mapped to different parts of the guest's address space, and so working around this restriction is difficult. The hypervisor itself doesn't prohibit mappings that overlap in uaddr, (really in SPA; system physical addresses), so supporting this in the driver doesn't require any extra work: only the checks need to be removed. Since no userspace code until now has been able to overlap regions in userspace, relaxing this constraint can't break any existing code. Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:17 +00:00
Alexei Starovoitov	e47b68bda4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+ Cross-merge BPF and other fixes after downstream PR. Minor conflict in kernel/bpf/helpers.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-14 17:43:41 -08:00
Arnd Bergmann	124c98b100	TEE kernel-doc fixes for v6.18 -----BEGIN PGP SIGNATURE----- iQJOBAABCgA4FiEE0qerISgy2SKkqO79Wr/6JGat8H4FAmkW2ocaHGplbnMud2lr bGFuZGVyQGxpbmFyby5vcmcACgkQWr/6JGat8H4eYhAAneMo7XeOE2939WbTrFUJ yF34fhCiKpxZ/IHqDJzCa1gVdEFW7oyedwOH+Q4k61ch7KmbvL3syilrrhch/Gai 3K+B5500new57tl8DoKGuO0w2e6Oxw1ZrNzw+aJSflckgyHYXqRtKD7B5PHYyWjN tDm6F2CNpgkcuV7V8EBHCPWiRZPqx3BcH9ghtLPlK7B75i6+o8oWWjqqxhB0OxB2 WhFM7ny73JaN6kCUeAz+Hx7OHE7Y3qHqg5bdC2q3rHOnWWjEt8YILUYZBPw8Z0W2 lYUmOgbCxaTTM0CbV3Ye4m2YL6jffqUv9JaHxDzoeffhuKiQd/6PfzB3N7n849c6 6SO72nMzYijZg2aXpJYrlgW2/6/WuThH2qUYvex21N61+yXPeKYoFDh2HBs9uMPj 46MZ7ExKkZiRi53LkGUjH0KxOABlmPO510wiPlk/uvOm+hlGEgku070jKahGDunW pGVAqri8sowqkjNo9EPkv2in9PBjZsf3Q4lH0uQ2a9plD3ZP94Hp3j9qvtb4Gepv 5kd/OBDzbY8OkXDgEJj+VJIfHc8zxeQ8PUe28icsX3inoddgdJYXg3dPjx69tfv9 g1AAxGIAmtx2CTv6UNep9sNBpx1blzV4ThFURKobvzzXTZ+HK2HrzKrsjRfKe4bv PHVCUDd7RkujkcU4gOvCFIg= =xhar -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmkXnXkACgkQmmx57+YA GNnZDw//d+5ko5rX/5QPpnh0BrHEFdsUYDDcbjPOJ+z0YJcXM9MnXozX0ubTK1HT B80yTG0mS50kBuRf9ZdIt3WFHsXgJfr8b1LLhLOrPBVgns7RTRLlFvr7CjLmSooM IEcHj7TrUTXS30XE2gHON2rhwlCwNxwht/kQlomyEf/6zHMSHqi1xXDsXeySHEkB QA6VkSLZOLMsgcqvwFUozv9DV/a+PVMv1LJvvAgFuvnpCacydyhBGJWzPFdUkFgk qfbO/OezPbXwt/XJTDIxbp6u5w1q0RhCXFCrg+WWcwPCBjS1yBa6o1apz00kjl0M RnZyR4F5Lg0vmDKTYePBA+ELdjWNrcAuCn9/kmm9E/taN59uqf7uF9X9oXXUlTQu eFGmDs9yWonX6KqjBTRKVCfYCagimpNQQ2xao1VDq58BxrpfjWfAJltgYxiTokdU BDuWsc9084xlH/fO9L+2WxW8weLYOaV+NE68zD6y8PyoPle7mA5XlSnFTuHi8gm5 LXCj7+PZ5sT2X0VS++U+Sx1TtEOwD5bYkiEwhrTe6+dumzAKl9/vIKriGgCAv97x Leahlh3nN7v7JIpGs0BZZ61XcvUOBARK7ceLvIRSaNrDzXmnbixeVRAIrenqCPXE aa0/vch4MyctXPHtN/TyhQI1hmD3bggx9UqtN0XnCT4PuwnZDT8= =nT6/ -----END PGP SIGNATURE----- Merge tag 'tee-fix-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee into arm/fixes TEE kernel-doc fixes for v6.18 * tag 'tee-fix-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee: tee: <uapi/linux/tee.h: fix all kernel-doc issues Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2025-11-14 22:22:00 +01:00
Linus Torvalds	ac9f4f306d	io_uring-6.18-20251113 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmkWeqcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplM6D/4jSqj7PedcnEDHNKHJR/0v5wSQWMugD0hg Xn1EFnmPJMLBacZrkobL3v6aCvA4WYQManc1VPdC94b9ofdbjDMVtcIDoljBXuAD A94kk4TiIsHzKn8Un5bprC20Xc/v1pQ3uFERtK7mnonGb6n+/RITIQuWqn/Fey3N 58K7njnInQXDS2U6rx4QTdyuyrg6EHHIoNh1GmKdQgSY1kcJnQz4C+Yrh3qIBlTV AEWkoIwSp15hXvmn8J4nvbbFEZypg0qTaBREdc7KXt8kPMNCLqQ1Z4FOXYDt0qo1 zm88Z/08mwXbxFLtfse8JJ7XHPgy1+G0rXmdwv4F6CEtCyklnChLynrTpqUK5FCH pvuzOfA3tPcyg6dgc7X2z40u3g+vISydjprs9m5eL6hgovDhH+EfDGzpWxUzuuNe AXNXb3N9ZrxZePGfAweW2BxZ+D2OwYdQjKWSrL+438Tb3Q8QiiYCwM7eLyl0kXKH QpjJJsbIEJgLXYRVHE5tu0XdMdYsZYZYUv0wVACQ0hw+EtgaNi2yjySBvRJUJPzZ sL6Lnu5UzbEqSf1y8YGcdCCvOgHc6CHFAdKMgz0CXUcRC+ftmGz4iEXhGJJdMemQ 6WRLvsRykF1FO93dpfaIjWv9k/d40cGt5LrjiINfb36NDlqBth+EY+HbSJHBDI54 p2wDeoP3Kw== =sd1e -----END PGP SIGNATURE----- Merge tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Use the actual segments in a request when for bvec based buffers - Fix an odd case where the iovec might get leaked for a read/write request, if it was newly allocated, overflowed the alloc cache, and hit an early error - Minor tweak to the query API added in this release, returning the number of available entries * tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs io_uring/query: return number of available queries io_uring/rw: ensure allocated iovec gets cleared for early failure	2025-11-14 09:57:30 -08:00
Daniel Scally	08a99369f4	media: uapi: Add parameters structs to mali-c55-config.h Add structures describing the ISP parameters to mali-c55-config.h Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Nayden Kanchev <nayden.kanchev@arm.com> Co-developed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Jacopo Mondi	1ab3cb233d	media: mali-c55: Add image formats for Mali-C55 parameters buffer Add a new V4L2 meta format code for the Mali-C55 parameters. Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Nayden Kanchev <nayden.kanchev@arm.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Daniel Scally	c7f832f6f8	media: uapi: Add 3a stats buffer for mali-c55 Describe the format of the 3A statistics buffers in the userspace API header for the mali-c55 ISP. Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Nayden Kanchev <nayden.kanchev@arm.com> Co-developed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Daniel Scally	4d36f73236	media: Add MALI_C55_3A_STATS meta format Add a new meta format for the Mali-C55 ISP's 3A Statistics along with a new descriptor entry. Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Nayden Kanchev <nayden.kanchev@arm.com> Co-developed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Daniel Scally	8d0bbed21e	media: uapi: Add controls for Mali-C55 ISP Add definitions and documentation for the custom control that will be needed by the Mali-C55 ISP driver. This will be a read only bitmask of the driver's capabilities, informing userspace of which blocks are fitted and which are absent. Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Daniel Scally	2477ab0376	media: uapi: Add 20-bit bayer formats The Mali-C55 requires input data be in 20-bit format, MSB aligned. Add some new media bus format macros to represent that input format. Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Co-developed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Daniel Scally	ec4ac3cb71	media: uapi: Add MEDIA_BUS_FMT_RGB202020_1X60 format code The Mali-C55 ISP by ARM requires 20-bits per colour channel input on the bus. Add a new media bus format code to represent it. Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Nayden Kanchev <nayden.kanchev@arm.com> Co-developed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:49 +01:00
Jacopo Mondi	4566208285	media: uapi: Convert Amlogic C3 to V4L2 extensible params With the introduction of common types for extensible parameters format, convert the c3-isp-config.h header to use the new types. Factor-out the documentation that is now part of the common header and only keep the driver-specific on in place. The conversion to use common types doesn't impact userspace as the new types are either identical to the ones already existing in the C3 ISP uAPI or are 1-to-1 type convertible. Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Keke Li <keke.li@amlogic.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:48 +01:00
Jacopo Mondi	1e8152db64	media: uapi: Convert RkISP1 to V4L2 extensible params With the introduction of common types for extensible parameters format, convert the rkisp1-config.h header to use the new types. Factor out the documentation that is now part of the common header and only keep the driver-specific on in place. The conversion to use common types doesn't impact userspace as the new types are either identical to the ones already existing in the RkISP1 uAPI or are 1-to-1 type convertible. Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:48 +01:00
Jacopo Mondi	e36dbd1cf3	media: uapi: Introduce V4L2 generic ISP types Introduce v4l2-isp.h in the Linux kernel uAPI. The header includes types for generic ISP configuration parameters and will be extended in the future with support for generic ISP statistics formats. Generic ISP parameters support is provided by introducing two new types that represent an extensible and versioned buffer of ISP configuration parameters. The v4l2_params_buffer represents the container for the ISP configuration data block. The generic type is defined with a 0-sized data member that the ISP driver implementations shall properly size according to their capabilities. The v4l2_params_block_header structure represents the header to be prepend to each ISP configuration block. Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>	2025-11-14 15:48:48 +01:00
Jakub Kicinski	c99ebb6132	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c `96a9178a29` ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") `61b7ade9ba` ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 12:35:38 -08:00
David Wei	00d9148127	io_uring/zcrx: share an ifq between rings Add a way to share an ifq from a src ring that is real (i.e. bound to a HW RX queue) with other rings. This is done by passing a new flag IORING_ZCRX_IFQ_REG_IMPORT in the registration struct io_uring_zcrx_ifq_reg, alongside the fd of an exported zcrx ifq. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:19:37 -07:00
Pavel Begunkov	d7af80b213	io_uring/zcrx: export zcrx via a file Add an option to wrap a zcrx instance into a file and expose it to the user space. Currently, users can't do anything meaningful with the file, but it'll be used in a next patch to import it into another io_uring instance. It's implemented as a new op called ZCRX_CTRL_EXPORT for the IORING_REGISTER_ZCRX_CTRL registration opcode. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:19:37 -07:00
Pavel Begunkov	475eb39b00	io_uring/zcrx: add sync refill queue flushing Add an zcrx interface via IORING_REGISTER_ZCRX_CTRL that forces the kernel to flush / consume entries from the refill queue. Just as with the IORING_REGISTER_ZCRX_REFILL attempt, the motivation is to address cases where the refill queue becomes full, and the user can't return buffers and needs to stash them. It's still a slow path, and the user should size refill queue appropriately, but it should be helpful for handling temporary traffic spikes and other unpredictable conditions. The interface is simpler comparing to ZCRX_REFILL as it doesn't need temporary refill entry arrays and gives natural batching, whereas ZCRX_REFILL requires even more user logic to be somewhat efficient. Also, add a structure for the operation. It's not currently used but can serve for future improvements like limiting the number of buffers to process, etc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:19:37 -07:00
Pavel Begunkov	d663976dad	io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL It'll be annoying and take enough of boilerplate code to implement new zcrx features as separate io_uring register opcode. Introduce IORING_REGISTER_ZCRX_CTRL that will multiplex such calls to zcrx. Note, there are no real users of the opcode in this patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:19:37 -07:00
Pavel Begunkov	4aaa9bc4d5	io_uring/query: introduce rings info query Same problem as with zcrx in the previous patch, the user needs to know SQ/CQ header sizes to allocated memory before setup to use it for user provided rings, i.e. IORING_SETUP_NO_MMAP, however that information is only returned after registration, hence the user is guessing kernel implementation details. Return the header size and alignment, which is split with the same motivation, to allow the user to know the real structure size without alignment in case there will be more flexible placement schemes in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:17:36 -07:00
Pavel Begunkov	2647e2ecc0	io_uring/query: introduce zcrx query Add a new query type IO_URING_QUERY_ZCRX returning the user some basic information about the interface, which includes allowed flags for areas and registration and supported IORING_REGISTER_ZCRX_CTRL subcodes. There is also a chicken-egg problem with user provided refill queue memory, where offsets and size information is returned after registration, but to properly allocate memory you need to know it beforehand, which is why the userspace currently has to guess the RQ headers size and severely overestimates it. Return the size information. It's split into "size" and "alignment" fields because for default placement modes the user is interested in the aligned size, however if it gets support for more flexible placement, it'll need to only know the actual header size. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 11:17:36 -07:00
Jens Axboe	ecb8490b2f	Merge branch 'io_uring-6.18' into for-6.19/io_uring Merge 6.18-rc io_uring fixes, as certain coming changes depend on some of these. * io_uring-6.18: io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs io_uring/query: return number of available queries io_uring/rw: ensure allocated iovec gets cleared for early failure io_uring: fix regbuf vector size truncation io_uring: fix types for region size calulation io_uring/zcrx: remove sync refill uapi io_uring: fix buffer auto-commit for multishot uring_cmd io_uring: correct __must_hold annotation in io_install_fixed_file io_uring zcrx: add MAINTAINERS entry io_uring: Fix code indentation error io_uring/sqpoll: be smarter on when to update the stime usage io_uring/sqpoll: switch away from getrusage() for CPU accounting io_uring: fix incorrect unlikely() usage in io_waitid_prep() Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-13 07:26:37 -07:00
Andrei Vagin	78f0e33cd6	fs/namespace: correctly handle errors returned by grab_requested_mnt_ns grab_requested_mnt_ns was changed to return error codes on failure, but its callers were not updated to check for error pointers, still checking only for a NULL return value. This commit updates the callers to use IS_ERR() or IS_ERR_OR_NULL() and PTR_ERR() to correctly check for and propagate errors. This also makes sure that the logic actually works and mount namespace file descriptors can be used to refere to mounts. Christian Brauner <brauner@kernel.org> says: Rework the patch to be more ergonomic and in line with our overall error handling patterns. Fixes: `7b9d14af87` ("fs: allow mount namespace fd") Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrei Vagin <avagin@google.com> Link: https://patch.msgid.link/20251111062815.2546189-1-avagin@google.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-12 10:42:49 +01:00
Jiaqi Yan	ad9c62bd89	KVM: arm64: VM exit to userspace to handle SEA When APEI fails to handle a stage-2 synchronous external abort (SEA), today KVM injects an asynchronous SError to the VCPU then resumes it, which usually results in unpleasant guest kernel panic. One major situation of guest SEA is when vCPU consumes recoverable uncorrected memory error (UER). Although SError and guest kernel panic effectively stops the propagation of corrupted memory, guest may re-use the corrupted memory if auto-rebooted; in worse case, guest boot may run into poisoned memory. So there is room to recover from an UER in a more graceful manner. Alternatively KVM can redirect the synchronous SEA event to VMM to - Reduce blast radius if possible. VMM can inject a SEA to VCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison consumption or fault is not from guest kernel, blast radius can be limited to the triggering thread in guest userspace, so VM can keep running. - Allow VMM to protect from future memory poison consumption by unmapping the page from stage-2, or to interrupt guest of the poisoned page so guest kernel can unmap it from stage-1 page table. - Allow VMM to track SEA events that VM customers care about, to restart VM when certain number of distinct poison events have happened, to provide observability to customers in log management UI. Introduce an userspace-visible feature to enable VMM handle SEA: - KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior when host APEI fails to claim a SEA, userspace can opt in this new capability to let KVM exit to userspace during SEA if it is not owned by host. - KVM_EXIT_ARM_SEA. A new exit reason is introduced for this. KVM fills kvm_run.arm_sea with as much as possible information about the SEA, enabling VMM to emulate SEA to guest by itself. - Sanitized ESR_EL2. The general rule is to keep only the bits useful for userspace and relevant to guest memory. - Flags indicating if faulting guest physical address is valid. - Faulting guest physical and virtual addresses if valid. Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://msgid.link/20251013185903.1372553-2-jiaqiyan@google.com Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-12 01:27:12 -08:00
Jeff Layton	1602bad16d	vfs: expose delegation support to userland Now that support for recallable directory delegations is available, expose this functionality to userland with new F_SETDELEG and F_GETDELEG commands for fcntl(). Note that this also allows userland to request a FL_DELEG type lease on files too. Userland applications that do will get signalled when there are metadata changes in addition to just data changes (which is a limitation of FL_LEASE leases). These commands accept a new "struct delegation" argument that contains a flags field for future expansion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-17-52f3feebb2f2@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-12 09:38:37 +01:00
Saeed Mahameed	0e535824d0	devlink: Introduce switchdev_inactive eswitch mode Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and documentation. Before having traffic flow through an eswitch, a user may want to have the ability to block traffic towards the FDB until FDB is fully programmed and the user is ready to send traffic to it. For example: when two eswitches are present for vports in a multi-PF setup, one eswitch may take over the traffic from the other when the user chooses. Before this take over, a user may want to first program the inactive eswitch and then once ready redirect traffic to this new eswitch. switchdev modes transition semantics: legacy->switchdev_inactive: Create switchdev mode normally, traffic not allowed to flow yet. switchdev_inactive->switchdev: Enable traffic to flow. switchdev->switchdev_inactive: Block traffic on the FDB, FDB and representros state and content is preserved. When eswitch is configured to this mode, traffic is ignored/dropped on this eswitch FDB, while current configuration is kept, e.g FDB rules and netdev representros are kept available, FDB programming is allowed. Example: # start inactive switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive # setup TC rules, representors etc .. # activate devlink dev eswitch set pci/0000:08:00.1 mode switchdev Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251108070404.1551708-2-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-11 13:17:53 +01:00
Li Nan	62ed1b5822	md: allow configuring logical block size Previously, raid array used the maximum logical block size (LBS) of all member disks. Adding a larger LBS disk at runtime could unexpectedly increase RAID's LBS, risking corruption of existing partitions. This can be reproduced by: ``` # LBS of sd[de] is 512 bytes, sdf is 4096 bytes. mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean # LBS is 512 cat /sys/block/md0/queue/logical_block_size # create partition md0p1 parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100% lsblk \| grep md0p1 # LBS becomes 4096 after adding sdf mdadm --add -q /dev/md0 /dev/sdf cat /sys/block/md0/queue/logical_block_size # partition lost partprobe /dev/md0 lsblk \| grep md0p1 ``` Simply restricting larger-LBS disks is inflexible. In some scenarios, only disks with 512 bytes LBS are available currently, but later, disks with 4KB LBS may be added to the array. Making LBS configurable is the best way to solve this scenario. After this patch, the raid will: - store LBS in disk metadata - add a read-write sysfs 'mdX/logical_block_size' Future mdadm should support setting LBS via metadata field during RAID creation and the new sysfs. Though the kernel allows runtime LBS changes, users should avoid modifying it after creating partitions or filesystems to prevent compatibility issues. Only 1.x metadata supports configurable LBS. 0.90 metadata inits all fields to default values at auto-detect. Supporting 0.90 would require more extensive changes and no such use case has been observed. Note that many RAID paths rely on PAGE_SIZE alignment, including for metadata I/O. A larger LBS than PAGE_SIZE will result in metadata read/write failures. So this config should be prevented. Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-6-linan666@huaweicloud.com Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com>	2025-11-11 11:20:15 +08:00
Pavel Begunkov	6a77267d97	io_uring/query: return number of available queries It's useful to know which query opcodes are available. Extend the structure and return that. It's a trivial change, and even though it can be painlessly extended later, it'd still require adding a v2 of the structure. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-10 14:59:35 -07:00
Randy Dunlap	aaf46c6a6d	tee: <uapi/linux/tee.h: fix all kernel-doc issues Fix kernel-doc warnings so that there no other kernel-doc issues in <uapi/linux/tee.h>: - add ending ':' to some struct members as needed for kernel-doc - change struct name in kernel-doc to match the actual struct name (2x) - add a @params: kernel-doc entry multiple times Warning: tee.h:265 struct member 'ret_origin' not described in 'tee_ioctl_open_session_arg' Warning: tee.h:265 struct member 'num_params' not described in 'tee_ioctl_open_session_arg' Warning: tee.h:265 struct member 'params' not described in 'tee_ioctl_open_session_arg' Warning: tee.h:351 struct member 'num_params' not described in 'tee_iocl_supp_recv_arg' Warning: tee.h:351 struct member 'params' not described in 'tee_iocl_supp_recv_arg' Warning: tee.h:372 struct member 'num_params' not described in 'tee_iocl_supp_send_arg' Warning: tee.h:372 struct member 'params' not described in 'tee_iocl_supp_send_arg' Warning: tee.h:298: expecting prototype for struct tee_ioctl_invoke_func_arg. Prototype was for struct tee_ioctl_invoke_arg instead Warning: tee.h:473: expecting prototype for struct tee_ioctl_invoke_func_arg. Prototype was for struct tee_ioctl_object_invoke_arg instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>	2025-11-10 09:47:54 +01:00
Jakub Kicinski	f05d26198c	psp: add stats from psp spec to driver facing api Provide a driver api for reporting device statistics required by the "Implementation Requirements" section of the PSP Architecture Specification. Use a warning to ensure drivers report stats required by the spec. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:53:57 -08:00
Jakub Kicinski	dae4a92399	psp: report basic stats from the core Track and report stats common to all psp devices from the core. A 'stale-event' is when the core marks the rx state of an active psp_assoc as incapable of authenticating psp encapsulated data. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251106002608.1578518-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-07 18:53:56 -08:00
Linus Torvalds	9dc520632a	io_uring-6.18-20251106 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmkNN50QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv4cD/4tGXmgNcKDidyOKBy8ylY/BxTkZOVc1ZnX TXyrKYmzrgezgjY9sHQ0xBJTaDNN+8flMlRJs+eSUBZtQ+inoD6hF8t8bQg7zD79 SY/cdKUyC+hZINSNamy7k1ARqVfK/oyIZ1tYC5MEg/dWacW3BjV5P+AO5YIIyB4R tQjz4rXwHg+i8xqxQDaZLVYhY3VC5C3tl3zyxeykhoUsUeGEXNpiqKGP/3LMyFzc T+0IfBrZl7DXNv1+vpVfNspNekZrP+3qWkjQwvA+7DMrUhtnTNbmtsKRJDLK8afI Hq+7Rk6LCp7ibTY6AhNS7LMX4beQwPSIZ0tkvmqcvszyBI3Dsjc6bNu0GfylDJLU zk74gyflbOsCBIV4yhByiA8mooDRfPMH6vqfb3XQfxfPJKxgHC89EpLlgKBLnoXt Kcnj7UbnayneTVRNF8b9HuORBloI4S90cCMExXRza/v8wsLp+nbfLgqkb3CVBRKJ dTeitMKYsganwNH3dXT9BsY7UTK2EvziV8nDQtzP7Yn38VC8x+7wg5yV7+vjoQvV sI0GnrVIDPWDaxQ0HM5tSgXO4mbErV4FQs0Soad7/9fMHU3SLzW9Z61kv797RNCw AcUdfC1kZqXC2x1DeIrEPXBUMKYI27JCzvPcgNMzFrWxrnfeIQUZKE00IDG4iY9p wZhFNNo/Ww== =D3Vm -----END PGP SIGNATURE----- Merge tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Remove the sync refill API that was added in this release, in anticipation of doing it in a better way for the next release - Fix type extension for calculating size off nr_pages, like we do in other spots * tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: fix types for region size calulation io_uring/zcrx: remove sync refill uapi	2025-11-07 07:52:45 -08:00
Daniel Golle	c6230446b1	net: dsa: add tagging driver for MaxLinear GSW1xx switch family Add support for a new DSA tagging protocol driver for the MaxLinear GSW1xx switch family. The GSW1xx switches use a proprietary 8-byte special tag inserted between the source MAC address and the EtherType field to indicate the source and destination ports for frames traversing the CPU port. Implement the tag handling logic to insert the special tag on transmit and parse it on receive. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/0e973ebfd9433c30c96f50670da9e9449a0d98f2.1762170107.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 14:16:17 -08:00
Jakub Kicinski	1ec9871fbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c `9222582ec5` ("Revert "wifi: ath12k: Fix missing station power save configuration"") `6917e268c4` ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig `b1d16f7c00` ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") `93f53db9f9` ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 09:27:40 -08:00
Linus Torvalds	c2c2ccfd4b	Including fixes from bluetooth and wireless. Current release - new code bugs: - ptp: expose raw cycles only for clocks with free-running counter - bonding: fix null-deref in actor_port_prio setting - mdio: ERR_PTR-check regmap pointer returned by device_node_to_regmap() - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG Previous releases - regressions: - virtio_net: fix perf regression due to bad alignment of virtio_net_hdr_v1_hash - Revert "wifi: ath10k: avoid unnecessary wait for service ready message" caused regressions for QCA988x and QCA9984 - Revert "wifi: ath12k: Fix missing station power save configuration" caused regressions for WCN7850 - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory corruptions after kexec Previous releases - always broken: - virtio-net: fix received packet length check for big packets - sctp: fix races in socket diag handling - wifi: add an hrtimer-based delayed work item to avoid low granularity of timers set relatively far in the future, and use it where it matters (e.g. when performing AP-scheduled channel switch) - eth: mlx5e: - correctly propagate error in case of module EEPROM read failure - fix HW-GRO on systems with PAGE_SIZE == 64kB - dsa: b53: fixes for tagging, link configuration / RMII, FDB, multicast - phy: lan8842: implement latest errata Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkMyx0ACgkQMUZtbf5S Irs4gA//fRGPHfEksbVO0lAR9QVUKq/Wvokt6DnLXQsi7js7kun7ymVFyU8tVacP sGoYbTTXO34XpijKVMTFe5WnnnSF3cri1/e+PYNXndKGHOheLvz0aq+CniiLexaT ag/opHX9+Io6x8DyOVkkRJYByvEcXgy1vFjhk5O04wn7tGPBNzvaKQJzjfVIiiWP thma/e77jVUqsNptS/VzJNCDwPB3Qi0fqylRovu7A59Kmr7GLZxaqZQpvM5/XjU3 s3NhNjNDawmiwxN2AIztXo+vMReqFaiCr+z36OcruE04CFslkQGmE/5g3FdDGg+Q RE7Wfgk2UJ+Q5Y1jnQWNYOkGaMd6bMgPQD/8zZvwEf163Gh1YBh0rtDY07DWV5FR wp5cmeNDo2Mf+nnCM2UaoUQS2AAmkub2aAIjnlvrefeJMKciyMqtQe+V02aIxuvK BPmeSCN1IOyAIDCRE3Fd6JFyk+4Z4YC6Hdm/WOG517eGrDh3yV9JE/+C3jdh3JwT lvaScKhCdyq/9mFqEcU/yR5Kc4Od6Rw0K0s5DM4LAbKSsxI6hp6SkuoFHUsqFsRR HL3ucy4c4hmLxz6AA5/+UVIPzFsY5cwOnhs3AU2UVuA95fH0QWqX2u2ll0rC9mAW xDXa/ai0/KAGvVYqjC+MDyF8zrtzSG7C40ZsxhdcU84s94ZNVFI= =a/Ml -----END PGP SIGNATURE----- Merge tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: Including fixes from bluetooth and wireless. Current release - new code bugs: - ptp: expose raw cycles only for clocks with free-running counter - bonding: fix null-deref in actor_port_prio setting - mdio: ERR_PTR-check regmap pointer returned by device_node_to_regmap() - eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG Previous releases - regressions: - virtio_net: fix perf regression due to bad alignment of virtio_net_hdr_v1_hash - Revert "wifi: ath10k: avoid unnecessary wait for service ready message" caused regressions for QCA988x and QCA9984 - Revert "wifi: ath12k: Fix missing station power save configuration" caused regressions for WCN7850 - eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory corruptions after kexec Previous releases - always broken: - virtio-net: fix received packet length check for big packets - sctp: fix races in socket diag handling - wifi: add an hrtimer-based delayed work item to avoid low granularity of timers set relatively far in the future, and use it where it matters (e.g. when performing AP-scheduled channel switch) - eth: mlx5e: - correctly propagate error in case of module EEPROM read failure - fix HW-GRO on systems with PAGE_SIZE == 64kB - dsa: b53: fixes for tagging, link configuration / RMII, FDB, multicast - phy: lan8842: implement latest errata" * tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) selftests/vsock: avoid false-positives when checking dmesg net: bridge: fix MST static key usage net: bridge: fix use-after-free due to MST port state bypass lan966x: Fix sleeping in atomic context bonding: fix NULL pointer dereference in actor_port_prio setting net: dsa: microchip: Fix reserved multicast address table programming net: wan: framer: pef2256: Switch to devm_mfd_add_devices() net: libwx: fix device bus LAN ID net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages net/mlx5e: SHAMPO, Fix skb size check for 64K pages net/mlx5e: SHAMPO, Fix header mapping for 64K pages net: ti: icssg-prueth: Fix fdb hash size configuration net/mlx5e: Fix return value in case of module EEPROM read error net: gro_cells: Reduce lock scope in gro_cell_poll libie: depend on DEBUG_FS when building LIBIE_FWLOG wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup netpoll: Fix deadlock in memory allocation under spinlock net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error virtio-net: fix received length check in big packets bnxt_en: Fix warning in bnxt_dl_reload_down() ...	2025-11-06 08:52:30 -08:00
Randy Dunlap	5f20bc206b	platform/x86: ISST: isst_if.h: fix all kernel-doc warnings Fix all kernel-doc warnings in <uapi/linux/isst_if.h>: - don't use "[]" in the variable name in kernel-doc - add a few missing entries - change "power_domain" to "power_domain_id" in kernel-doc to match the struct member name - add a leading '@' on a few existing kernel-doc lines - use '_' instead of '-' in struct member names Examples (but not all 27 warnings): Warning: include/uapi/linux/isst_if.h:63 struct member 'cpu_map' not described in 'isst_if_cpu_maps' Warning: ../include/uapi/linux/isst_if.h:95 struct member 'req_count' not described in 'isst_if_io_regs' Warning: include/uapi/linux/isst_if.h:132 struct member 'mbox_cmd' not described in 'isst_if_mbox_cmds' Warning: ../include/uapi/linux/isst_if.h:183 struct member 'supported' not described in 'isst_core_power' Warning: ../include/uapi/linux/isst_if.h:206 struct member 'power_domain_id' not described in 'isst_clos_param' Warning: ../include/uapi/linux/isst_if.h:239 struct member 'assoc_info' not described in 'isst_if_clos_assoc_cmds' Warning: ../include/uapi/linux/isst_if.h:286 struct member 'sst_tf_support' not described in 'isst_perf_level_info' Warning: ../include/uapi/linux/isst_if.h:375 struct member 'trl_freq_mhz' not described in 'isst_perf_level_data_info' Warning: ../include/uapi/linux/isst_if.h:475 struct member 'max_buckets' not described in 'isst_turbo_freq_info' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251023194615.180824-1-rdunlap@infradead.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-11-06 14:19:20 +02:00
Anton Protopopov	b4ce5923e7	bpf, x86: add new map type: instructions array On bpf(BPF_PROG_LOAD) syscall user-supplied BPF programs are translated by the verifier into "xlated" BPF programs. During this process the original instructions offsets might be adjusted and/or individual instructions might be replaced by new sets of instructions, or deleted. Add a new BPF map type which is aimed to keep track of how, for a given program, the original instructions were relocated during the verification. Also, besides keeping track of the original -> xlated mapping, make x86 JIT to build the xlated -> jitted mapping for every instruction listed in an instruction array. This is required for every future application of instruction arrays: static keys, indirect jumps and indirect calls. A map of the BPF_MAP_TYPE_INSN_ARRAY type must be created with a u32 keys and value of size 8. The values have different semantics for userspace and for BPF space. For userspace a value consists of two u32 values – xlated and jitted offsets. For BPF side the value is a real pointer to a jitted instruction. On map creation/initialization, before loading the program, each element of the map should be initialized to point to an instruction offset within the program. Before the program load such maps should be made frozen. After the program verification xlated and jitted offsets can be read via the bpf(2) syscall. If a tracked instruction is removed by the verifier, then the xlated offset is set to (u32)-1 which is considered to be too big for a valid BPF program offset. One such a map can, obviously, be used to track one and only one BPF program. If the verification process was unsuccessful, then the same map can be re-used to verify the program with a different log level. However, if the program was loaded fine, then such a map, being frozen in any case, can't be reused by other programs even after the program release. Example. Consider the following original and xlated programs: Original prog: Xlated prog: 0: r1 = 0x0 0: r1 = 0 1: (u32 )(r10 - 0x4) = r1 1: (u32 )(r10 -4) = r1 2: r2 = r10 2: r2 = r10 3: r2 += -0x4 3: r2 += -4 4: r1 = 0x0 ll 4: r1 = map[id:88] 6: call 0x1 6: r1 += 272 7: r0 = (u32 )(r2 +0) 8: if r0 >= 0x1 goto pc+3 9: r0 <<= 3 10: r0 += r1 11: goto pc+1 12: r0 = 0 7: r6 = r0 13: r6 = r0 8: if r6 == 0x0 goto +0x2 14: if r6 == 0x0 goto pc+4 9: call 0x76 15: r0 = 0xffffffff8d2079c0 17: r0 = (u64 )(r0 +0) 10: (u64 )(r6 + 0x0) = r0 18: (u64 )(r6 +0) = r0 11: r0 = 0x0 19: r0 = 0x0 12: exit 20: exit An instruction array map, containing, e.g., instructions [0,4,7,12] will be translated by the verifier to [0,4,13,20]. A map with index 5 (the middle of 16-byte instruction) or indexes greater than 12 (outside the program boundaries) would be rejected. The functionality provided by this patch will be extended in consequent patches to implement BPF Static Keys, indirect jumps, and indirect calls. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251105090410.1250500-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-05 17:31:25 -08:00
Linus Torvalds	5624d4c378	platform-drivers-x86 for v6.18-3 Fixes and New Hotkey Support - input + dell-wmi-base: Electronic privacy screen on/off hotkey support - int3472: Fix unregister double free - wireless-hotkey: Fix Kconfig typo The following is an automated shortlog grouped by driver: dell-wmi-base: - Handle electronic privacy screen on/off events Input: - Add keycodes for electronic privacy screen on/off hotkeys int3472: - Fix double free of GPIO device during unregister MAINTAINERS: - Update int3472 maintainers x86: Kconfig: - fix minor typo in help for WIRELESS_HOTKEY -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaQso/QAKCRBZrE9hU+XO Me2tAQCoij2NER2aThaFPzTjBfvIKF4DbpsSo9V0I2r+gR6xzAD/UWmliCDGQ0dV NS28/L982I716VK2Mv5SvdG9BKxAlwM= =Nnez -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "Fixes and New Hotkey Support: - input + dell-wmi-base: Electronic privacy screen on/off hotkey support - int3472: Fix unregister double free - wireless-hotkey: Fix Kconfig typo" * tag 'platform-drivers-x86-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform: x86: Kconfig: fix minor typo in help for WIRELESS_HOTKEY platform/x86: dell-wmi-base: Handle electronic privacy screen on/off events Input: Add keycodes for electronic privacy screen on/off hotkeys MAINTAINERS: Update int3472 maintainers platform/x86: int3472: Fix double free of GPIO device during unregister	2025-11-05 11:08:10 -08:00
Damien Le Moal	b30ffcdc0c	block: introduce BLKREPORTZONESV2 ioctl Introduce the new BLKREPORTZONESV2 ioctl command to allow user applications access to the fast zone report implemented by blkdev_report_zones_cached(). This new ioctl is defined as number 142 and is documented in include/uapi/linux/fs.h. Unlike the existing BLKREPORTZONES ioctl, this new ioctl uses the flags field of struct blk_zone_report also as an input. If the user sets the BLK_ZONE_REP_CACHED flag as an input, then blkdev_report_zones_cached() is used to generate the zone report using cached zone information. If this flag is not set, then BLKREPORTZONESV2 behaves in the same manner as BLKREPORTZONES and the zone report is generated by accessing the zoned device. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-05 08:07:21 -07:00
Damien Le Moal	0bf0e2e466	block: track zone conditions The function blk_revalidate_zone_cond() already caches the condition of all zones of a zoned block device in the zones_cond array of a gendisk. However, the zone conditions are updated only when the device is scanned or revalidated. Implement tracking of the runtime changes to zone conditions using the new cond field in struct blk_zone_wplug. The size of this structure remains 112 Bytes as the new field replaces the 4 Bytes padding at the end of the structure. Beause zones that do not have a zone write plug can be in the empty, implicit open, explicit open or full condition, the zones_cond array of a disk is used to track the conditions, of zones that do not have a zone write plug. The condition of such zone is updated in the disk zones_cond array when a zone reset, reset all or finish operation is executed, and also when a zone write plug is removed from the disk hash table when the zone becomes full. Since a device may automatically close an implicitly open zone when writing to an empty or closed zone, if the total number of open zones has reached the device limit, the BLK_ZONE_COND_IMP_OPEN and BLK_ZONE_COND_CLOSED zone conditions cannot be precisely tracked. To overcome this, the zone condition BLK_ZONE_COND_ACTIVE is introduced to represent a zone that has the condition BLK_ZONE_COND_IMP_OPEN, BLK_ZONE_COND_EXP_OPEN or BLK_ZONE_COND_CLOSED. This follows the definition of an active zone as defined in the NVMe Zoned Namespace specifications. As such, for a zoned device that has a limit on the maximum number of open zones, we will never have more zones in the BLK_ZONE_COND_ACTIVE condition than the device limit. This is compatible with the SCSI ZBC and ATA ZAC specifications for SMR HDDs as these devices do not have a limit on the number of active zones. The function disk_zone_wplug_set_wp_offset() is modified to use the new helper disk_zone_wplug_update_cond() to update a zone write plug condition whenever a zone write plug write offset is updated on submission or merging of write BIOs to a zone. The functions blk_zone_reset_bio_endio(), blk_zone_reset_all_bio_endio() and blk_zone_finish_bio_endio() are modified to update the condition of the zones targeted by reset, reset_all and finish operations, either using though disk_zone_wplug_set_wp_offset() for zones that have a zone write plug, or using the disk_zone_set_cond() helper to update the zones_cond array of the disk for zones that do not have a zone write plug. When a zone write plug is removed from the disk hash table (when the zone becomes empty or full), the condition of struct blk_zone_wplug is used to update the disk zones_cond array. Conversely, when a zone write plug is added to the disk hash table, the zones_cond array is used to initialize the zone write plug condition. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-05 08:07:21 -07:00
Matthieu Baerts (NGI0)	f88191c7f3	mptcp: pm: in-kernel: record fullmesh endp nb Instead of iterating over all endpoints, under RCU read lock, just to check if one of them as the fullmesh flag, we can keep a counter of fullmesh endpoint, similar to what is done with the other flags. This counter is now checked, before iterating over all endpoints. Similar to the other counters, this new one is also exposed. A userspace app can then know when it is being used in a fullmesh mode, with potentially (too) many subflows. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-1-b4166772d6bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:15:06 -08:00
Michael S. Tsirkin	c3838262b8	virtio_net: fix alignment for virtio_net_hdr_v1_hash Changing alignment of header would mean it's no longer safe to cast a 2 byte aligned pointer between formats. Use two 16 bit fields to make it 2 byte aligned as previously. This fixes the performance regression since commit ("virtio_net: enable gso over UDP tunnel support.") as it uses virtio_net_hdr_v1_hash_tunnel which embeds virtio_net_hdr_v1_hash. Pktgen in guest + XDP_DROP on TAP + vhost_net shows the TX PPS is recovered from 2.4Mpps to 4.45Mpps. Fixes: `56a06bd40f` ("virtio_net: enable gso over UDP tunnel support.") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251031060551.126-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:14:07 -08:00
Thomas Gleixner	d923739e2e	rseq: Simplify the event notification Since commit `0190e4198e` ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags") the bits in task::rseq_event_mask are meaningless and just extra work in terms of setting them individually. Aside of that the only relevant point where an event has to be raised is context switch. Neither the CPU nor MM CID can change without going through a context switch. Collapse them all into a single boolean which simplifies the code a lot and remove the pointless invocations which have been sprinkled all over the place for no value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084306.336978188@linutronix.de	2025-11-04 08:30:09 +01:00
Dan Williams	c0c1262fbf	PCI: Add PCIe Device 3 Extended Capability enumeration PCIe r7.0 Section 7.7.9 Device 3 Extended Capability Structure, defines the canonical location for determining the Flit Mode of a device. This status is a dependency for PCIe IDE enabling. Add a new fm_enabled flag to 'struct pci_dev'. Cc: Lukas Wunner <lukas@wunner.de> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Samuel Ortiz <sameo@rivosinc.com> Cc: Alexey Kardashevskiy <aik@amd.com> Cc: Xu Yilun <yilun.xu@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251031212902.2256310-6-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2025-11-03 19:27:41 -08:00
Dan Williams	3225f52cde	PCI/TSM: Establish Secure Sessions and Link Encryption The PCIe 7.0 specification, section 11, defines the Trusted Execution Environment (TEE) Device Interface Security Protocol (TDISP). This protocol definition builds upon Component Measurement and Authentication (CMA), and link Integrity and Data Encryption (IDE). It adds support for assigning devices (PCI physical or virtual function) to a confidential VM such that the assigned device is enabled to access guest private memory protected by technologies like Intel TDX, AMD SEV-SNP, RISCV COVE, or ARM CCA. The "TSM" (TEE Security Manager) is a concept in the TDISP specification of an agent that mediates between a "DSM" (Device Security Manager) and system software in both a VMM and a confidential VM. A VMM uses TSM ABIs to setup link security and assign devices. A confidential VM uses TSM ABIs to transition an assigned device into the TDISP "RUN" state and validate its configuration. From a Linux perspective the TSM abstracts many of the details of TDISP, IDE, and CMA. Some of those details leak through at times, but for the most part TDISP is an internal implementation detail of the TSM. CONFIG_PCI_TSM adds an "authenticated" attribute and "tsm/" subdirectory to pci-sysfs. Consider that the TSM driver may itself be a PCI driver. Userspace can watch for the arrival of a "TSM" device, /sys/class/tsm/tsm0/uevent KOBJ_CHANGE, to know when the PCI core has initialized TSM services. The operations that can be executed against a PCI device are split into two mutually exclusive operation sets, "Link" and "Security" (struct pci_tsm_{link,security}_ops). The "Link" operations manage physical link security properties and communication with the device's Device Security Manager firmware. These are the host side operations in TDISP. The "Security" operations coordinate the security state of the assigned virtual device (TDI). These are the guest side operations in TDISP. Only "link" (Secure Session and physical Link Encryption) operations are defined at this stage. There are placeholders for the device security (Trusted Computing Base entry / exit) operations. The locking allows for multiple devices to be executing commands simultaneously, one outstanding command per-device and an rwsem synchronizes the implementation relative to TSM registration/unregistration events. Thanks to Wu Hao for his work on an early draft of this support. Cc: Lukas Wunner <lukas@wunner.de> Cc: Samuel Ortiz <sameo@rivosinc.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Link: https://patch.msgid.link/20251031212902.2256310-5-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2025-11-03 19:27:41 -08:00
Dan Williams	f16469ee73	PCI/IDE: Enumerate Selective Stream IDE capabilities Link encryption is a new PCIe feature enumerated by "PCIe r7.0 section 7.9.26 IDE Extended Capability". It is both a standalone port + endpoint capability, and a building block for the security protocol defined by "PCIe r7.0 section 11 TEE Device Interface Security Protocol (TDISP)". That protocol coordinates device security setup between a platform TSM (TEE Security Manager) and a device DSM (Device Security Manager). While the platform TSM can allocate resources like Stream ID and manage keys, it still requires system software to manage the IDE capability register block. Add register definitions and basic enumeration in preparation for Selective IDE Stream establishment. A follow on change selects the new CONFIG_PCI_IDE symbol. Note that while the IDE specification defines both a point-to-point "Link Stream" and a Root Port to endpoint "Selective Stream", only "Selective Stream" is considered for Linux as that is the predominant mode expected by Trusted Execution Environment Security Managers (TSMs), and it is the security model that limits the number of PCI components within the TCB in a PCIe topology with switches. Co-developed-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Link: https://patch.msgid.link/20251031212902.2256310-3-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2025-11-03 19:27:40 -08:00
Oleksij Rempel	e6e93fb013	ethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access Introduce the userspace entry point for PHY MSE diagnostics via ethtool netlink. This exposes the core API added previously and returns both capability information and one or more snapshots. Userspace sends ETHTOOL_MSG_MSE_GET. The reply carries: - ETHTOOL_A_MSE_CAPABILITIES: scale limits and timing information - ETHTOOL_A_MSE_CHANNEL_* nests: one or more snapshots (per-channel if available, otherwise WORST, otherwise LINK) Link down returns -ENETDOWN. Changes: - YAML: add attribute sets (mse, mse-capabilities, mse-snapshot) and the mse-get operation - UAPI (generated): add ETHTOOL_A_MSE_* enums and message IDs, ETHTOOL_MSG_MSE_GET/REPLY - ethtool core: add net/ethtool/mse.c implementing the request, register genl op, and hook into ethnl dispatch - docs: document MSE_GET in ethtool-netlink.rst The include/uapi/linux/ethtool_netlink_generated.h is generated from Documentation/netlink/specs/ethtool.yaml. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251027122801.982364-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-03 18:32:27 -08:00
Samiullah Khawaja	c18d4b190a	net: Extend NAPI threaded polling to allow kthread based busy polling Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to enable and disable threaded busy polling. When threaded busy polling is enabled for a NAPI, enable NAPI_STATE_THREADED also. When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to signal napi_complete_done not to rearm interrupts. Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread go to sleep. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Martin Karsten <mkarsten@uwaterloo.ca> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-03 18:11:40 -08:00
Alexei Starovoitov	5dae7453ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc4 Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-11-03 14:59:55 -08:00
Christian Brauner	76b6f5dfb3	nstree: add listns() Add a new listns() system call that allows userspace to iterate through namespaces in the system. This provides a programmatic interface to discover and inspect namespaces, enhancing existing namespace apis. Currently, there is no direct way for userspace to enumerate namespaces in the system. Applications must resort to scanning /proc/<pid>/ns/ across all processes, which is: 1. Inefficient - requires iterating over all processes 2. Incomplete - misses inactive namespaces that aren't attached to any running process but are kept alive by file descriptors, bind mounts, or parent namespace references 3. Permission-heavy - requires access to /proc for many processes 4. No ordering or ownership. 5. No filtering per namespace type: Must always iterate and check all namespaces. The list goes on. The listns() system call solves these problems by providing direct kernel-level enumeration of namespaces. It is similar to listmount() but obviously tailored to namespaces. /* * @req: Pointer to struct ns_id_req specifying search parameters * @ns_ids: User buffer to receive namespace IDs * @nr_ns_ids: Size of ns_ids buffer (maximum number of IDs to return) * @flags: Reserved for future use (must be 0) / ssize_t listns(const struct ns_id_req req, u64 ns_ids, size_t nr_ns_ids, unsigned int flags); Returns: - On success: Number of namespace IDs written to ns_ids - On error: Negative error code / * @size: Structure size * @ns_id: Starting point for iteration; use 0 for first call, then * use the last returned ID for subsequent calls to paginate * @ns_type: Bitmask of namespace types to include (from enum ns_type): * 0: Return all namespace types * MNT_NS: Mount namespaces * NET_NS: Network namespaces * USER_NS: User namespaces * etc. Can be OR'd together * @user_ns_id: Filter results to namespaces owned by this user namespace: * 0: Return all namespaces (subject to permission checks) * LISTNS_CURRENT_USER: Namespaces owned by caller's user namespace * Other value: Namespaces owned by the specified user namespace ID / struct ns_id_req { __u32 size; / sizeof(struct ns_id_req) / __u32 spare; / Reserved, must be 0 / __u64 ns_id; / Last seen namespace ID (for pagination) / __u32 ns_type; / Filter by namespace type(s) / __u32 spare2; / Reserved, must be 0 / __u64 user_ns_id; / Filter by owning user namespace / }; Example 1: List all namespaces void list_all_namespaces(void) { struct ns_id_req req = { .size = sizeof(req), .ns_id = 0, / Start from beginning / .ns_type = 0, / All types / .user_ns_id = 0, / All user namespaces / }; uint64_t ids[100]; ssize_t ret; printf("All namespaces in the system:\n"); do { ret = listns(&req, ids, 100, 0); if (ret < 0) { perror("listns"); break; } for (ssize_t i = 0; i < ret; i++) printf(" Namespace ID: %llu\n", (unsigned long long)ids[i]); / Continue from last seen ID / if (ret > 0) req.ns_id = ids[ret - 1]; } while (ret == 100); / Buffer was full, more may exist / } Example 2: List network namespaces only void list_network_namespaces(void) { struct ns_id_req req = { .size = sizeof(req), .ns_id = 0, .ns_type = NET_NS, / Only network namespaces / .user_ns_id = 0, }; uint64_t ids[100]; ssize_t ret; ret = listns(&req, ids, 100, 0); if (ret < 0) { perror("listns"); return; } printf("Network namespaces: %zd found\n", ret); for (ssize_t i = 0; i < ret; i++) printf(" netns ID: %llu\n", (unsigned long long)ids[i]); } Example 3: List namespaces owned by current user namespace void list_owned_namespaces(void) { struct ns_id_req req = { .size = sizeof(req), .ns_id = 0, .ns_type = 0, / All types / .user_ns_id = LISTNS_CURRENT_USER, / Current userns / }; uint64_t ids[100]; ssize_t ret; ret = listns(&req, ids, 100, 0); if (ret < 0) { perror("listns"); return; } printf("Namespaces owned by my user namespace: %zd\n", ret); for (ssize_t i = 0; i < ret; i++) printf(" ns ID: %llu\n", (unsigned long long)ids[i]); } Example 4: List multiple namespace types void list_network_and_mount_namespaces(void) { struct ns_id_req req = { .size = sizeof(req), .ns_id = 0, .ns_type = NET_NS \| MNT_NS, / Network and mount / .user_ns_id = 0, }; uint64_t ids[100]; ssize_t ret; ret = listns(&req, ids, 100, 0); printf("Network and mount namespaces: %zd found\n", ret); } Example 5: Pagination through large namespace sets void list_all_with_pagination(void) { struct ns_id_req req = { .size = sizeof(req), .ns_id = 0, .ns_type = 0, .user_ns_id = 0, }; uint64_t ids[50]; size_t total = 0; ssize_t ret; printf("Enumerating all namespaces with pagination:\n"); while (1) { ret = listns(&req, ids, 50, 0); if (ret < 0) { perror("listns"); break; } if (ret == 0) break; / No more namespaces / total += ret; printf(" Batch: %zd namespaces\n", ret); / Last ID in this batch becomes start of next batch / req.ns_id = ids[ret - 1]; if (ret < 50) break; / Partial batch = end of results */ } printf("Total: %zu namespaces\n", total); } Permission Model listns() respects namespace isolation and capabilities: (1) Global listing (user_ns_id = 0): - Requires CAP_SYS_ADMIN in the namespace's owning user namespace - OR the namespace must be in the caller's namespace context (e.g., a namespace the caller is currently using) - User namespaces additionally allow listing if the caller has CAP_SYS_ADMIN in that user namespace itself (2) Owner-filtered listing (user_ns_id != 0): - Requires CAP_SYS_ADMIN in the specified owner user namespace - OR the namespace must be in the caller's namespace context - This allows unprivileged processes to enumerate namespaces they own (3) Visibility: - Only "active" namespaces are listed - A namespace is active if it has a non-zero __ns_ref_active count - This includes namespaces used by running processes, held by open file descriptors, or kept active by bind mounts - Inactive namespaces (kept alive only by internal kernel references) are not visible via listns() Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-19-2e6f823ebdc0@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-03 17:41:18 +01:00
Christian Brauner	3760342fd6	nstree: assign fixed ids to the initial namespaces The initial set of namespace comes with fixed inode numbers making it easy for userspace to identify them solely based on that information. This has long preceeded anything here. Similarly, let's assign fixed namespace ids for the initial namespaces. Kill the cookie and use a sequentially increasing number. This has the nice side-effect that the owning user namespace will always have a namespace id that is smaller than any of it's descendant namespaces. Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-15-2e6f823ebdc0@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-03 17:41:17 +01:00
Pavel Begunkov	819630bd6f	io_uring/zcrx: remove sync refill uapi There is a better way to handle the problem IORING_REGISTER_ZCRX_REFILL solves. The uapi can also be slightly adjusted to accommodate future extensions. Remove the feature for now, it'll be reworked for the next release. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-03 08:55:58 -07:00
Chaitanya Kulkarni	bc49af56ee	blktrace: add support for REQ_OP_WRITE_ZEROES tracing Currently, REQ_OP_WRITE_ZEROES operations are not handled in the blktrace infrastructure, resulting in incorrect or missing operation labels in ftrace blktrace output. This manifests as write-zeroes operations appearing with incorrect labels like "N" instead of a proper "WZ" designation. This patch adds complete support for REQ_OP_WRITE_ZEROES across the blktrace infrastructure: Add BLK_TC_WRITE_ZEROES trace category in blktrace_api.h and update BLK_TC_END_V2 marker accordingly Map REQ_OP_WRITE_ZEROES to BLK_TC_WRITE_ZEROES in __blk_add_trace() to ensure proper trace event categorization Update fill_rwbs() to generate "WZ" label for write-zeroes operations in ftrace output, making them easily identifiable Add "write-zeroes" string mapping in act_to_str array for debugfs filter interface Update blk_fill_rwbs() to handle REQ_OP_WRITE_ZEROES for block layer event tracing With this fix, write-zeroes operations are now correctly traced and displayed. =========================================================== BEFORE THIS PATCH =========================================================== blkdiscard -z -o 0 -l 40960 /dev/nvme0n1 blkdiscard-3809 [030] ..... 1212.253701: block_bio_queue: 259,0 NS 0 + 80 [blkdiscard] blkdiscard-3809 [030] ..... 1212.253703: block_getrq: 259,0 NS 0 + 80 [blkdiscard] blkdiscard-3809 [030] ..... 1212.253704: block_io_start: 259,0 NS 40960 () 0 + 80 be,0,4 [blkdiscard] blkdiscard-3809 [030] ..... 1212.253704: block_plug: [blkdiscard] blkdiscard-3809 [030] ..... 1212.253706: block_unplug: [blkdiscard] 1 blkdiscard-3809 [030] ..... 1212.253706: block_rq_insert: 259,0 NS 40960 () 0 + 80 be,0,4 [blkdiscard] kworker/30:1H-566 [030] ..... 1212.253726: block_rq_issue: 259,0 NS 40960 () 0 + 80 be,0,4 [kworker/30:1H] <idle>-0 [030] d.h1. 1212.253957: block_rq_complete: 259,0 NS () 0 + 80 be,0,4 [0] <idle>-0 [030] dNh1. 1212.253960: block_io_done: 259,0 NS 0 () 0 + 0 none,0,0 [swapper/30] Trace Event Breakdown: Event \| Device \| Op \| Sector \| Sectors \| Byte Size \| Calculation block_bio_queue \| 259,0 \| NS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_getrq \| 259,0 \| NS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_io_start \| 259,0 \| NS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_insert \| 259,0 \| NS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_issue \| 259,0 \| NS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_complete \| 259,0 \| NS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_io_done \| 259,0 \| NS \| 0 \| 0 \| 0 \| Completion (no data) Total Bytes Transferred: Sectors: 80 Bytes: 80 × 512 = 40,960 bytes =========================================================== AFTER THIS PATCH =========================================================== blkdiscard -z -o 0 -l 40960 /dev/nvme0n1 blkdiscard-2477 [020] ..... 960.989131: block_bio_queue: 259,0 WZS 0 + 80 [blkdiscard] blkdiscard-2477 [020] ..... 960.989134: block_getrq: 259,0 WZS 0 + 80 [blkdiscard] blkdiscard-2477 [020] ..... 960.989135: block_io_start: 259,0 WZS 40960 () 0 + 80 be,0,4 [blkdiscard] blkdiscard-2477 [020] ..... 960.989138: block_plug: [blkdiscard] blkdiscard-2477 [020] ..... 960.989140: block_unplug: [blkdiscard] 1 blkdiscard-2477 [020] ..... 960.989141: block_rq_insert: 259,0 WZS 40960 () 0 + 80 be,0,4 [blkdiscard] kworker/20:1H-736 [020] ..... 960.989166: block_rq_issue: 259,0 WZS 40960 () 0 + 80 be,0,4 [kworker/20:1H] <idle>-0 [020] d.h1. 960.989476: block_rq_complete: 259,0 WZS () 0 + 80 be,0,4 [0] <idle>-0 [020] dNh1. 960.989482: block_io_done: 259,0 WZS 0 () 0 + 0 none,0,0 [swapper/20] Trace Event Breakdown: Event \| Device \| Op \| Sector \| Sectors \| Byte Size \| Calculation block_bio_queue \| 259,0 \| WZS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_getrq \| 259,0 \| WZS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_io_start \| 259,0 \| WZS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_insert \| 259,0 \| WZS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_issue \| 259,0 \| WZS \| 0 \| 80 \| 40960 \| Direct from trace block_rq_complete \| 259,0 \| WZS \| 0 \| 80 \| - \| 80 × 512 = 40,960 block_io_done \| 259,0 \| WZS \| 0 \| 0 \| 0 \| Completion (no data) Total Bytes Transferred: Sectors: 80 Bytes: 80 × 512 = 40,960 bytes Tested with ftrace blktrace on NVMe devices using blkdiscard with the -z (write-zeroes) flag. Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-03 08:30:56 -07:00
Ivan Vecera	30176bf7c8	dpll: add phase-adjust-gran pin attribute Phase-adjust values are currently limited by a min-max range. Some hardware requires, for certain pin types, that values be multiples of a specific granularity, as in the zl3073x driver. Add a `phase-adjust-gran` pin attribute and an appropriate field in dpll_pin_properties. If set by the driver, use its value to validate user-provided phase-adjust values. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20251029153207.178448-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-31 17:59:17 -07:00
Christian Brauner	036375522b	pidfs: expose coredump signal Userspace needs access to the signal that caused the coredump before the coredumping process has been reaped. Expose it as part of the coredump information in struct pidfd_info. After the process has been reaped that info is also available as part of PIDFD_INFO_EXIT's exit_code field. Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-8-ca449b7b7aa0@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-30 14:25:14 +01:00
Christian Brauner	dfd78546c9	pidfd: add a new supported_mask field Some of the future fields in struct pidfd_info can be optional. If the kernel has nothing to emit in that field, then it doesn't set the flag in the reply. This presents a problem: There is currently no way to know what mask flags the kernel supports since one can't always count on them being in the reply. Add a new PIDFD_INFO_SUPPORTED_MASK flag and field that the kernel can set in the reply. Userspace can use this to determine if the fields it requires from the kernel are supported. This also gives us a way to deprecate fields in the future, if that should become necessary. Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-5-ca449b7b7aa0@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-30 14:25:13 +01:00
Christian Brauner	4061c43a99	pidfs: add missing PIDFD_INFO_SIZE_VER1 We grew struct pidfd_info not too long ago. Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-3-ca449b7b7aa0@kernel.org Fixes: `1d8db6fd69` ("pidfs, coredump: add PIDFD_INFO_COREDUMP") Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-30 14:25:13 +01:00
Peter Zijlstra	c69993ecdd	perf: Support deferred user unwind Add support for deferred userspace unwind to perf. Where perf currently relies on in-place stack unwinding; from NMI context and all that. This moves the userspace part of the unwind to right before the return-to-userspace. This has two distinct benefits, the biggest is that it moves the unwind to a faultable context. It becomes possible to fault in debug info (.eh_frame, SFrame etc.) that might not otherwise be readily available. And secondly, it de-duplicates the user callchain where multiple samples happen during the same kernel entry. To facilitate this the perf interface is extended with a new record type: PERF_RECORD_CALLCHAIN_DEFERRED and two new attribute flags: perf_event_attr::defer_callchain - to request the user unwind be deferred perf_event_attr::defer_output - to request PERF_RECORD_CALLCHAIN_DEFERRED records The existing PERF_RECORD_SAMPLE callchain section gets a new context type: PERF_CONTEXT_USER_DEFERRED After which will come a single entry, denoting the 'cookie' of the deferred callchain that should be attached here, matching the 'cookie' field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED. The 'defer_callchain' flag is expected on all events with PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event responsible for collecting side-band events (like mmap, comm etc.). Setting 'defer_output' on multiple events will get you duplicated PERF_RECORD_CALLCHAIN_DEFERRED records. Based on earlier patches by Josh and Steven. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net	2025-10-29 10:29:58 +01:00
Qinxin Xia	f74ee32963	tools/dma: move dma_map_benchmark from selftests to tools/dma dma_map_benchmark is a standalone developer tool rather than an automated selftest. It has no pass/fail criteria, expects manual invocation, and is built as a normal userspace binary. Move it to tools/dma/ and add a minimal Makefile. Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com> Suggested-by: Barry Song <baohua@kernel.org> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251028120900.2265511-3-xiaqinxin@huawei.com	2025-10-29 09:41:40 +01:00
PIYUSH CHOUDHARY	18cd0a9c7a	video: fb: Fix typo in comment in fb.h Fix typo: "verical" -> "vertical" in macro description Signed-off-by: PIYUSH CHOUDHARY <mercmerc961@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org	2025-10-28 22:59:19 +01:00
Hans de Goede	8f3eaad981	Input: Add keycodes for electronic privacy screen on/off hotkeys Add keycodes for hotkeys toggling the electronic privacy screen found on some laptops on/off. There already is an API for eprivacy screens as kernel-mode-setting drm connector object properties: https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#standard-connector-properties this API also supports reporting when the eprivacy screen is turned on/off by the embedded-controller (EC) in response to hotkey presses. But on some laptops (e.g. the Dell Latitude 7300) the firmware does not allow querying the presence nor the status of the eprivacy screen at boot. This makes it impossible to implement the drm connector properties API since drm objects do not allow adding new properties after creation and the presence of the eprivacy cannot be detected at boot. The first notice of the presence of an eprivacy screen on these laptops is an EC generated (WMI) event when the eprivacy screen hotkeys are pressed. In this case the new keycodes this change adds can be generated to notify userspace of the eprivacy screen on/off hotkeys being pressed, so that userspace can show the usual on-screen-display (OSD) notification for eprivacy screen on/off to the user. This is similar to how e.g. touchpad on/off keycodes are used to show the touchpad on/off OSD. Signed-off-by: Hans de Goede <hansg@kernel.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://patch.msgid.link/20251020152331.52870-2-hansg@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-10-28 17:11:57 +02:00
Xu Kuohai	feeaf1346f	bpf: Add overwrite mode for BPF ring buffer When the BPF ring buffer is full, a new event cannot be recorded until one or more old events are consumed to make enough space for it. In cases such as fault diagnostics, where recent events are more useful than older ones, this mechanism may lead to critical events being lost. So add overwrite mode for BPF ring buffer to address it. In this mode, the new event overwrites the oldest event when the buffer is full. The basic idea is as follows: 1. producer_pos tracks the next position to record new event. When there is enough free space, producer_pos is simply advanced by producer to make space for the new event. 2. To avoid waiting for consumer when the buffer is full, a new variable, overwrite_pos, is introduced for producer. It points to the oldest event committed in the buffer. It is advanced by producer to discard one or more oldest events to make space for the new event when the buffer is full. 3. pending_pos tracks the oldest event to be committed. pending_pos is never passed by producer_pos, so multiple producers never write to the same position at the same time. The following example diagrams show how it works in a 4096-byte ring buffer. 1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| +-----------------------------------------------------------------------+ ^ \| \| producer_pos = 0 overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 2. Now reserve a 512-byte event A. There is enough free space, so A is allocated at offset 0. And producer_pos is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is set. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| A \| \| \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 512 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 3. Reserve event B, size 1024. B is allocated at offset 512 with BUSY bit set, and producer_pos is advanced to the end of B. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| A \| B \| \| \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 1536 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 4. Reserve event C, size 2048. C is allocated at offset 1536, and producer_pos is advanced to 3584. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| [BUSY] \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ \| \| \| \| \| producer_pos = 3584 \| overwrite_pos = 0 pending_pos = 0 consumer_pos = 0 5. Submit event A. The BUSY bit of A is cleared. B becomes the oldest event to be committed, so pending_pos is advanced to 512, the start of B. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| \| [BUSY] \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| \| \| pending_pos = 512 producer_pos = 3584 \| overwrite_pos = 0 consumer_pos = 0 6. Submit event B. The BUSY bit of B is cleared, and pending_pos is advanced to the start of C, which is now the oldest event to be committed. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| A \| B \| C \| \| \| \| \| [BUSY] \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| \| \| pending_pos = 1536 producer_pos = 3584 \| overwrite_pos = 0 consumer_pos = 0 7. Reserve event D, size 1536 (3 * 512). There are 2048 bytes not being written between producer_pos (currently 3584) and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced by 1536 (from 3584 to 5120). Since event D will overwrite all bytes of event A and the first 512 bytes of event B, overwrite_pos is advanced to the start of event C, the oldest event that is not overwritten. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| D End \| \| C \| D Begin\| \| [BUSY] \| \| [BUSY] \| [BUSY] \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| pending_pos = 1536 \| \| overwrite_pos = 1536 \| \| \| producer_pos=5120 \| consumer_pos = 0 8. Reserve event E, size 1024. Although there are 512 bytes not being written between producer_pos and pending_pos, E cannot be reserved, as it would overwrite the first 512 bytes of event C, which is still being written. 9. Submit event C and D. pending_pos is advanced to the end of D. 0 512 1024 1536 2048 2560 3072 3584 4096 +-----------------------------------------------------------------------+ \| \| \| \| \| \| D End \| \| C \| D Begin\| \| \| \| \| \| +-----------------------------------------------------------------------+ ^ ^ ^ \| \| \| \| \| overwrite_pos = 1536 \| \| \| producer_pos=5120 \| pending_pos=5120 \| consumer_pos = 0 The performance data for overwrite mode will be provided in a follow-up patch that adds overwrite-mode benchmarks. A sample of performance data for non-overwrite mode, collected on an x86_64 CPU and an arm64 CPU, before and after this patch, is shown below. As we can see, no obvious performance regression occurs. - x86_64 (AMD EPYC 9654) Before: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.623 ± 0.027M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 15.812 ± 0.014M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 7.871 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 6.703 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 2.896 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 2.054 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 1.864 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 1.580 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 1.484 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 1.369 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 1.316 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 1.272 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 1.239 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 1.226 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 1.213 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 1.193 ± 0.001M/s (drops 0.000 ± 0.000M/s) After: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.845 ± 0.036M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 15.889 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 8.155 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 6.708 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 2.918 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 2.065 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 1.870 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 1.582 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 1.482 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 1.372 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 1.323 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 1.264 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 1.236 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 1.209 ± 0.002M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 1.189 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 1.165 ± 0.002M/s (drops 0.000 ± 0.000M/s) - arm64 (HiSilicon Kunpeng 920) Before: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.310 ± 0.623M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 9.947 ± 0.004M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 6.634 ± 0.011M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 4.502 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 3.888 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 3.372 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 3.189 ± 0.010M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 2.998 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 3.086 ± 0.018M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 2.845 ± 0.004M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 2.815 ± 0.008M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 2.771 ± 0.009M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 2.814 ± 0.011M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 2.752 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 2.695 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 2.710 ± 0.006M/s (drops 0.000 ± 0.000M/s) After: Ringbuf, multi-producer contention ================================== rb-libbpf nr_prod 1 11.283 ± 0.550M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 2 9.993 ± 0.003M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 3 6.898 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 4 5.257 ± 0.001M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 8 3.830 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 12 3.528 ± 0.013M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 16 3.265 ± 0.018M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 20 2.990 ± 0.007M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 24 2.929 ± 0.014M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 28 2.898 ± 0.010M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 32 2.818 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 36 2.789 ± 0.012M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 40 2.770 ± 0.006M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 44 2.651 ± 0.007M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 48 2.669 ± 0.005M/s (drops 0.000 ± 0.000M/s) rb-libbpf nr_prod 52 2.695 ± 0.009M/s (drops 0.000 ± 0.000M/s) Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251018035738.4039621-2-xukuohai@huaweicloud.com	2025-10-27 19:42:39 -07:00
Wilfred Mallawa	82cb5be6ad	net/tls: support setting the maximum payload size During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size. Meaning that, the outgoing records from the kernel can exceed a lower size negotiated during the handshake. In such a case, the TLS endpoint must send a fatal "record_overflow" alert [1], and thus the record is discarded. Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel. Currently, there is no way to inform the kernel of such a limit. This patch adds support to a new setsockopt() option `TLS_TX_MAX_PAYLOAD_LEN` that allows for setting the maximum plaintext fragment size. Once set, outgoing records are no larger than the size specified. This option can be used to specify the record size limit. [1] https://www.rfc-editor.org/rfc/rfc8449 Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251022001937.20155-1-wilfred.opensource@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-27 16:13:42 -07:00
Mykyta Yatsenko	531b87d865	bpf: widen dynptr size/offset to 64 bit Dynptr currently caps size and offset at 24 bits, which isn’t sufficient for file-backed use cases; even 32 bits can be limiting. Refactor dynptr helpers/kfuncs to use 64-bit size and offset, ensuring consistency across the APIs. This change does not affect internals of xdp, skb or other dynptrs, which continue to behave as before. Also it does not break binary compatibility. The widening enables large-file access support via dynptr, implemented in the next patches. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-10-27 09:56:26 -07:00
Jakub Kicinski	2b7553db91	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-23 10:53:08 -07:00

1 2 3 4 5 ...

12002 Commits (960609b22be58baa16823103894de3c6858e6cf4)